In the first part of this multi-part series, I briefly touched upon the evolution of machines relative to our minds and how AI predictions are taken out of context. I also made a couple of counter-intuitive observations about the jobs of the future. The key takeaway was this: AI is far from magic (at least at the time of writing), and we cannot solve real-world problems with AI while leaving humans out of the loop. In this part, I would like to go a little deeper and show how real-world implementations of AI will create new ways to make money, if not “jobs”. This post is intended for a wide audience, so please bear with me if some parts seem too obvious. I promise I didn’t mean to dumb it down.
So what will be some new ways to make money with the ascension of AI? Here are a couple (the first is straightforward for the AI-savvy; the second is the surprise element):
- Human-In-The-Loop (HITL)
  - For verifying the AI outcome
  - For tagging or labeling data
- The Unknown gig
Today, AI is largely supervised, i.e. real-world experiences are hand-fed into an evolving learning agent, which gets better at emulating human intelligence in that narrow domain. If that sounds like a mouthful and you are not tech-savvy, try this. Say we need to build an image recognition app. Today’s state-of-the-art algorithms for recognizing, say, an apple in an image need to be trained on tens of thousands of “labeled” apple images. That is to say, someone has to manually tag each apple image as “apple”. So for problems at a larger scale, we need many thousands of labels, and even then the accuracy of these systems is far from perfect.
While simply involving humans in the AI process is HITL AI, there are some subtle nuances to it. Humans can be involved at different phases iteratively, as shown in the figure below. One well-known phase is enlisting humans to tag data or create training data, without which AI solutions cannot comprehend the real world. The other is to let a human verify the AI outcome and compensate for its lack of accuracy. Whenever a human overrides the outcome, that instance is treated as new training data and fed back into the system, so it becomes a perpetual learning machine, with the hope that someday it will catch up with the human.
Thanks to the rise of crowdsourcing, companies don’t actually have to “hire” all the humans in the loop they need, or even know who they are or where they are from. While that’s true, the good news is that not all HITL work is created equal. Some cases need special skills, and companies might have to hire for them, at least until crowdsourcing platforms can attract enough subject matter experts in niche domains like genome editing or human cognition. HITL AI platforms are a dime a dozen, but really good ones are rare. Figure Eight and Gengo are two popular crowdsourcing / HITL platforms; Gengo specializes in NLP, while Figure Eight operates across multiple domains. Let’s take a deep dive into each of these HITL flavors.
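To make the verify-and-feed-back loop concrete, here is a minimal Python sketch. The class and method names (`FeedbackLoop`, `review`) are made up for illustration and do not come from any particular HITL platform; the only idea it shows is that a human override becomes a new training example.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Hypothetical helper: collects human overrides as new training data."""
    training_data: list = field(default_factory=list)

    def review(self, item, machine_label, human_label=None):
        """Return the final label for an item.

        If the human disagrees with the machine, the human's label wins
        and the (item, human_label) pair is stored for retraining.
        """
        if human_label is not None and human_label != machine_label:
            self.training_data.append((item, human_label))
            return human_label
        return machine_label


loop = FeedbackLoop()
loop.review("photo_1.jpg", machine_label="apple")                      # human agrees
loop.review("photo_2.jpg", machine_label="apple", human_label="pear")  # human overrides
print(loop.training_data)
```

In a real system the stored overrides would periodically be used to retrain the model; that step is left out here.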
HITL for verifying the AI outcome:
The idea behind this flavor of HITL is to let machines take the first pass at a problem and allow the human in the loop to override or change the machine’s outcome when required. The limited accuracy of state-of-the-art AI has forced market-facing AI solutions to keep humans in the loop to compensate.
Some real-world use cases:
Consider this case to understand why HITL gigs are crucial and very real.
Imagine a home security company training a model to detect and report intrusions. Typically, such a model combines multiple parameters: movement in the camera feed, sounds, the safety score of the neighborhood, whether the residents are on vacation, and so on. But these systems are notorious for raising false alarms. Even if the system is 95% accurate, the 5% inaccuracy can create plenty of customer dissatisfaction, not to mention the cost to the company, because scrambling a security team to the site for every false alarm costs money. Take the number of houses installed in a neighborhood, multiply by 5% false alarms, and you do the math. Do you think the customers will care about the 95% accuracy? So the intrusion detection system has to take the first pass, and the security team has to make a judgment call to compensate for the 5% inaccuracy. While having a human verify the outcome is crucial, these roles come with different skill requirements.
Platforms like Quora, Facebook and Amazon (for user reviews) all have tight rules around their content. For instance, they moderate for harassment, strong language and abuse, spam, and graphic content, but they all take a hybrid approach. It looks like the idea is to let humans and machines play to their respective strengths in moderating content, which makes sense when you think about it:
- Sheer volume and velocity: you simply cannot have enough people to manually review content at the volume and velocity at which it arrives on Facebook or Quora. It’s nearly impossible.
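Here is that math as a short Python sketch. Every number below (200 homes, 2 alerts per home per month, 50 dollars per dispatch) is an assumption picked purely for illustration; only the 5% false-alarm rate comes from the example above.

```python
def false_alarm_cost(homes, alerts_per_home, false_alarm_rate, cost_per_dispatch):
    """Estimate false alarms per period and what dispatching for them costs."""
    false_alarms = homes * alerts_per_home * false_alarm_rate
    return false_alarms, false_alarms * cost_per_dispatch


# Assumed figures for one neighborhood over one month.
alarms, cost = false_alarm_cost(
    homes=200,
    alerts_per_home=2,
    false_alarm_rate=0.05,
    cost_per_dispatch=50.0,
)
print(f"about {alarms:.0f} false alarms, costing about ${cost:,.0f} per month")
```

With these made-up inputs that works out to roughly 20 wasted dispatches a month for a single neighborhood, which is why a human judgment call before dispatching is worth paying for.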
- We simply don’t have such deep AI technologies yet that can automatically classify content with perfect accuracy.
So there should be a similar workflow in place, where a “system” does the first round and presents the results to the human in the loop. The results might carry a confidence or probability score, which the human can use to their advantage. Some implementations flag only the content whose confidence score falls below a threshold and send a notification to a user calling for action. In other instances, users are given a user interface like the one below with some useful filter options. They can apply their best judgment by simply running through the list, or filter it down to content with a confidence score of, say, less than 70%, depending on their experience with the system and its accuracy. It really depends on the use case.
As you can see, in the Quora, Amazon or Facebook case the human in the loop doesn’t really need to be “skilled” per se, and it’s not a lot of work (say, they don’t have to understand the subject matter or fact-check it in order to moderate it). They just have to apply common sense and know how to respond to the content notifications.
While the humans in the loop in the Facebook and Quora cases are employees of the organization, there are also community-based platforms like Wikipedia, where the user community creates the content and moderates it for quality, abuse, vandalism and so on. Writing or verifying a Wikipedia page on how CRISPR edits genomes requires subject matter experts to uphold the content quality. So editors of Wikipedia, or of any crowdsourced encyclopedia, don’t just need skills but expertise. But then where is the money in Wikipedia editing? Enter Everipedia. Think of it as a blockchain-based Wikipedia with monetary incentives for people who contribute quality content. In essence, you get paid for your skills and expertise. You can be a well-meaning local with exclusive first-hand information about a local event, a language translator willing to translate Wikipedia pages into your local language, or a subject matter expert in a niche area willing to author Wikipedia pages. Whoever you are, you get paid for your altruism.
In summary, HITL comes in different shapes and forms. The human in the loop could be anyone from a high-school-educated person leveraging their common sense to a Rhodes scholar. It depends entirely on the business case. So, as you can see, humans are more important than before with HITL, because the best judgments are made by the human in the loop. Aren’t these unexpected spurts in “jobs”?
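The confidence filter described above can be sketched in a few lines of Python. The `Prediction` record, its field names and the sample data are hypothetical; the 70% threshold is the one used in the example above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record produced by the system's first pass."""
    content_id: str
    label: str
    confidence: float  # between 0.0 and 1.0


def needs_human_review(predictions, threshold=0.70):
    """Return predictions scoring below the threshold, least confident first."""
    flagged = [p for p in predictions if p.confidence < threshold]
    return sorted(flagged, key=lambda p: p.confidence)


first_pass = [
    Prediction("post-1", "spam", 0.98),
    Prediction("post-2", "harassment", 0.55),
    Prediction("post-3", "ok", 0.65),
]

for p in needs_human_review(first_pass):
    print(p.content_id, p.label, p.confidence)
```

Sorting by confidence is a design choice, not a requirement: it puts the items the system is least sure about at the top of the moderator's queue.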
HITL for tagging or labeling data:
As far as tagging or creating training data goes, humans have been involved in some capacity from day one. But now the AI community is trying to leverage crowdsourcing platforms like CrowdFlower (now Figure Eight) and Amazon Mechanical Turk to get training data, such as the labeled images in the image recognition case, for a small cost. For data scientists, time is better spent framing the problem, extracting knowledge from data and creating the necessary impact than cleansing and preparing data.
Again, creating training data is not necessarily a mindless job. It might need different skills depending on the use case. The image recognition case was a contrived example and may sound frivolous, so let’s consider training data for natural language or linguistics. For instance, if the need is to recognize and classify terms or phrases like Apple, Microsoft, IBM or Steve Madden occurring in documents as an “Organisation” or “Brand”, the human in the loop is expected to identify the terms, differentiate between the contexts in which Apple is used (as a fruit vs. an organisation) and label them accordingly. Similarly, there are much more complicated labeling cases where specific domain skills are required.
In both cases, i.e. HITL for verifying or for tagging data, are you wondering whether humans are digging their own graves by teaching machines for a small cost? After all, machines will “actively learn” and eventually catch up, and once that happens there will be nothing more to teach, will there? If that’s your concern, this paper on a HITL incentive model might interest you. It proposes a model to recognize knowledge provenance and incentivize the knowledge owner commensurately.
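For a rough idea of what such labels might look like as data, here is a small Python sketch. The `LabeledSpan` record, the two sentences and the label names are invented for illustration; real annotation tools each have their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSpan:
    """Hypothetical annotation: a character span in a sentence plus its label."""
    text: str
    start: int
    end: int
    label: str

    @property
    def mention(self):
        # The exact words the annotator highlighted.
        return self.text[self.start:self.end]


examples = [
    LabeledSpan("Apple reported record quarterly revenue.", 0, 5, "ORGANISATION"),
    LabeledSpan("She ate an apple after lunch.", 11, 16, "FRUIT"),
]

for ex in examples:
    print(ex.mention, "->", ex.label)
```

The same word gets two different labels, and only the surrounding sentence tells the annotator which one applies. That context judgment is the skill being paid for.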
The Unknown gig
While most of what we have discussed isn’t earth-shattering, these are real, tangible money-making opportunities. But here is what you might not have seen coming. There are going to be some radically unforeseen (at least by most of the general populace) ways to make money.
Fact checker (aka fake news moderator)
Today, due to the rise of crowdsourcing and platform economics, most content is community- or crowd-generated. There are two challenges with this model. First, because it is crowd-generated, it is very vulnerable to fake news and fake reviews. (It has been alleged that the 2016 U.S. election was tipped in favor of Trump through fake, targeted news on Facebook by Cambridge Analytica.) Second, most systems like Facebook and Amazon are centralized, and hence vulnerable to censorship and/or collusion. As we speak, emergent blockchain-based solutions are catching up. But even if an e-commerce, social media or micro-blogging platform is built on the blockchain, only the censorship problem is addressed; fake news and astroturfing remain hard problems to solve. MediaSifter is a platform trying to solve the fake news problem with its decentralized platform and community. Each curated news article is investigated for veracity and accuracy by actors such as fact checkers, reviewers and supporters. The news is presented with evidence and alternative perspectives, and the community is incentivized for its contributions. MediaSifter’s tagline is “Decentralize Influence, Distribute Truth”, and I love it. It’s only a matter of time before other platforms with community- or user-generated content bring in these features.
This is just one example of an unknown gig that will become mainstream soon. There will be more. Technology, in general, has a strange trajectory and has this inexplicable power to create new opportunities organically at an unprecedented pace and scale. This cartoon sums it up.
Sure, certain kinds of jobs are on the verge of extinction; travel agents, librarians and taxi dispatchers are some of those in line. But AI is not like globalization or outsourcing. It will create more opportunities than it takes away. All you creatives out there, your jobs are safe. The rest of the world: keep calm and prepare to complement and coexist with machines. There will be plenty of opportunities.
Over to you now, what are your thoughts?