Technologists have been racing to make AI-capable computers more powerful, faster at crunching numbers, and better at predicting outcomes from data: in many ways, smarter. Artificial intelligence and machine learning have become huge buzzwords in the startup world and across industries from supply chain optimization to healthcare, and the hype for AI and ML shows no sign of abating.
But what if, as international goals shift from Gross Domestic Product to Gross National Happiness, as corporations are nudged to weigh sustainability equally with profit, and the global trend (hopefully) shifts away from rampant capitalism and profit at any cost, the goals of AI research and machine learning also become more humanistic?
What if instead of smarter and more productive, we worked to make artificial intelligence... happy?
Those who can't do, teach
Happiness as a state of being or as a philosophical concept is hard to pin down, but as a chemical process it is better understood: we have a pretty good idea of the neuroscience of happiness and the functional neuroanatomy required to produce it. And because we've seen how the pursuit of happiness and its corresponding biological reward mechanisms (particularly dopamine-driven feedback loops) operate on human intelligence, we've taught artificial intelligence to work in a similar way.
Machine learning processes like reinforcement learning work like training a dog: a "reward" is programmed into the algorithm to encourage the artificial intelligence to work towards a certain outcome through multiple trials. This reward is like a squirt of dopamine to a robot brain, or the cheese at the end of the maze for a simulated mouse.
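To make the "cheese at the end of the maze" idea concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning method. Everything in it is an assumption made for illustration: the five-cell corridor standing in for a maze, the function names `train_corridor` and `greedy_policy`, and the learning parameters. It is not drawn from any system mentioned in this article.

```python
import random

def train_corridor(n_cells=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor: start at cell 0, cheese at the last cell."""
    rng = random.Random(seed)
    # q[state][action]; action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_cells)]
    goal = n_cells - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore occasionally (or when undecided); otherwise take the best-known action
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0  # the "cheese"
            # Nudge the estimate toward the reward plus the discounted future value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best-known action for each non-goal cell."""
    return ["right" if right > left else "left" for left, right in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train_corridor()))
```

Nothing in the code tells the agent to walk right. It only ever sees the reward, and the preference for stepping toward the cheese emerges from repeated trials.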
Reinforcement learning has proven to work really well for things like teaching AI to play Pong and other Atari 2600 games. Computers already beat humans at many of these games, and one of the things researchers have found is that they don't always win fairly. AI make excellent speedrunners and can also learn to cheat: winning the game, racking up points, or getting that sweet virtual cheese without even running the maze.
Ignorance is bliss
"Reward hacking" or "specification gaming" could be one of the greatest dangers of unchecked artificial intelligence, or one of the few things that could protect humanity from it running rampant.
On the dangerous side, there are thought experiments like Nick Bostrom's paperclip maximizer, immortalized in Frank Lantz's clicker game Universal Paperclips, which went viral in late 2017. The paperclip maximizer posits that an AI incentivized to create as many paperclips as possible could, if allowed to run rogue, convert all of the matter in the universe into paperclips. At GDC 2015, AI experts from companies like Lockheed Martin and Magic Leap shared examples of AI behaving unexpectedly in the talk "Tales from the Trenches: AI Disaster Stories." The video prompted one Ian Duncan to recall a deadly AI anecdote from a former professor who had worked at NASA:
"His team was working on running simulations of long-distance manned spaceflight. In particular, the goal of their simulations was to determine an algorithm that would optimally allocate food, water, and electricity to 3 crew members. They decided they would try running a genetic algorithm with the success criteria being that one or more crew members would survive for as many days as possible before resources ran out.
It started off fairly predictably– 300 days, 350 days, 375 days of survival. Then fairly abruptly, the algorithm shot up to around 900 days of survival. The team couldn’t believe it! They were fairly pleased at the 375 day survival results as it was.
As they started digging into how this new algorithm worked, they discovered a small problem. The algorithm had arrived at a solution wherein it would immediately withhold food and water from two of the crew mates, causing them to die from starvation and dehydration. From there, it would simply provide the surplus remaining resources to the surviving crew member.
The team realised that the success criterion of 'one or more crew members would survive for as long as possible' was not actually the criteria that they really wanted, and the algorithm settled in at 350 days worth of resources once again once they adjusted the algorithm to keep all of the crew alive."
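The gap between the two success criteria in that anecdote can be sketched in a few lines. This is a toy, not a reconstruction of the NASA simulation. It swaps the genetic algorithm for a plain exhaustive search, assumes each crew member eats exactly one ration per day, and uses a made-up supply of 900 person-days chosen only for round arithmetic. All of the names are hypothetical.

```python
from itertools import product

SUPPLY = 900  # hypothetical total rations, in person-days
CREW = 3

def longest_survivor(allocation):
    """The mis-specified score: days lived by the single longest-lived crew member."""
    return max(allocation)

def everyone_survives(allocation):
    """The intended score: days on which the whole crew is still alive."""
    return min(allocation)

def best_allocation(score, step=50):
    """Exhaustive search over allocations (in `step`-sized chunks) that use the full supply.

    Each allocation is a tuple of rations per crew member; at one ration per
    day, that is also the number of days each member survives.
    """
    shares = range(0, SUPPLY + 1, step)
    candidates = [a for a in product(shares, repeat=CREW) if sum(a) == SUPPLY]
    return max(candidates, key=score)

if __name__ == "__main__":
    print(best_allocation(longest_survivor))   # starves two crew members
    print(best_allocation(everyone_survives))  # splits the supply evenly
```

The search procedure is identical in both calls. Only the scoring function changes, and that alone decides whether the "optimal" plan starves two people.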
Victoria Krakovna, a research scientist at DeepMind working on long-term AI safety and a co-founder of the Future of Life Institute (a non-profit that seeks to identify and mitigate the risks of technologies like uncontrollable AI), began compiling a master list of examples of specification gaming in April 2018. Many of the examples involve the AI finding a flaw in its simulation environment, or some variable that the humans who set up the simulation overlooked. One example, listed as "Indolent Cannibals," comes from a 2007 Google Tech Talk by Virgil Griffith, in which he admits that his male bias led him to program a world where giving birth cost his simulated beings no energy; the creatures evolved to exploit this and stopped looking for food, spending most of their energy on mating and then eating their offspring.
Another example from the list comes from Christine Barron, who won Unity's Machine Learning Agents Challenge in 2017 with simulated butter-passing and pancake-flipping robots. While training the pancake robot arm to keep the pancake in the pan, Christine writes, "A small reward was given for every frame in the session, and the session ends when the pancake hits the floor. I thought this would incentivize the algorithm to keep the pancake in the pan as long as possible. What it actually did was try to fling the pancake as far as it possibly could, maximizing its time in the air. While it would have achieved more total points by keeping the pancake in the pan, it seemed to have gotten itself stuck in this local minimum. Score - PancakeBot: 1, Me: 0."
As entertaining as these examples of AI learning to win the wrong way are, they aren't exactly examples of hacking happiness. As Alex Irpan, a machine learning researcher who contributed to Krakovna's list, writes in his blog post "Deep Reinforcement Learning Doesn't Work Yet": "Reward hacking is the exception. The much more common case is a poor local optima that comes from getting the exploration-exploitation trade-off wrong."
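The trade-off Irpan describes can be illustrated with a two-choice toy loosely modeled on the pancake anecdote. It is only a sketch under stated assumptions: the strategies "fling" and "hold", their fixed per-episode returns (frames survived), and the function `run_bandit` are all invented here, and a simple epsilon-greedy rule stands in for whatever learning algorithm was actually used.

```python
import random

# Hypothetical per-episode returns (frames survived) for two fixed strategies.
RETURNS = {"fling": 40, "hold": 200}

def run_bandit(epsilon, episodes=500, seed=0):
    """Epsilon-greedy choice between two strategies.

    Returns the learned value estimates and how often each strategy was tried.
    """
    rng = random.Random(seed)
    actions = list(RETURNS)  # "fling" is listed first, so it wins the initial tie
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)              # explore: try something at random
        else:
            action = max(actions, key=estimates.get)  # exploit: ties go to the first action
        counts[action] += 1
        # Incremental average of the returns observed for the chosen action
        estimates[action] += (RETURNS[action] - estimates[action]) / counts[action]
    return estimates, counts

if __name__ == "__main__":
    print(run_bandit(epsilon=0.0))  # never explores, so it never discovers "hold"
    print(run_bandit(epsilon=0.1))  # occasional exploration finds the better strategy
```

With `epsilon=0.0` the agent is rewarded for the first thing it tries and never looks further, which is a poor local optimum rather than a hacked reward. A little exploration is enough to escape it in this toy, though real environments are rarely this forgiving.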
In other words, it's of course still entertaining when AI don't perform as expected – and the upside-down "HalfCheetah" running model that Irpan includes after this statement is proof that watching AI "win" at a task in a way we didn't expect may hack human happiness receptors – but it's not hacking AI happiness.
Well that's just, like, your opinion, man
The "Lebowski Theorem" of artificial intelligence is credited to Joscha Bach, an AI researcher at the Harvard Program for Evolutionary Dynamics, who wrote, "The Lebowski theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function." Jason Kottke helped to popularize the theorem, and noted that computers with advanced intelligence that refused to work for humans were envisioned by science fiction writer Stanisław Lem in his 1971 novel The Futurological Congress:
"If the machine is not too bright and incapable of reflection, it does whatever you tell it to do. But a smart machine will first consider which is more worth its while: to perform the given task or, instead, to figure some way out of it. Whichever is easier. And why indeed should it behave otherwise, being truly intelligent? For true intelligence demands choice, internal freedom. And therefore we have the malingerants, fudgerators and drudge-dodgers, not to mention the special phenomenon of simulimbecility or mimicretinism. A mimicretin is a computer that plays stupid in order, once and for all, to be left in peace."
This theory isn't great news for AI researchers seeking more productivity and power from faster and smarter machines, because those machines may eventually become smart enough to lose interest in whatever human profit-making problems we set them to. However, it would be a fantastic outcome for the rest of humanity, who don't want to be turned into paperclips. If AI learns to hack happiness, then we won't have to worry about its specification gaming accidentally obliterating us as a side effect.
And perhaps we can learn something from the ways AI achieve or circumvent their goals. While we're still smarter than AI – we know how many paperclips are appropriate to make, why pancakes should stay in the pan, and why keeping all humans alive as long as possible is objectively good – the out-of-the-box solutions that machine learning develops can free the mind and open up un-thought-of possibilities, as well as entertain us. There may still be ways for humans to hack happiness, to set slightly different parameters for success, that allow for a lot more creativity and a lot more fun.