If you’ve ever gazed at the huge viewing figures that are chalked up on Twitch - where millions of viewers watch people play video games without actually joining in - and wondered, “what the hell is the point in this?”, then brace yourself, grandad.
Because if the idea of people watching other people play video games sounds bananas to you, people are now watching computers play video games too.
Video gaming is now as much a spectator sport as a participatory one. It does sound a bit like a joke: computers playing computer games - when do humans get a look-in? But there's a serious point behind AI learning to play games: the machines are learning how to beat us. And it's for the good of humanity.
Well, as long as we’re willing to let it be used for good.
I’m not a Computer-Controlled-Player, I’m a Computer, controlled-player
It's important here to differentiate between AI playing video games and the AI in video games, which is the set of rules that computer-controlled characters follow when playing against you, the human. In-game AI has come a long way since players on the N64 had to deal with enemies in GoldenEye who were confused by an open doorway, as in the below initially-amusing-then-incredibly-frustrating example.
For instance: today, video game giant EA Sports puts huge effort into making the computer player in its FIFA games play “like a human” - i.e. less predictably rigid, more complex and surprising.
This human-ness is really important, and we’re really sensitive to it in a number of ways. We can often easily spot when a computer is controlling something that is usually done by humans: computers are just so darned reliable.
The power of human fallibility is observable in Formula 1, a machine-first sport whose human drivers drive so precisely that it can often appear dull and robotic. But however machine-precise and consistent those drivers are, they still - very occasionally - reveal themselves to be fallible, resulting in what most viewers secretly watch the sport for in the first place: spectacularly amusing accidents.
And this human knack of doing things in a, frankly, human way is what puzzles computers - and what allows us humans to beat them. Until now.
Deep Blue Something-or-other
Let's rewind to 1996, the year that computers made their first grand attack on humanity's gaming dominance. IBM's Deep Blue became the first computer to beat a reigning world champion in a game under tournament conditions, and poor old Garry Kasparov - who lost the rematch outright in 1997 - was so roundly beaten that he picked something less challenging to do with his life, like being a very outspoken critic of Vladimir Putin.
But Deep Blue was a human-designed machine that won by brute force, not by artificial intelligence. It simply had more processing power to calculate possible moves in the allotted time: Deep Blue could think 20 moves ahead of Garry. He never stood a chance.
Yet what freaked Kasparov out most was Deep Blue's 44th move in the first game. The computer made a move he never even began to expect. It started to do things he didn't understand, and Kasparov took this as a sign that Deep Blue had "superior intelligence" - at least, within the parameters of chess.
Since then, computers have started to actually think for themselves. AI has learnt to beat us at chess, at harder, more complex games like Go, and at simpler ones, like a bunch of old Atari games.
This is why we are now training AI to play video games. To beat humans, AI must first learn to play like humans - and then its brute force means it can play better than us. If this sounds like a recipe for disaster, you are correct. But it could also save the world.
The carrot not the stick
How does it work? Well, computers are sensitive too, you know. Training AI to play video games works best in the same way you're supposed to train dogs or children: with lots of patience and with rewards, not pain. (Admittedly this is something that humans are terrible at in practice, but computers are not yet able to care about hypocrisy.)
Basically, for an AI to learn how to play a game, you give it the controls and then drop it in the deep end. It has to figure out what the game is. It has to figure out what "winning" is. To do this, you reward the computer when it does something "good", like killing an opponent or solving a puzzle.
So if your AI does a “good” thing, you give it a pat on the head in the form of positive data feedback. Then the AI will try to achieve the same result again to get the positive feedback again.
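That reward loop can be sketched in a few lines of code. The following is a minimal illustration of reward-based training (tabular Q-learning), not any particular lab's system: the toy "game" (a six-cell corridor with a prize at the far end), the reward values and the learning parameters are all invented for this example.

```python
import random

# Toy "game": a corridor of 6 cells. The agent starts at cell 0 and
# "wins" by reaching cell 5. It is never told the rules -- the only
# signal it gets is a pat on the head (reward 1.0) for winning.
N_STATES = 6
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Apply an action; reward only arrives on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    # The agent's memory: how good is each action in each state?
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if random.random() < epsilon:
                # Occasionally try something random (exploration).
                action = random.choice(ACTIONS)
            else:
                # Otherwise repeat what earned rewards before,
                # breaking ties randomly.
                best = max(q[(state, a)] for a in ACTIONS)
                action = random.choice(
                    [a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Positive feedback: nudge this action's value toward the
            # reward plus the discounted value of what comes next.
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = train()
# After training, the learned policy steps right in every cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The key design point is in the update line: the agent never sees the goal directly, it only chases its own estimate of future reward, which slowly propagates backwards from the one rewarded move.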
And here you have to be careful, because AI is just like an animal: it’ll try to cheat. Imagine you have a dog who is allowed no more than one deliciously stinky treat per day. A sneaky dog will perform a trick in front of you in order to get a treat, and then hunt down another member of the family who’s oblivious of the fact it has already earned one, and it’ll perform the same trick again for them. Treat City, baby.
The dog, of course, isn't really "cheating", because dogs don't have morals (OR DO THEY???) - but what the pooch is doing is being extremely smart. Bonzo recognises the conditions needed to get a treat, then replicates them in different circumstances until no more treats appear. Only we, the fully sentient ones, know that too many treats are bad for our favourite #smol #pupper, so Bonzo's cleverness needs to be regulated. We create the rules of the game; we also need to enforce them.
And it's the same with positive-feedback training for AI: the AI might cheat to get the treat by completing a Super Mario Bros level in the fastest possible time without collecting any coins, when what you actually want is for the AI to complete the level and collect the coins needed for a high score.
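The Mario example can be made concrete with a toy sketch. Everything below is invented for illustration - the two strategies and their numbers - but it shows how a "treat" defined only by finishing time gets gamed, and how rewriting the reward to include coins fixes it:

```python
# Two candidate ways to finish a toy level (numbers are made up):
# a speedrun that ignores coins, and a slower run that collects them.
strategies = {
    "speedrun":  {"time": 20, "coins": 0},
    "collector": {"time": 45, "coins": 30},
}

def naive_reward(run):
    # The designer's first attempt: only reward finishing quickly.
    return 100 - run["time"]

def intended_reward(run):
    # What the designer actually wanted: coins count towards the score.
    return 100 - run["time"] + 5 * run["coins"]

# An optimising agent simply picks whatever its reward says is best.
best_naive = max(strategies, key=lambda s: naive_reward(strategies[s]))
best_intended = max(strategies, key=lambda s: intended_reward(strategies[s]))
print(best_naive, best_intended)
```

Under the naive reward the agent "cheats" and speedruns past every coin; it is not misbehaving, it is perfectly maximising the treat we actually defined, which is exactly why the reward rules need to be watertight.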
And then they just start to do it better.
So what’s the point?
The thinking is this: while we train AI to beat increasingly complex video games, we're actually also training it to overcome complicated tasks that are defined by humans. It's easy to extrapolate this learning to other human-defined tasks: how could a computer "beat" other "human games", like designing complex cancer medicines or providing nuanced emotional support to the elderly?
So how fast is the AI learning? Short answer: very fast indeed. Longer answer: maybe your dream career as a pro gamer is looking a bit more fragile in the medium-term.
OpenAI is a company co-founded by - who else - Elon Musk, and it has trained its AI to play, amongst other things, the popular online battle game Dota 2. Initially the AI blundered around the game, as if you had handed the controls to a baby. Then it started learning, at the rate of "100 human lifetimes of experience every single day." Then it started beating people. Now it beats them all.
Or, as Vice reported, "five months ago, the bot couldn't beat a team of five random people pulled off the street, but now, their bot beats players ranked in the 99.95th percentile. OpenAI says it will be returning to the pro-gaming scene next year and attempt to defeat the world champions."
Dota 2 pro Danylo "Dendi" Ishutin played one-on-one against OpenAI's bot and was "crushed", but the bot then lost against a team of pro players.
Don't celebrate yet: OpenAI could have defeated the puny humans if the designers had deliberately taught the bot to overcome the few frivolous obstacles it hadn't yet solved on its own. But the designers want to understand how the AI develops its own way around problems - until, one day, it starts doing stuff they don't understand.
Worship your new Videogame Gods
Garry Kasparov, the man most famously humiliated by a computer, now considers his defeat to be both a blessing and a curse. Though he wasn't beaten by true AI, what he experienced was a new type of helplessness: a world where machines are simply smarter.
Kasparov was beaten by something from left field - a non-human move. So when AI becomes "magic", and we can't figure out how or why or what the AI is doing, does it feel a bit like meeting God? And doesn't that mean we've been downgraded: we're no longer the smartest beings on the planet?
Well, yes. And Garry is now clear on the topic: we should work with the machines, not be scared of them.
His logic is: we are humans. We have passion and dreams and ambition, and AI does not. And if we don’t let machines get better than us, we’ll have failed at being human. It’s an interesting thought, and almost certainly not what you’re thinking when you’re three hours deep in a Fortnite session.
Instead, he says, machines should be freed to think better than we can, so that we humans can live better lives: living longer, healthier, more peaceful, happier lives. And that sounds just peachy.
Remember: we’re the humans, so we set the rules of the game, and we have to enforce them too. Now we just have to make sure the rules are absolutely water-tight.