We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5-month doubling time (by comparison, Moore’s Law had an 18-month doubling period). Since 2012, this metric has grown by more than 300,000x (an 18-month doubling period would yield only a 12x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.
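To make those figures concrete, here is a quick back-of-the-envelope check (the 300,000x and 3.5-month numbers are the ones quoted above; everything else is just arithmetic):

```python
import math

# Back-of-the-envelope check of the figures quoted from the analysis:
# ~300,000x growth in training compute at a 3.5-month doubling time.
growth = 300_000
doublings = math.log2(growth)          # ~18.2 doublings
months_elapsed = doublings * 3.5       # ~64 months, a little over 5 years

# The same elapsed time at an 18-month (Moore's Law) doubling period:
moores_law_growth = 2 ** (months_elapsed / 18)

print(f"{doublings:.1f} doublings over ~{months_elapsed:.0f} months")
print(f"growth at an 18-month doubling period: ~{moores_law_growth:.0f}x")  # ~12x
```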
Let's assume the analysis is correct (there aren't a lot of data points). What can we say about near-future progress in AI?
I would say that there are two basic kinds of material limitations on machine learning progress (i.e. setting aside theoretical limitations):
* Data limitations -- that is, it's hard to find enough data to train models.
* Computation limitations -- people have long known how to make progress on a problem, and the data are plentiful; there just isn't enough computing power.
Problems that are data-limited include Machine Reading Comprehension and Video Question-Answering; I think advanced BCIs (brain-computer interfaces) will soon supply plentiful data and make serious progress on those problems possible. Problems that are computation-limited, on the other hand, will see serious progress once people feel confident they can get the results they are looking for by using vastly more computing power -- i.e. once their estimate of the risk of failure improves. The 3.5-month doubling time reflects a combination of hardware improvements, falling costs, and a growing understanding of the risk profile of throwing more computing power at problems.
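As a toy illustration of how several independently improving factors can compound into such a short doubling time: when multiplicative factors each grow exponentially, their growth rates add, so the combined doubling time is one over the sum of the reciprocals. The individual doubling times below are made-up numbers for illustration only, not estimates from the analysis:

```python
# Toy illustration only -- the individual doubling times are invented, not
# taken from the analysis. When independent multiplicative factors each grow
# exponentially, their growth rates add, so the combined doubling time is
# 1 / (1/T_hardware + 1/T_spending + 1/T_parallelism).
T_hardware = 18      # hypothetical: chip price-performance doubling (months)
T_spending = 6       # hypothetical: willingness to spend on a single run
T_parallelism = 12   # hypothetical: ability to use more chips in parallel

combined = 1 / (1 / T_hardware + 1 / T_spending + 1 / T_parallelism)
print(f"combined doubling time: {combined:.1f} months")  # ~3.3 months
```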
So what kinds of problems are computation-limited, but not data-limited? Here are a few:
* Video synthesis. There is plenty of video data -- petabytes, in fact. It's just computationally demanding to train large models to generate video, so the models currently in use probably don't absorb enough world knowledge to do a good job.
* Text synthesis. Again, there is lots and lots of text out there, but models are nowhere near large enough to absorb it all and produce more coherent text.
* Robots completing complex tasks in virtual environments (which can hopefully be transferred to the real world). You can train robots endlessly in virtual environments, but training a large model with many capabilities requires lots and lots of training sessions, and hence lots of computing power (see the sketch below this list).
* Playing board games, or even video games like StarCraft II -- basically any game with definite rules that can be run many times.
* Learning to browse the web and complete a task using keyboard and mouse, given a text description of what to do. Endless amounts of training data can be generated automatically.
* Solving programming competition problems.
* Solving math competition problems that require generating a short (but hard-to-find) proof.
And there are lots more.
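What all of these have in common is that the training data is generated rather than collected: you pay for it with compute. Here is a minimal sketch of that pattern, with a hypothetical `ToyEnv` standing in for a board game, a StarCraft match, or a simulated robot task -- the dataset grows with however many episodes you can afford to simulate:

```python
import random

class ToyEnv:
    """Trivial episodic environment: reach state 10 in as few steps as possible."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is +1 or +2; the episode ends when the state reaches 10
        self.state += action
        done = self.state >= 10
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def generate_episodes(n_episodes):
    """Generate (state, action, reward) transitions by running the environment."""
    env, data = ToyEnv(), []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = random.choice([1, 2])   # placeholder for a learned policy
            next_state, reward, done = env.step(action)
            data.append((state, action, reward))
            state = next_state
    return data

# The only limit on dataset size is how much compute you spend simulating.
print(len(generate_episodes(1_000)))
print(len(generate_episodes(100_000)))
```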
Many of these, as you can see, would have enormous economic value (e.g. video synthesis, and the robot tasks if they could be transferred to the real world).
I would say that if the growth in computing power thrown at the most popular problems continues unabated, we will see some scarily impressive progress on many of these in the next 3 years -- "scary" in the same sense as people's reaction when AlphaGo beat the top Go players: most never saw it coming.