
Assuming compute power devoted to the largest AI project doubles every 3.5 months, what kind of progress can we expect in the near future?

Tags: AI, Deep Learning

11 replies to this topic

#1
starspawn0

    Member

  • Members
  • 198 posts
Recall this posting by OpenAI from earlier today:

https://blog.openai....ai-and-compute/
 

We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time (by comparison, Moore’s Law had an 18-month doubling period). Since 2012, this metric has grown by more than 300,000x (an 18-month doubling period would yield only a 12x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.
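
As a quick sanity check of those numbers (a minimal sketch; the roughly 64-month span between the earliest and latest runs in their analysis is my guess at the endpoints, which aren't quoted above):

    # Rough check of the OpenAI figures, assuming ~64 months between the
    # earliest and latest training runs in their analysis (my assumption):
    months = 64
    growth_35 = 2 ** (months / 3.5)   # 3.5-month doubling time
    growth_18 = 2 ** (months / 18)    # Moore's-Law-style 18-month doubling
    print(f"3.5-month doubling: ~{growth_35:,.0f}x")   # ~300,000x
    print(f"18-month doubling:  ~{growth_18:.0f}x")    # ~12x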


Let's assume the analysis is correct (there aren't a lot of datapoints). What can we say about the near-future progress in AI?

I would say that there are two basic kinds of material limitations (i.e. we are excluding theoretical limitations from discussion) to machine learning progress:

* Data limitations -- that is, it's hard to find enough data to train models.

* Computation limitations -- the data are plentiful and people have long known how to make progress on the problem; there just isn't enough computing power.


The problems that are data-limited are things like Machine Reading Comprehension and Video Question-Answering. I think plentiful data will soon arrive with advanced BCIs, making serious progress on those problems possible. Problems that are computation-limited, on the other hand, will see serious progress once people are confident they can get the results they are looking for by using vastly more computing power -- i.e. once their estimate of the risk of failure improves. The 3.5-month doubling time reflects a combination of hardware improvements, falling costs, and a growing understanding of the risk profile of throwing more computing power at problems.

So what kinds of problems are computation-limited, but not data-limited? Here are a few:

* Video synthesis. There is plenty of video data -- petabytes, in fact. It's just computationally demanding to train large models to generate video, so the models currently in use probably don't absorb enough world knowledge to do a good job.

* Text synthesis. Again, there is lots and lots of text out there, but models are nowhere near large enough to absorb it all and produce more coherent text.

* Robots completing complex tasks in virtual environments (which can hopefully be transferred to the real world). You can train robots endlessly in virtual environments; but to train a large model, with lots of capabilities, you need lots and lots of training sessions, and hence lots of computing power is needed.

* Playing board games or even videogames like Starcraft II -- basically any game with definite rules that can be run many times.

* Learning to browse the web and complete a task, by using keyboard and mouse, given a text description of what to do. Endless amounts of training data can be generated automatically (see the toy sketch just after this list).

* Solving programming competition problems.

* Solving math competition problems that require generating a short (but hard-to-find) proof.


And there are lots more.
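
To illustrate the point about automatically generated training data for the web-browsing task, here is a toy sketch (the form fields and action vocabulary below are entirely made up):

    import random

    # Generate (instruction, ground-truth action sequence) pairs for a
    # made-up form-filling environment. A real setup would render actual
    # web pages, but the supervision can be manufactured the same way.
    FIELDS = ["name", "email", "city"]

    def make_task(rng):
        values = {f: f"{f}_{rng.randint(0, 999)}" for f in FIELDS}
        instruction = ("Fill in the form with "
                       + ", ".join(f"{f}={v}" for f, v in values.items())
                       + " and click submit.")
        actions = []
        for f in FIELDS:
            actions.append(("click", f))
            actions.append(("type", values[f]))
        actions.append(("click", "submit"))
        return instruction, actions

    rng = random.Random(0)
    for _ in range(2):
        instruction, actions = make_task(rng)
        print(instruction)
        print(actions)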

Many of these you can see would have enormous economic value (e.g. the video synthesis and robot tasks, if they could be transferred to the real world).

I would say that if the growth in computing power thrown at the most popular problems continues unabated, we will see some scary impressive progress on many of these in the next 3 years -- "scary" in the same sense as people's reaction when AlphaGo beat the top players at Go; most people never saw it coming.
  • Casey and Yuli Ban like this

#2
TranscendingGod

    2020's the decade of our reckoning

  • Members
  • 1,795 posts
  • Location: Georgia

Shoot, I didn't even know what Go was before that fateful event. That rate of resource addition is just so mind-boggling that at this point I'm gonna go with Vernor Vinge's original prediction of 2023 for the Singularity. Haha, or barring that, at least we will accomplish what Peter Weyland said about creating "cybernetic individuals" (the "TED Talk 2023" YouTube video tie-in with the Prometheus movie).


  • Casey and starspawn0 like this

The growth of computation is doubly exponential growth. 


#3
funkervogt

    Member

  • Members
  • 167 posts

From the prehistoric days of 1997:

 

 

''It may be a hundred years before a computer beats humans at Go -- maybe even longer,'' said Dr. Piet Hut, an astrophysicist at the Institute for Advanced Study in Princeton, N.J., and a fan of the game. ''If a reasonably intelligent person learned to play Go, in a few months he could beat all existing computer programs. You don't have to be a Kasparov.''...But winning the $1.4 million prize promised by the Ing foundation to a program that beats a human champion may be an impossible dream. The offer expires in the year 2000. Go programmers are hoping it will be extended for another century or two.

https://www.nytimes....cient-game.html

 

From the ancient days of 2006:

 

 

A very rough estimate might be that the evaluation function [for computer Go] is, at best, 100 times slower than chess, and the branching factor is four times greater at each play; taken together, the performance requirements for a chess-like approach to Go can be estimated as 10^27 times greater than that for computer chess. Moore's Law holds that computing power doubles every 18 months, so that means we might have a computer that could play Go using these techniques sometime in the 22nd century.

https://www.theguard...chnologysection
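
For what it's worth, the arithmetic behind that 2006 estimate roughly checks out if you assume a search on the order of 40 plies deep -- the article doesn't state a depth, so that part is purely my guess:

    import math

    # Hypothetical reconstruction of the Guardian estimate: evaluation ~100x
    # slower than chess, branching factor ~4x larger per ply, depth assumed ~41.
    eval_slowdown = 100
    branching_ratio = 4
    depth = 41                                    # assumed, not stated in the article
    cost_ratio = eval_slowdown * branching_ratio ** depth
    print(f"cost ratio ~ 10^{math.log10(cost_ratio):.0f}")          # ~10^27

    doublings = math.log2(cost_ratio)             # Moore's-Law doublings needed
    years = doublings * 18 / 12                   # 18 months per doubling
    print(f"~{years:.0f} years after 2006 -> ~{2006 + years:.0f}")  # 22nd century

The point, of course, is how badly that kind of extrapolation aged once the approach changed.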


  • Casey, Yuli Ban and starspawn0 like this

#4
starspawn0

    Member

  • Members
  • 198 posts
There seems to be some debate online about what this new analysis by OpenAI means. I think it's pretty clear what it means. It doesn't mean we're beating Moore's Law on the hardware side -- it means that Moore's Law is still puttering along, that chips are getting cheaper, and perhaps most important, that people are willing to use more compute (at ever-increasing cost), as they are more confident they can get it to produce good results. That last thing can be just as important as technological improvements. It won't last, of course, but it could still keep going strong for a few more years.

One of the best comments I have come across is by Salesforce's Stephen Merity:

https://twitter.com/...897276118290437

Being guarded against hype is important - but being cognizant of how quickly the landscape may change in our field of endeavor (and the industries and communities tied to it) is equally important.


People who have had their skeptic high-beams turned on too high for too long, prepared to take down any little error in a popular science article about Deep Learning, should tread carefully here.
  • Casey and Yuli Ban like this

#5
starspawn0

    Member

  • Members
  • 198 posts
Text synthesis is getting better, and may be one of the first to see "scary" progress in the next 3 years:

https://arxiv.org/abs/1805.06064
 

With two series of Turing tests, where the human judges are asked to distinguish the system-generated abstracts from human-written ones, our system passes Turing tests by junior domain experts at a rate up to 30% and by non-expert at a rate up to 80%.


Writing an essay or short story might be one of those seemingly out-of-reach tasks that turns out to be doable, given enough computing power.

Let's think about it: it turns out there are only a small number of different types of stories:

https://www.theguard...es-ever-written

And there are only a small number of acceptable ways to write them. A probability distribution on narrative chains, and latent composition laws (recombining chains to produce new ones), can be learned. Some amount of common sense can be learned from free text, too. Language modeling works well on individual sentences. A set of "editor" modules could check the grammar, coherence, and logic. Put it all together, and I could see a story-writing system that produces creative short stories being possible within the next couple of years -- the whole thing would be trained on massive amounts of free text and also large numbers of short stories. The outputs would fool most humans; and people will complain, "But does it really know what it's writing?"
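
To make the idea concrete, here's a toy sketch of that kind of pipeline -- every component and name below is a hypothetical stand-in, not anything from a real system: learn transitions between narrative events, recombine them into a new outline, then run the draft past a simple "editor" check.

    import random
    from collections import defaultdict

    def learn_chain_model(corpus_chains):
        """Count event -> event transitions across example narrative chains."""
        transitions = defaultdict(list)
        for chain in corpus_chains:
            for a, b in zip(chain, chain[1:]):
                transitions[a].append(b)
        return transitions

    def propose_outline(transitions, start, length=5):
        """Sample a new chain by recombining learned transitions."""
        outline = [start]
        while len(outline) < length and transitions.get(outline[-1]):
            outline.append(random.choice(transitions[outline[-1]]))
        return outline

    def editor_ok(outline):
        """Stand-in 'editor' module: reject outlines with immediate repetition."""
        return all(a != b for a, b in zip(outline, outline[1:]))

    corpus = [
        ["hero leaves home", "meets mentor", "faces trial", "returns changed"],
        ["hero leaves home", "faces trial", "loses everything", "returns changed"],
    ]
    model = learn_chain_model(corpus)
    draft = propose_outline(model, "hero leaves home")
    if editor_ok(draft):
        print(" -> ".join(draft))

A real system would swap each of those stubs for a large learned model (a language model for the surface text, learned "editors" for coherence and logic), but the overall shape is the same.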
  • Yuli Ban likes this

#6
Alislaws

    Democratic Socialist Materialist

  • Members
  • 1,054 posts
  • Location: London

It would be neat to be able to go to a website or similar and fill in a form where you pick various elements, and it writes you a custom story.

 

If you can do a coherent story with narrative structure etc., then it would theoretically be possible to have videogames with procedurally generated plotlines and side missions, which could allow for infinitely variable, but highly detailed, virtual worlds, right?

 

This might even be easier, because the storylines would be created inside a fully designed world, which the computer system could understand more easily than the real world. The storylines could also adapt on the fly to player actions (eventually).

 

Also you could play D&D with a computer DM. 


  • Yuli Ban and starspawn0 like this

#7
Yuli Ban

    Nadsat Brat

  • Moderators
  • 18,455 posts
  • Location: Anur Margidda

Might you write a cohesive post for /r/MediaSynthesis listing all the things we might expect in the next five to ten years if this rate of advancement continues?


  • Casey and starspawn0 like this
Nobody's gonna take my drone, I'm gonna fly miles far too high!
Nobody gonna beat my drone, it's gonna shoot into the sky!

#8
starspawn0

    Member

  • Members
  • 198 posts

I'll pass.  But this piece might be good for your forum:

 

https://www.bloomber...about-two-weeks


  • Yuli Ban likes this

#9
funkervogt

    Member

  • Members
  • 167 posts

What do you think of this claim?

 

 

That seems like such an odd unit of measure to use Petaflops/s days. The dimensional analysis would suggest that the seconds would just cancel out from the top and bottom. I suggest a better unit of measure would be BED- (human) brain equivalent days. Human brain can process roughly 1 exaflop per second. So, if you had a human working away for 24 hours you would have 1 BED. Thus, AlphaGo Zero achieved 10 BED. I will just add some vague claims to, ah, intellectual property for this idea.

 
This is really awesome! What I am seeing here is that the increased high CPU capability is being quickly translated into enhanced AI functionality. It seems clear to me that over the next 5 years there should be a dramatic change in the range of artificial behaviors. If all that is needed is to add CPU that is essentially already baked in. We are now on a countdown to the first wave of AI that should roll out over the next 2-3 years.
 
I also wonder how Google could access so many flops. Wonder whether they might have found a way to use PCs CPUs during searches. How many flops/s would a billion PCs give you?

http://infoproc.blog...l#disqus_thread


  • Casey likes this

#10
starspawn0

    Member

  • Members
  • 198 posts
As to the units used: petaflop/s-days is similar to one of the standard measures in Deep Learning papers. The seconds don't cancel, because "petaflop/s" is a rate and "day" is a duration; multiplying them gives a total number of operations.
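
Concretely, here is what a single petaflop/s-day works out to (nothing beyond the standard definitions):

    PFLOP_PER_SEC = 1e15        # operations per second
    SECONDS_PER_DAY = 86_400
    ops = PFLOP_PER_SEC * SECONDS_PER_DAY
    print(f"1 petaflop/s-day ~ {ops:.2e} operations")   # ~8.64e19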

As to the rest of it: I am unsure about intelligent behavior rolling out. I just think if things keep going as they have, then we'll soon see lots more big tasks solved -- like video synthesis and robots that act in virtual environments in very complex ways.

Some skeptics on the web have pointed out how some of the big successes achieved with large amounts of compute were quickly superseded by smarter training methods and orders of magnitude less compute. To that I would say that that first big success probably is the reason those better models were found. A big success shows that it's possible to solve the problem using no more than a certain amount of resources; and that then spurs people on to find more efficient solutions. This is exactly what happens in the other sciences. The first solution to a problem is often messy and very complicated; and then later solutions are much cleaner and simpler.

People really underestimate -- or don't even consider -- the problem of risk estimates. Show it won't be a waste of time to pursue a particular approach, and whole teams of people will try it.

Google's Machine Translation work started with them trying a few experiments using comparatively small amounts of data and small neural nets. Seeing how successful it was, and how the success seemed to scale with the size of the network, they finally chose to run much, much larger experiments -- and were successful. Without that initial signal showing how success scales with compute, they would not have even attempted to go further. They needed to be sure that their efforts wouldn't be a waste of time.

The story is the same with AlphaGo, and many of the other experiments.
  • Casey likes this

#11
funkervogt

    Member

  • Members
  • 167 posts

Some skeptics on the web have pointed out how some of the big successes achieved with large amounts of compute were quickly superseded by smarter training methods and orders of magnitude less compute. To that I would say that that first big success probably is the reason those better models were found. A big success shows that it's possible to solve the problem using no more than a certain amount of resources; and that then spurs people on to find more efficient solutions. This is exactly what happens in the other sciences. The first solution to a problem is often messy and very complicated; and then later solutions are much cleaner and simpler.

That's the process I've always thought AGI would go through. The first one will be a hot mess, running off of some massive server farm with a huge electricity bill, frequently crashing, and with code so long and convoluted that fixing it or even understanding how it works will be nearly impossible. Over time, it will grow more efficient in every way, until there are AGIs that could run on the hardware we have in 2018.



#12
starspawn0

    Member

  • Members
  • 198 posts

I'd like to add some things to this thread that I neglected to include when I wrote the OP:

First is a Tweet by OpenAI's co-founder Ilya Sutskever (who was a student of Hinton's and worked at Google Brain):

 

https://mobile.twitt...798116593483776
 

fun fact: 2 million years ago, biological evolution reached a similar conclusion re utility of making certain neural networks larger.


He shows a graph of brains getting much larger (and much smarter) over a very short period of time.

I recall he said in a talk recently that when they trained their Dota 2 bots, they saw continued improvement the more compute they threw at it. In fact, the improvement was "exponential" -- like it was getting exponentially smarter -- in the sense that the Elo (or equivalent) rating continued to climb linearly with training time. One can interpret this as "exponential improvement", because a fixed rating increase translates into multiplicatively better odds of winning (in the standard Elo formulation, a rating of x+100 gives roughly 1.8:1 odds against someone rated x).
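
As a back-of-the-envelope illustration of that last point (this uses the standard chess-style Elo formula; OpenAI's Dota ratings are, I believe, TrueSkill, so the exact scale differs, but the shape of the argument is the same): a fixed rating increment multiplies your odds of winning by a constant factor, which is why a linearly climbing rating can be read as exponential improvement.

    # Standard Elo expected score: P(win) = 1 / (1 + 10^(-diff / 400)).
    def win_probability(rating_diff, scale=400.0):
        return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

    for diff in (0, 100, 200, 400, 800):
        p = win_probability(diff)
        print(f"+{diff:4d} rating -> P(win) = {p:.2f}, odds ~ {p / (1 - p):.1f}:1")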

Sutskever seems to be a believer in taking simple algorithms people worked out decades ago and simply adding "MOAR compute!"

....

Another interesting thing -- and I recall seeing Sutskever speculate on this before in a talk -- is that there is now some empirical evidence that if you train multiple agents in an environment using Deep Reinforcement Learning (and probably also Neuroevolution), the individual agents learn to model the others. It's like saying that agents naturally evolve a kind of "empathy" or "social understanding":

https://arxiv.org/abs/1805.06020

I recall Sutskever saying that this might be a path to AGI: simply create a sophisticated enough environment, with lots of different agents, and let them evolve to complete certain tasks. The agents will learn to model the intentions of the other agents, and a social mind will emerge naturally.

Perhaps they will acquire a kind of long-term memory with episodic-procedural-declarative-iconic features, reasoning, language, and many other aspects of human intelligence. To be useful to humans, one would have to guide this evolution, so that the AIs learn to speak in English or some other human language, I suppose.

Maybe rudimentary versions of this experiment are being tried right now in the servers at OpenAI...


  • Yuli Ban and Alislaws like this




