Computation does not equal intelligence



#1
funkervogt

    Member

  • Members
  • 1,015 posts

For obvious reasons, I don't think we'll be able to create a human-level AGI until we have computers whose hardware is as powerful as a human brain. The best estimate is that a human brain does the equivalent of 10^16 calculations per second, and today, only the best supercomputers are that fast. Of course, Moore's Law guarantees that the computers will get even faster quite soon. 
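
A quick back-of-envelope of what that gap implies (a minimal sketch in Python; the 10^16 ops/s figure is the estimate quoted above, while the ~10^12 FLOPS desktop baseline and the steady two-year doubling are my own assumptions):

    import math

    BRAIN_OPS_PER_SEC = 1e16     # rough estimate quoted above
    DESKTOP_FLOPS = 1e12         # assumed: a high-end consumer desktop, CPU + GPU combined
    DOUBLING_PERIOD_YEARS = 2.0  # classic Moore's-Law-style cadence

    doublings_needed = math.log2(BRAIN_OPS_PER_SEC / DESKTOP_FLOPS)
    years_needed = doublings_needed * DOUBLING_PERIOD_YEARS
    print(f"~{doublings_needed:.1f} doublings, ~{years_needed:.0f} years at a 2-year cadence")
    # about 13.3 doublings, i.e. roughly 27 years, if a two-year doubling actually continues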

 

However, I'm struck by a troubling realization: Even today's PC desktops have processing speeds greater than insect brains, yet no one has figured out how to build insect-level AI. See this graph:  

 

https://www.reddit.c..._ray_kurzweils/

 

On the other hand, I wonder if some of our computers DO vastly exceed insect-level intelligence in some domains. For example, the computers that drive autonomous cars, play games like Go and Starcraft, and synthesize written text (GPT-3) might be much better at those tasks than an insect-level AGI ever could be. 

 

Maybe we COULD have built an insect-level AGI by now had we chosen to devote the resources to that task. 

 

I don't know. What do you guys think? Are you also troubled by the gap between the theoretical performance of our computer hardware and the actual performance of the best "AI" software running on it? 



#2
starspawn0

    Member

  • Members
  • 1,961 posts

A lot of the old views about how the brain works are wrong.  Take, for example, the motor cortex.  It has a lot of neurons (I don't know exactly how many); but it exhibits what is called "low-dimensional dynamics".   This means that you can effectively emulate large parts of its functioning using much less computation than it would appear that you would need.  In fact, relatively simple neural nets, with just a few hundred artificial neurons, can emulate the behavior of the primate motor and pre-motor cortex in reaching and grasping tasks:

https://www.biorxiv....2884v1.full.pdf
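
For readers who want to see what "low-dimensional dynamics" means in practice, here is a minimal sketch (with simulated data, not the primate recordings from the paper above): PCA on a few hundred "neurons" whose activity is actually driven by only a handful of shared latent signals shows that a few components capture most of the variance.

    import numpy as np

    rng = np.random.default_rng(0)
    n_neurons, n_timepoints, n_latents = 300, 1000, 5

    # Simulated population: 300 neurons driven by 5 shared latent signals plus noise.
    latents = rng.standard_normal((n_latents, n_timepoints))
    mixing = rng.standard_normal((n_neurons, n_latents))
    activity = mixing @ latents + 0.3 * rng.standard_normal((n_neurons, n_timepoints))

    # PCA via the covariance eigenspectrum: variance explained by the top 5 components.
    eigvals = np.linalg.eigvalsh(np.cov(activity))[::-1]
    print("variance explained by top 5 PCs:", eigvals[:5].sum() / eigvals.sum())
    # Typically >0.9: the 300-neuron population is effectively ~5-dimensional.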

I'm not exactly sure why there is so much redundancy built into the brain, but it could serve as a type of error-correction mechanism: neurons die over time, and if the motor cortex had only a small number of them, the effect would be large, impairing the ability to move accurately. But if you have a very large number of neurons to work with, then losing 1% of them has much less effect.

It is believed that the brain uses "population codes" to do lots of things, besides motor control.

Also, a large chunk of the brain is devoted to things like keeping the heart beating, and regulating breathing.

If you add it all up, mammal brains probably do much less processing than one would think.

....

Another way to look at it is in terms of the amount of information needed to build the brain. People are born with a few instincts and capabilities, and then much of what makes them them is in the environment. The total amount of information in the genome is pretty small, especially when you consider that much of the non-coding part (which is around 98% of the genome) doesn't really do that much; so most of the "programming" has to come from the environment.
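
As a rough sanity check on "the genome is pretty small", here is a minimal back-of-envelope sketch (the ~3.2 billion base pairs figure is a standard ballpark number, not from this thread; the ~2% coding fraction matches the 98% non-coding figure above):

    base_pairs = 3.2e9       # approximate size of the human genome
    bits_per_base = 2        # 4 possible bases -> 2 bits each, ignoring compressibility
    coding_fraction = 0.02   # roughly 2% of the genome is protein-coding

    total_bytes = base_pairs * bits_per_base / 8
    coding_bytes = total_bytes * coding_fraction
    print(f"whole genome: ~{total_bytes / 1e6:.0f} MB; coding portion: ~{coding_bytes / 1e6:.0f} MB")
    # ~800 MB total, ~16 MB coding -- tiny next to the sensory stream discussed next.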

And how much information does the environment produce, above the noise level? According to this,

https://www.britanni...eory/Physiology

the body sends the brain about 11 million bits or about 1.4 million bytes -- let's call it 1 million bytes -- each second. But there's going to be a lot of noise, and not all of it is going to get used. Let's say about 10,000 to 100,000 bytes each second are actually used in "programming" the brain.

In 1 day that amounts to about 1 billion to 10 billion bytes; over 10 years that amounts to about 3 trillion to 30 trillion bytes; and the 10-year-old brain reaches a pretty high level of functioning.

Now consider how many synapses there are in the brain: there are about 100 billion neurons, and the typical neuron has about 10,000 connections. So, there are about 1 quadrillion synapses. Assuming each synapse can be described by 1 byte of information, that's about 30x to 300x larger than the amount of information entering the brain over 10 years.

This argument has been made before (I think by Hinton), to indicate that there is a lot of redundancy in the brain.
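
The same back-of-envelope in code, for anyone who wants to check the numbers (a minimal sketch; the 10,000 to 100,000 bytes/second "actually used" range is the guess from above):

    seconds_per_day = 86_400
    days_per_decade = 3_650

    for usable_bytes_per_sec in (10_000, 100_000):
        per_day = usable_bytes_per_sec * seconds_per_day   # ~0.9 to ~8.6 GB per day
        per_decade = per_day * days_per_decade             # ~3 to ~30 TB over 10 years
        print(f"{usable_bytes_per_sec:>7} B/s -> {per_day / 1e9:.1f} GB/day, {per_decade / 1e12:.1f} TB/decade")

    synapses = 100e9 * 10_000       # 100 billion neurons x ~10,000 connections each
    synapse_bytes = synapses * 1    # assume ~1 byte to describe each synapse
    decade_low = 10_000 * seconds_per_day * days_per_decade
    decade_high = 100_000 * seconds_per_day * days_per_decade
    print(f"synapse description: ~{synapse_bytes / 1e12:.0f} TB "
          f"({synapse_bytes / decade_high:.0f}x to {synapse_bytes / decade_low:.0f}x the decade of input)")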

....

What I think is the case is that a lot of the "intelligence" we observe the brain producing comes from the rich sensor information -- not just seeing, hearing, smelling, and so on, but the particular kind of seeing, hearing, etc. that we do. For example, we can move through the world quickly compared to robots, and we can change what information we receive at will -- e.g. by swiveling our heads.

 

And the algorithms the brain uses to map this data to "programming" are probably pretty dumb.  It isn't something that you need a "genius" to come up with.  Listen to this podcast, for example, which has an interview with Princeton professor Uri Hasson:

 

https://braininspired.co/podcast/63/

Unfortunately, we are a very long way away from producing robots with the mechanical speed and manipulation abilities of humans; so, we can't train robots in the same rich way that humans were trained.

But we can hope to extract enough of that information from BCIs, from neural population recordings...

....

But GPT-3 and similar systems might give machines enough information about how we think, to where they can emulate abstract human thought.

It is known, for instance, that large language models show activity patterns that are eerily similar to those seen in the brain. People have applied neuroscience techniques to map individual neurons in a large language model neural net to neural populations in the human brain; and the match is far better than any model of the brain anyone has ever come up with -- shockingly good:

https://www.abstract...sentation/66073

What that says to me is that models like GPT-3 are not just "memorizing" like how skeptics seem to think. They are actually capturing fundamental human cognitive abilities, somehow.  (And their architectures are relatively simple.  So, again, "genius" isn't needed to make them to do incredible things, if you have enough data.)

It really shouldn't be surprising, as one would expect that deep statistical patterns in our use of language should say at least something about the brains that generated it.



#3
Erowind

    Anarchist without an adjective

  • Members
  • 1,481 posts
Reading this thread gave me a sense of horrific mortal dread. Eternalist ramblings aside, I am uncomfortably reminded of how fragile the hardware my consciousness relies on really is. Contempt for the flesh.

#4
tomasth

    Member

  • Members
  • 294 posts

Fragile? That it can keep going, with so many functions and capabilities, despite so many maladies, dysfunctions, and disabilities, is proof of how very robust it is.

 

It is very much hoped that humanity can invent something better, but the flesh has survived many evolutionary pressures, disasters, and evil adversarial perturbations.



#5
Raklian

    An Immortal In The Making

  • Moderators
  • 7,243 posts
  • Location: Raleigh, NC

Reading this thread gave me a sense of horrific mortal dread. Eternalist ramblings aside, I am uncomfortably reminded of how fragile the hardware my consciousness relies on really is. Contempt for the flesh.

 

What's even more horrific is the near-certainty that we will find the brain remarkably unremarkable, a lot more primitive and redundant than we supposed, and that we will easily emulate, upgrade and mass-produce it to the point that it's cheap and ubiquitous. One day, a piece of tossed trash in some dystopian dark alley will have more compute and intelligence than our brains.


What are you without the sum of your parts?

#6
funkervogt

    Member

  • Members
  • 1,015 posts

 

Reading this thread gave me a sense of horrific mortal dread. Eternalist ramblings aside, I am uncomfortably reminded of how fragile the hardware my consciousness relies on really is. Contempt for the flesh.

 

What's even more horrific is the near-certainty that we will find the brain remarkably unremarkable, a lot more primitive and redundant than we supposed, and that we will easily emulate, upgrade and mass-produce it to the point that it's cheap and ubiquitous. One day, a piece of tossed trash in some dystopian dark alley will have more compute and intelligence than our brains.

 

Ouch. The truth hurts. 



#7
scientia

    Member

  • Members
  • 10 posts

 Of course, Moore's Law guarantees that the computers will get even faster quite soon.

Moore's law actually ended in 2010 at 32nm. I still see people (like Kurzweil) pushing this myth, but it isn't real. We can go over this. Both Intel and AMD were producing 32nm products in 2010. Dropping down one full step every two years and a half step in between would be:

28nm: 2011

22nm: 2012

20nm: 2013

16nm: 2014

14nm: 2015

12nm: 2016

10nm: 2017

8nm: 2018

7nm: 2019

6nm: 2020

 

AMD, with production at Global Foundries, dropped a half node to 28nm in 2014. GF, with process technology licensed from Samsung  had 14nm in 2017 and 12nm in 2018. GF currently has no plans for 7nm production. AMD's 7nm production began in 2019 at TSMC.

 

Intel began producing 22nm in 2012, 14nm in 2014 and 10nm in 2018 (canceled and delayed to 2020).

 

It's worse than it looks. AMD's 14nm process was actually 22nm and the 12nm process is 16nm. And TSMC's and Samsung's 7nm are only equal to Intel's 10nm. So, we're at least 3 years behind. None of this has been smooth. Intel was still making 22nm chips in 2015 because it was unable to get enough volume from 14nm. In 2018, Intel could not meet demand for 14nm chips and was having trouble with heat dissipation on Coffee Lake. Cannon Lake was canceled, so all of the Ice Lake chips were produced on 14nm. Today, Intel is still making 14nm chips because it can't get enough volume from 10nm. Both Intel and AMD have gone to hybrid chips with chiplets using two process sizes because one process can't fulfill the requirements.

Today, no one knows how to get to a true 7nm chip. So, when you hear numbers thrown around like 5nm, understand that they are talking about something more like 9nm. People also get confused between what is possible for memory (smallest), what is possible with a low power chip (middle), and what is possible with a high performance chip (largest). In other words, you can build SRAM smaller than you can make a low power processor (like the one used in a cell phone), and the cell phone chip can be made smaller than a desktop processor chip.

We are hitting the wall. We've had to resort to EUV and immersion lithography and still have to do quad patterning. And some of those gains were due to rearranging the layout of the chip rather than making the components smaller. If you are counting on Moore's Law, you've probably hitched your wagon to a lame horse.

 

There are ways to get at least a few more generations once we run out, but they won't be popular since they can't use current development techniques. And we might still increase wafer size from 300mm to 400mm.

 

For obvious reasons, I don't think we'll be able to create a human-level AGI until we have computers whose hardware is as powerful as a human brain. The best estimate is that a human brain does the equivalent of 10^16 calculations per second, and today, only the best supercomputers are that fast.

For 32 bit operations, 10^16 seems about right. However, Summit can do 14.8 * 10^16 FLOPS now.  Aurora and Frontier are both expected in 2021 with Aurora having 100 * 10^16 FLOPS and Frontier having 150 * 10^16 FLOPS. I am quite confident that none of them will be AGI systems.
 

However, I'm struck by a troubling realization: Even today's PC desktops have processing speeds greater than insect brains, yet no one has figured out how to build insect-level AI. See this graph: 

https://www.reddit.c..._ray_kurzweils/

I'm not sure what you mean by this. What is it that you're saying an insect can do that a computer can't?

 

On the other hand, I wonder if some of our computers DO vastly exceed insect-level intelligence in some domains. For example, the computers that drive autonomous cars, play games like Go and Starcraft, and synthesize written text (GPT-3) might be much better at those tasks than an insect-level AGI ever could be.

There's no such thing as "insect-level AGI" that I'm aware of. The term 'artificial intelligence' was coined at the Dartmouth conference in 1956, and since that time the goal has been human-level AI. Many people, including Alan Turing and John Nash, assumed that brains were just a type of computer and that at some point computers would be able to do whatever brains could. The Turing Test, for example, would make sense if the assumption of brain/computation equivalence turned out to be true. Unfortunately, it didn't work out that way. The first Cray-1 was built in 1976 with 160 megaFLOPS, and people began to have doubts about reaching human-level AI. It wasn't just that the hardware wasn't fast enough; computer scientists couldn't figure out how to even theoretically define human reasoning, much less explain how it worked.

 

Nevertheless, there were those who suggested that all we needed to do was reach some level of performance using the closest techniques available and then, once we were in the ballpark, we could figure it out. However, Japan's 5th and 6th generation projects collapsed without making progress, nor did Cyc make progress over the following decade. Finally, in 1997, Gubrud coined the term 'artificial general intelligence' to try to distinguish the goal from AI. The Singularity Institute for Artificial Intelligence was founded by Yudkowsky in 2000. Legg and Goertzel began using the term 'AGI' in 2002. By 2006, Cray's Red Storm was nearly a million times faster than the Cray-1, at 101 teraFLOPS. Some confidence returned to the field, either because of Yudkowsky or perhaps because of the more powerful hardware, and people began talking about it again. Today, we have DeepMind, the Human Brain Project, and OpenAI, which all talk around AGI while still being unable to define it, much less design it or build it. The SIAI is now the Machine Intelligence Research Institute (MIRI), and Goertzel developed the OpenCog Prime software for OpenCog.

 

To me, the hype on this topic is staggering. According to self-driving car enthusiasts, the technology is either here already (but just hasn't been officially approved) or will be here within months. This is all nonsense. My mother's 2015 Subaru Forester has features like warning if you drift out of your lane, automatic braking, and vehicle following for cruise control. This would be Level 2. The best systems available today like Tesla's Autopilot or the Openpilot system are Level 3. No one knows how to get to Level 4 while a true autonomous system would be Level 5.

 

People seem constantly confused about the concept of development. This is an engineering process: it's how you slowly improve something over time, such as making it more powerful, cheaper, or more suited to a task. For example, once tiny locomotives were available to replace horses, they were steadily developed over the next century until they weighed a million and a half pounds and put out over 4,000 HP. The triple expansion steam engines on the Titanic delivered 15,000 HP. However, when development reaches its practical limit you have to switch to something else. So, piston steam engines on ships switched to steam turbines, which are also used in power plants. Locomotives switched from steam piston engines to diesels. Eventually diesels got big enough to replace steam turbines on ships and to be used for power generation. Piston engines on aircraft were replaced by jet engines. Jet engines are also used for power generation.

 

The biggest misunderstanding about AGI seems to be the assumption that it can be developed from AI. This is incorrect. AGI is not a bigger, faster, or more complex version of AI -- it's a completely different technology which isn't derived from computational theory (like AI). Watson and Alpha Zero will never be AGI, regardless of what hardware they are run on. In a similar fashion, Autopilot and OpenPilot will never be autonomous nor will Alexa or Siri ever be a genuine personal assistant. AI to AGI is like switching from a piston engine to a jet engine -- they are fundamentally different. You can't have insect-level AGI because insect functionality is similar to a finite automaton while AGI is considerably more complex. Curiously, MIRI seems well aware of this:

 

Wikipedia

MIRI researchers have expressed skepticism about the views of singularity advocates like Ray Kurzweil that superintelligence is "just around the corner". MIRI has funded forecasting work through an initiative called AI Impacts, which studies historical instances of discontinuous technological change

Based only on what is publicly known from the published research, AGI wouldn't seem likely within the next 50 years.

 

 
I don't know. What do you guys think? Are you also troubled by the gap between the theoretical performance of our computer hardware and the actual performance of the best "AI" software running on it?

It's about what I would expect given the limitations of AI.
 

starspawn0

And the algorithms the brain uses to map this data to "programming" are probably pretty dumb.  It isn't something that you need a "genius" to come up with.  Listen to this podcast, for example, which has an interview with Princeton professor Uri Hasson:

https://braininspired.co/podcast/63/

I think you might be exaggerating what he said.
 

Uri Hasson:

At the same time, these artificial networks, as opposed to humans, fail miserably in situations that require generalization and extrapolation across contexts

Yes, that's the problem that everyone has seen. So, does Hasson know how to solve it?
 

Uri Hasson

How high-level cognitive functions emerge from brute-force, over-parameterized BNNs is likely to be a central question for future cognitive studies.

No, he doesn't seem to.
 

starspawn0

But GPT-3 and similar systems might give machines enough information about how we think, to where they can emulate abstract human thought.

I wrote the theory for abstraction in 2016, but, because of specific conflicts, still have not submitted it for publication. So, I would naturally be interested if someone had an implementation.
 

starspawn0

What that says to me is that models like GPT-3 are not just "memorizing" like how skeptics seem to think. They are actually capturing fundamental human cognitive abilities, somehow.

That would be remarkable. Let's look:
 

Tierman

Despite GPT-3's improved benchmark results over GPT-2, OpenAI cautioned that such scaling up of language models could be approaching or already running into fundamental capability limitations of the current approach

Fundamental scaling limitations at this level would not seem to fit with the idea of general abstraction or human-type cognition.



#8
funkervogt

    Member

  • Members
  • 1,015 posts

Excellent post, scientia. That's a lot for me to digest. Off the bat, I have one question. 

 

 

Moore's law actually ended in 2010 at 32nm. I still see people (like Kurzweil) pushing this myth, but it isn't real. We can go over this. Both Intel and AMD were producing 32nm products in 2010. Dropping down one full step every two years and a half step in between would be:

28nm: 2011

22nm: 2012

20nm: 2013

16nm: 2014

14nm: 2015

12nm: 2016

10nm: 2017

8nm: 2018

7nm: 2019

6nm: 2020

 

AMD, with production at Global Foundries, dropped a half node to 28nm in 2014. GF, with process technology licensed from Samsung  had 14nm in 2017 and 12nm in 2018. GF currently has no plans for 7nm production. AMD's 7nm production began in 2019 at TSMC.

 

Intel began producing 22nm in 2012, 14nm in 2014 and 10nm in 2018 (canceled and delayed to 2020).

 

It's worse than it looks. AMD's 14nm process was actually 22nm and the 12nm process is 16nm. And TSMC's and Samsung's 7nm are only equal to Intel's 10nm. So, we're at least 3 years behind. None of this has been smooth. Intel was still making 22nm chips in 2015 because it was unable to get enough volume from 14nm. In 2018, Intel could not meet demand for 14nm chips and was having trouble with heat dissipation on Coffee Lake. Cannon Lake was canceled, so all of the Ice Lake chips were produced on 14nm. Today, Intel is still making 14nm chips because it can't get enough volume from 10nm. Both Intel and AMD have gone to hybrid chips with chiplets using two process sizes because one process can't fulfill the requirements.

Today, no one knows how to get to a true 7nm chip. So, when you hear numbers thrown around like 5nm, understand that they are talking about something more like 9nm. People also get confused between what is possible for memory (smallest), what is possible with a low power chip (middle), and what is possible with a high performance chip (largest). In other words, you can build SRAM smaller than you can make a low power processor (like the one used in a cell phone), and the cell phone chip can be made smaller than a desktop processor chip.

We are hitting the wall. We've had to resort to EUV and immersion lithography and still have to do quad patterning. And some of those gains were due to rearranging the layout of the chip rather than making the components smaller. If you are counting on Moore's Law, you've probably hitched your wagon to a lame horse.

 

There are ways to get at least a few more generations once we run out, but they won't be popular since they can't use current development techniques. And we might still increase wafer size from 300mm to 400mm.

 

How do you square that with what this guy says? 



#9
starspawn0

    Member

  • Members
  • 1,961 posts

I think you might be exaggerating what he said.

....

Yes, that's the problem that everyone has seen. So, does Hasson know how to solve it?

....

No, he doesn't seem to.


I didn't exaggerate much. I invite everyone reading this to listen to it. It sounds to me like an admission of defeat; it contrasts with what he believed in the past about how the brain works. He says in there -- I forget where -- he asked his students how much of the brain's functioning is just the brute-force computations of the type he talks about. And he said the opinions vary, as I recall; and I think he said his students think between 90% and 95%. So, there's a little extra, maybe certain biases and a few other things coming from genetics / evolution; but the bulk of how it works is brute-force, and not particularly "deep".

He also says he doesn't really understand how artificial neural nets learn (perhaps even how they "extrapolate"), despite how simple the training algorithm is. That doesn't mean they don't work, or that they don't work in ways similar to the brain.

Obviously, you need a few additional biases or priors; because, for example, "learning" the "right" function in some context would be almost impossible without them. One can easily come up with pairs of "simple" functions that are almost identical, where one is rather more natural than the other from the human perspective. And if you don't have those biases, you won't know which one to pick!

And about his comment on "extrapolation": that's a word that gets tossed around a lot; but, for example, Tom Dietterich once Tweeted:

https://twitter.com/...470125053374464
 

And of course the way you extrapolate is to change the representation so that extrapolation becomes interpolation. "No extrapolation without representation"


Yann Lecun responds:
 

Representation: that's what deep learning is all about.


Dan Roy tweets:
 

Exactly.


 
For example: let's say you want to "extrapolate" in the 2D plane. e.g. you have some points on a line, and you want to know how the line "continues" as you go towards infinity. By doing a stereographic projection of the extended complex plane to a sphere, you can map that line to a circle. And now, if you have enough points on the line / circle, you can "interpolate" to complete it. And so, extrapolation has become interpolation.
 
But there is something you've done in setting this up that has to be investigated more closely:  there is more than one way to "interpolate".  If you interpolate in the wrong way, you won't get a circle, or anything like it.
 
Well, what this means is that "interpolation" is more mysterious than it seems at first; in fact, it's just as mysterious / difficult as "extrapolation"!  (Actually, you can push the mysteriousness / difficulty into the representation you use, and always interpolate / extrapolate in the same trivial way.)
 
So, when people say, "it's just doing interpolation, not extrapolation"; they are not being precise.  What they mean is that it's just doing interpolation in a very specific sense that it is well-understood and "easy" to come up with, with respect to a background representation space -- not in some more complicated sense.
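
A minimal sketch of the projection trick above (my own toy example, not something from the papers cited in this thread): points on a line in the plane, pushed onto the unit sphere by inverse stereographic projection, all land on one circle (a plane section of the sphere), so "completing the line out to infinity" becomes filling in a bounded circle.

    import numpy as np

    def to_sphere(x, y):
        """Inverse stereographic projection: plane point (x, y) -> point on the unit sphere."""
        s = x**2 + y**2
        return np.stack([2 * x, 2 * y, s - 1], axis=-1) / (s + 1)[:, None]

    # Points on the (arbitrary) line y = 2x + 1, going far out toward "infinity".
    x = np.linspace(-50, 50, 200)
    pts = to_sphere(x, 2 * x + 1)

    # A circle on the sphere is the sphere's intersection with a plane; check that all the
    # projected points are coplanar by looking at the smallest singular value.
    A = np.hstack([pts, np.ones((len(pts), 1))])
    smallest_singular_value = np.linalg.svd(A, compute_uv=False)[-1]
    print("coplanarity residual:", smallest_singular_value)   # effectively zero
    # The whole infinite line has become a finite circle through the "north pole" (0, 0, 1),
    # so "extrapolating" the line is now "interpolating" points along that circle.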
 
....
 

That would be remarkable. Let's look:


Yes, let's look at what I actually said, in context:
 

People have applied neuroscience techniques to map individual neurons in a large language model neural net to neural populations in the human brain; and the match is far better than any model of the brain anyone has ever come up with -- shockingly good:

https://www.abstract...sentation/66073

What that says to me is that models like GPT-3 are not just "memorizing" like how skeptics seem to think.


What I cited was a conference poster abstract by Alex Huth and his student (he has similar work on LSTM models that is actually published), where he uses the language model to build an encoding model. An encoding model maps stimuli to brain responses. One way to do this is to feed the stimuli into a neural net, and then, for each voxel, find a linear combination of the neuron states, as the stimuli are fed in, that predicts the brain responses (a minimal sketch of this recipe follows the quote below). He claims that Transformer neural net language models -- not specifically GPT-2 or GPT-3 -- do much better at this prediction than any other method. And then he goes on to claim that:
 

We find that attention heads in each layer behave very differently and seem to encode diverse information, predicting distinct parts of the cortex. These results suggest that the transformer can effectively capture diverse semantics and varying levels of compositionality, allowing it to successfully predict responses across much of the cortex and providing deeper insight into how the brain uses compositionality for language understanding.


That doesn't sound like "memorizing" to me.
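
For anyone unfamiliar with encoding models, here is a minimal sketch of the general recipe described above (the arrays are hypothetical stand-ins, not Huth's data or code): regress each voxel's response onto the network's hidden states for the same stimuli, then score the fit on held-out stimuli.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    # Hypothetical stand-ins: language-model hidden states for 1,000 stimuli (e.g. words in
    # a story), and fMRI responses of 500 voxels to those same stimuli.
    n_stimuli, n_features, n_voxels = 1000, 768, 500
    hidden_states = rng.standard_normal((n_stimuli, n_features))
    voxel_responses = (hidden_states @ rng.standard_normal((n_features, n_voxels)) * 0.05
                       + rng.standard_normal((n_stimuli, n_voxels)))

    train, test = slice(0, 800), slice(800, None)

    # The encoding model: one regularized linear map from hidden states to every voxel.
    model = Ridge(alpha=100.0).fit(hidden_states[train], voxel_responses[train])
    pred = model.predict(hidden_states[test])

    # Score: per-voxel correlation between predicted and measured responses on held-out stimuli.
    r = [np.corrcoef(pred[:, v], voxel_responses[test, v])[0, 1] for v in range(n_voxels)]
    print("mean held-out voxel-wise r:", float(np.mean(r)))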
 

Fundamental scaling limitations at this level would not seem to fit with the idea of general abstraction or human-type cognition.


Tierman, as in the Zdnet staff writer Tiernan Ray?:

https://www.zdnet.co...-models-for-ai/

This issue of "scaling" has been discussed on other forums. It's mostly a question of whether the theoretical/empirical predictions (equations fit to empirical tests with smaller numbers of parameters and smaller datasets) of performance on a number of tasks will continue to hold as you scale up the resources.

See page 17 of this paper:

https://arxiv.org/pdf/2001.08361.pdf

Look at Figure 15. What they have are two curves showing how test loss responds to increases in compute.

Two things:

1. The place where the predictions have to break down is 10x to 100x beyond the current levels (according to Gwern, and I think also nostalgebraist).

2. That doesn't mean that the system won't keep improving. It just means that, beyond that point, they aren't able to predict what one should expect to see.
 
Of course, when they switch to multi-modal data, completely different scaling prediction curves will need to be developed.
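
For concreteness, the curves in that figure have the simple power-law form L(C) = (C_c / C)^alpha. Here is a minimal sketch of what "10x to 100x beyond current levels" means for such a curve (the constants below are made up for illustration, not the fitted values from the paper):

    def predicted_loss(compute, c_c=1.0e8, alpha=0.05):
        """Power-law scaling curve L(C) = (C_c / C)**alpha; constants are illustrative only."""
        return (c_c / compute) ** alpha

    current_compute = 1.0e4   # pretend "current" compute budget, in arbitrary units
    for factor in (1, 10, 100):
        c = current_compute * factor
        print(f"{factor:>3}x compute -> predicted test loss {predicted_loss(c):.3f}")
    # The fitted curve keeps sloping gently downward; the open question is simply whether the
    # real test loss keeps following it once you are 10x-100x past the region it was fit on.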
 

I wrote the theory for abstraction in 2016, but, because of specific conflicts, still have not submitted it for publication. So, I would naturally be interested if someone had an implementation.


You sound more like one of these classic AI people circa the year 2000, or even 1980s, like CNOT on the old Kurzweil forums. "Theory" of that sort is probably not going to get anywhere. It will just get bulldozed over.

#10
funkervogt

    Member

  • Members
  • 1,015 posts

 

 

A lot of the old views about how the brain works are wrong.  Take, for example, the motor cortex.  It has a lot of neurons (I don't know exactly how many); but it exhibits what is called "low-dimensional dynamics".   This means that you can effectively emulate large parts of its functioning using much less computation than it would appear that you would need.  In fact, relatively simple neural nets, with just a few hundred artificial neurons, can emulate the behavior of the primate motor and pre-motor cortex in reaching and grasping tasks:

https://www.biorxiv....2884v1.full.pdf

I'm not exactly sure why there is so much redundancy built into the brain, but it could serve as a type of error-correction mechanism: neurons die over time, and if the motor cortex had only a small number of them, the effect would be large, impairing the ability to move accurately. But if you have a very large number of neurons to work with, then losing 1% of them has much less effect.

It is believed that the brain uses "population codes" to do lots of things, besides motor control.

Also, a large chunk of the brain is devoted to things like keeping the heart beating, and regulating breathing.

If you add it all up, mammal brains probably do much less processing than one would think.

I don't challenge your point. The human brain does about 10^16 calculations per second, but it is so inefficient that a human-level AI might, in theory, need to run on a computer that does only 10^15 calculations per second or less.

 

However, I do challenge any notion that the first human-level AI we create will not need hardware that is at least as powerful as a human brain. In fact, I think the first AI will probably require a supercomputer that is much MORE powerful than a human brain, just to achieve the same level of general intelligence as one human. This would be in keeping with the pattern of the first examples of any new technology being inefficient and barely functional. Consider steam engines. The first commercially successful one was the Newcomen engine, and it was huge, inelegant, and had a terrible energy efficiency of 0.5%.

 

I predict the first AGI will, thanks to inefficient algorithms, need hardware well in excess of what a human brain is capable of. As time passes and the technology develops, the algorithms will improve and other software shortcuts will be found, and we'll get AGI programs that could run on a supercomputer from the year 2000. 

 

https://www.youtube....h?v=HC6LUWSBXjk



#11
starspawn0

    Member

  • Members
  • 1,961 posts

I'm not sure how "inefficient" they will be in the not-too-distant future.  Interestingly, Joscha Bach said something in a podcast episode posted 2 days ago that echoes what I said about the motor cortex requiring few neurons or parameters to do complicated things.  It's 2 hours, 8 minutes, 41 seconds into this video:
 
https://www.youtube....SZrBM&t=2h8m41s
 
He specifically talks about walking, but it applies to all motor tasks -- and all combinations.  He goes on to discuss "common sense reasoning", and how it also takes relatively little information.  I'm not sure I agree with the suggestion that it can be broken down into clean concepts that one can articulate; a lot of it is implicit, and "subsymbolic".  But I think he's right that it's not really that much information.  The old ways of using hand-designed "knowledge representations" aren't going to cut it, though.  Machine learning is proving to be the best way.

I think one way that some of this motor control capability can be transferred efficiently out of the brain is to use BCIs. There's also some evidence for it, in primate experiments.

One can also bias models to make them more human-like using BCI data, in combination with a method called "Representational Similarity Analysis". This is showing good results.
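
For readers unfamiliar with Representational Similarity Analysis, here is a minimal sketch of the core idea (with hypothetical arrays standing in for brain/BCI data and model activations): build a representational dissimilarity matrix (RDM) for each system over the same stimuli, then compare the RDMs.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)

    # Hypothetical stand-ins: responses of two systems to the same 50 stimuli.
    n_stimuli = 50
    brain_patterns = rng.standard_normal((n_stimuli, 200))   # e.g. 200 recorded channels
    model_patterns = rng.standard_normal((n_stimuli, 768))   # e.g. 768 model activations

    def rdm(patterns):
        """Representational dissimilarity matrix: 1 - correlation between stimulus patterns."""
        return pdist(patterns, metric="correlation")          # condensed upper triangle

    # RSA score: rank correlation between the two RDMs, i.e. how similarly the two systems
    # "lay out" the same stimuli, regardless of their raw feature spaces.
    rho, _ = spearmanr(rdm(brain_patterns), rdm(model_patterns))
    print("RSA (Spearman rho between RDMs):", rho)
    # With unrelated random data this is ~0; a model biased toward the brain data should score higher.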

Overall, I'm not sure how "inefficient" approaches for ever more capable AI will be in the near future. I think the efficiency is going to improve at a faster rate than in the past.

....

 
And if common sense reasoning -- another thing Bach discusses, as I said -- can be addressed sufficiently well, that's a big chunk of the challenge A.I. has been pursuing for ages.

Are we making progress?
 
Well... Gary Marcus, Ernie Davis, and others recently wrote an article on the Winograd Schema Challenge, commonsense reasoning, and where we stand today; and they make an important concession near the end, basically saying that Levesque's thesis, that the Winograd Schema Challenge is sufficient to measure commonsense reasoning ability, has been falsified:

https://arxiv.org/abs/2004.13831
 

Levesque et al. [2012] claimed that because of the use of twin sentences, “clever tricks involving word order or other features of words or groups of words will not work [emphasis added].” This prediction has been falsified, at least as far as the dataset produced with that paper is concerned. The paper did not anticipate the power of neural networks, the rapid advances in natural language modelling technology resulting in language models like BERT, and the subtlety and complexity of the patterns in words that such technologies would be able to find and apply.

The systems that have succeeded on the Winograd Schema Challenge have succeeded on the pronoun disambiguation task in small passages of text, but they have not demonstrated either the ability to perform other natural language understanding tasks, or common sense. They have not demonstrated the ability to reliably answer simple questions about narrative text [Marcus and Davis, 2019] or to answer simple questions about everyday situations. Similarly, text generated using even state-of-the-art language modeling systems, such as GPT-2, frequently contains incoherences [Marcus, 2020].

The commonsense reasoning and the natural language understanding communities require new tests, more probing than the Winograd Schema Challenge, but still easy to administer and evaluate. Several tests have been proposed and seem promising. The problem of tracking the progress of a world model in narrative text is discussed by Marcus [2020]. A number of proposed replacements for the Turing Test [Marcus et al., 2016] likewise draw heavily on various forms of commonsense knowledge.


I have not looked much at the [Marcus 2020] paper cited here about testing models, but I know he (Marcus) has sort of treated some of them like Turing Test subjects in the past, which is not the best way to probe them (looking at internal representations is better, but it's a lot of work). Even using that methodology, however, I wouldn't be surprised if GPT-3 does a lot better than the previous models Marcus tested -- though he'd probably say, "Yes, getting 10,000 feet higher up the mountain is progress, but it's not going to get us to the moon."

Hasson said in that podcast interview I posted before that people will keep saying, "Ok, well the model solved that problem... but there's this one over here it can't solve."; and then if that one is solved, the goal posts move again. He said that, to him, all that matters is the behavior. Don't try to second-guess it and compare it to what one thinks the brain is doing, as one might be very badly wrong (and the brain is not the "great understander" we think it is). If it acts intelligently, then just assume that it is intelligent.

What is going to happen is that some computer scientists will start to believe that what they're building is really intelligent. Not all at once. It will start with some small percent, and then over a period of years, the percent will shift higher and higher.  At no point will they say, collectively, that "true AI" or "sentient AI" or "AGI" or whatever term you want to use, has been achieved.


The next iteration of the GPT series (maybe GPT-4, if they name it), might be able to do things like this:

https://www.futureti...obots/?p=282720

so will be another couple thousand feet up the mountain, as it were.

Already, some who were previously major skeptics are seeing signs of commonsense reasoning in GPT-3, and will see even more in that next iteration. One former skeptic is Nasrin Mostafazadeh, who worked for David Ferrucci's Elemental Cognition (and now works in a "secret startup"). She got her ph.d. studying classical knowledge representation approaches to story understanding, as I recall. That's a GOFAI approach. Here's her Tweet thread from about 2 weeks ago:

https://mobile.twitt...216675248144390
 

GPT-3 with 175B params,10x any prior LM,is out.Setting aside controversies around implications of training such ever-larger models, the fact that they show high zero-shot performance on challenging reasoning tasks is pretty interesting. Ppl often ask me how I feel about all this:

Back in 2016,when we developed Story Cloze Test (SCT) with the goal of pushing commonsense reasoning,we intentionally didn't provide a direct-supervision training data,to prevent the task from becoming yet another pattern-recognition/memorization task,& actually require reasoning

Then in 2018, GPT-1 came out, setting SCT SOTA to ~86% with finetuning. Was it doing reasoning? We thought likely it had just done a great job picking up the hidden intricate biases of our narrow dev set, so tested it on our new debiased SCT v1.5. The high performance persisted!!

This was when I actually started believing that these large pretrained models are potentially a good chunk of our way forward for encoding large amount of world and commonsense knowledge into our AI system. I talk about this more with
@samcharrington here

Now we are in 2020, and a ridiculously gigantic LM has picked up enough background knowledge about the world, that it gets ~83% on SCT right out of the box! Is this how having some level of commonsense looks like? I'd say YES! We've made a small step forward, but of course...

...have a very long way to go! Back in 2016 I believed that "predicting what happens next" in a short story is a great test for commonsense,& I still do,but all AI benchmarks are inherently narrow,by design,& we have to keep moving the landmark even if we get to 100% on any task!



#12
funkervogt

    Member

  • Members
  • 1,015 posts

 

 

I'm not sure how "inefficient" they will be in the not-too-distant future.  

Here's another precedent for what I think will happen:

 

 

Kasparov, who lost his world champion title to Vladimir Kramnik in 2000, is still ranked as the world’s best chess player. He was famously beaten by IBM’s computer Deep Blue in 1997, but went on to draw against computer program Deep Junior in February 2003. Kramnik drew against Deep Fritz in October 2002. Fritz has beaten Deep Blue and Deep Junior in the past.

 
The Fritz software runs on four parallel processors and is capable of analysing about two million moves per second. This is a sharp contrast to Deep Blue, which ran 256 processors and could calculate around 200 million moves per second.
 
But Fritz uses its power more efficiently by probing the outcomes of only a few interesting moves deeply, while the less interesting ones receive only a shallow analysis.

https://www.newscien...with-x3d-fritz/
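
Rough arithmetic on why the "fewer moves per second" engine could still compete (my own illustration with assumed branching factors and time budget, not numbers from the article): search depth grows only logarithmically with raw speed, so pruning away uninteresting moves buys more depth than raw nodes-per-second does.

    import math

    def reachable_depth(nodes_per_second, seconds, branching_factor):
        """Depth d such that branching_factor**d nodes fit in the time budget."""
        return math.log(nodes_per_second * seconds) / math.log(branching_factor)

    time_per_move = 180  # seconds per move, assumed

    # Brute force over all ~35 legal moves per position vs. a selective search that looks
    # deeply at only ~6 "interesting" moves per position (both numbers are assumptions).
    print("200M nodes/s, branching 35:", round(reachable_depth(200e6, time_per_move, 35), 1), "plies")
    print("  2M nodes/s, branching  6:", round(reachable_depth(2e6, time_per_move, 6), 1), "plies")
    # The slower but more selective searcher reaches the greater depth.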



#13
scientia

    Member

  • Members
  • 10 posts

Excellent post, scientia. That's a lot for me to digest. Off the bat, I have one question. 

 

How do you square that with what this guy says?

I suppose I would have asked you if you noticed all the things he left out, the distortions in the graphs, the pleading without evidence, etc. But that would probably be a waste of time.

 

starspawn0
You sound more like one of these classic AI people circa the year 2000, or even 1980s, like CNOT on the old Kurzweil forums. "Theory" of that sort is probably not going to get anywhere. It will just get bulldozed over.

To be honest, you sound just like the dozens of people who've told me how cutting edge Integrated Information Theory is and how it will change everything. I'm still waiting for that. I'll leave you to it.



#14
starspawn0

    Member

  • Members
  • 1,961 posts

To be honest, you sound just like the dozens of people who've told me how cutting edge Integrated Information Theory is and how it will change everything. I'm still waiting for that. I'll leave you to it.



I recall reading Scott Aaronson's takedown of IIT, where he challenged the notion that whether a computation counts as "conscious" depends on how it is organized in space; that the organization somehow matters, switching some systems from being more conscious to less conscious just by reorienting the parts.  I agree with Scott.  Another theory that will get bulldozed over in time.

#15
funkervogt

    Member

  • Members
  • 1,015 posts

 

Excellent post, scientia. That's a lot for me to digest. Off the bat, I have one question. 

 

How do you square that with what this guy says?

I suppose I would have asked you if you noticed all the things he left out, the distortions in the graphs, the pleading without evidence, etc. But that would probably be a waste of time.

 

What are the three biggest mistakes or omissions he made? 



#16
davea0511

    New Member

  • Members
  • 1 post

Interesting discussion. I'm sorry, but though we may have the hardware to do 10^16 computations, I think we should keep in mind that when it comes to conscious decision making, the subconscious is dumber than a box of rocks, and most of your brain is devoted to that involuntary stuff.

 

When was the last time you had a dream and woke up and thought ... whoa that was brilliant.  More like, whoa my subconsciousness is messed up.  Have you ever written down a "great idea" at 3am to later read it and realize that your subconsciousness is clueless?  I have.  It's really good at creativity.  But at performing calculations and making sense of stuff?  My subconsciousness is an idiot.

 

And I think that's true for most people, or we'd all just go to sleep and let our brains figure out stuff for us between the hours of 10pm and 6am.  But we don't, because most of our brain is involuntary stuff ... the stupid part of us.  Really, doing valuable and reliable mental work has to be done consciously, and compared to computers our conscious decision-making is nowhere near 10^16 computations ... more like "uh, carry the one ... uh what was I doing?"

 

Another thing: does it require consciousness, or self-awareness? I may be wrong, but I think dogs have that, and their brains are small. That leads me to believe that consciousness, at least, does not magically happen once you reach a certain speed and quantity of operations.  There's something else going on that even a dumb dog's brain does.  A pet can learn how to open a door, or suss out a complex way to escape a backyard, better than a Boston Dynamics dog hooked up to Watson.  So if you're looking at calculations per second, the bar's pretty low. Not even a human brain is needed.

 

All this points to us just simplifying the problem too much ... we're missing something.

 

Regarding that, when quantum computing becomes the norm, ubiquitous and cheap, everything could change on a dime.  That's probably still a dozen years out or more, but I think we have some time.



#17
funkervogt

    Member

  • Members
  • 1,015 posts

I started this thread to talk about human-level intelligence, not consciousness. I agree with you that some animals are probably conscious, so that ability is largely independent of intelligence level. 



#18
starspawn0

    Member

  • Members
  • 1,961 posts

I think consciousness plays a role in how people assess progress in AI, and the concept is intimately linked in their minds to intelligence.  It often makes an appearance through the word "understanding" -- e.g. when people say that systems like GPT-3 "have no understanding of language" or "it's just memorizing" (where "memorizing" is taken in some nebulous and more generalized sense than one is accustomed to).  When you press them about what they mean by "understanding", they usually either change the subject or say something like, "Well, that's really the whole problem.  We don't have an operational definition of what we even mean by `understanding'.  Coming up with a good definition is also an important challenge!"
 
What I think is going on in their mind is something like the argument of Searle called "The Chinese Room Argument":
 
https://en.wikipedia...ki/Chinese_room
 
The basic setup is: imagine you have a room, occupied by someone who doesn't know Chinese. All they have is a rulebook to help them decide the appropriate Chinese characters to write down when presented with Chinese characters that state the question / query. Searle argues there is no real understanding going on here, just like in computers that process language.

This way of thinking is so compelling that it even caused a massive Twitter debate among Natural Language Processing (NLP) researchers about two years ago, where they debated whether or not language models "understood" language. Here is a link to this massive debate (just one part of a large number of threads):

https://twitter.com/...037792018587649
 

I'd argue that it's possible to build a satisfying approach to semantics with only entailments and no grounding. This is a category fight, so I won't push to hard, but I'm not the only one—have a look at work on natural logic for some very formal attempts.


Here's a "behaviorist" way of looking at the problem: let's suppose I have a lot of training data about how humans use language, and let's say that it's just text. Trillions and trillions of words of text. And let's say I feed it into a large statistical model, and ask it to construct a generic "brain" that could generate this type of output (through writing or speaking). So, we map

Word associations --> An abstract model of a brain.

Now, you have so many words, and so many associations, that not only can the model emulate human grammar and syntax use, but it even learns visual representations. This may seem impossible; but people have shown before that you can recover visual information from word statistics. For example, Max Louwerse et al. have shown how you can recover geographic information, and even knowledge about the relative location of body parts, just from word statistics.
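
A minimal sketch of that kind of recovery (a toy illustration with a made-up co-occurrence matrix, not Louwerse's data): cities that are mentioned together more often get treated as "closer", and classical multidimensional scaling of those co-occurrence-derived distances roughly recovers a map.

    import numpy as np
    from sklearn.manifold import MDS

    cities = ["Seattle", "Portland", "San Francisco", "Chicago", "New York", "Boston"]

    # Made-up pairwise co-occurrence counts from some text corpus (symmetric; higher means
    # mentioned together more often). In Louwerse-style studies these come from real corpora.
    cooc = np.array([
        [ 0, 90, 40, 15, 10,  8],
        [90,  0, 50, 12,  9,  7],
        [40, 50,  0, 20, 15, 10],
        [15, 12, 20,  0, 60, 40],
        [10,  9, 15, 60,  0, 85],
        [ 8,  7, 10, 40, 85,  0],
    ], dtype=float)

    # Turn similarity into dissimilarity and embed in 2D.
    dissimilarity = 1.0 / (cooc + 1.0)
    np.fill_diagonal(dissimilarity, 0.0)
    coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissimilarity)

    for city, (px, py) in zip(cities, coords):
        print(f"{city:>13}: ({px:+.2f}, {py:+.2f})")
    # The west-coast and east-coast cities land in separate clusters: crude geography falls
    # out of nothing but (here, made-up) word co-occurrence statistics.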

We could add images and video to the model, if we want to make it more "grounded". But I specifically chose only words to make the point that: at the end of the day, all we care about is that the model gets the right bits that describe and define knowledge about the world, in different modalities. There isn't some mysterious "soul" hovering over the bits that says that this bit is an "audio" bit, and this bit is a "video" bit. Bits are bits; and if you can get the audio bits through text associations, so be it! This is what the bits of audio look like in the computer:

1010101110101011011110111001000110111010100...

and this is what the video bits look like:

1110110000101001101101101101101011011011010...

They look the same; and it doesn't matter how you come about them.

....

The failure to understand this basic point is part of the reason why some people underestimate AI progress. Their intuition tells them that those bits in the computer have to have experience behind them. They can't just be abstract things picked up from correlations. And, therefore, strong AI is 100 years away, because we don't yet have bits that can feel.

A similar hangup exists in assessing "art": let's say you have a priceless painting, by Rembrandt. Now let's say you make an almost perfect reproduction, and hang it in a museum. You even tell people it's not the original; it's only a reproduction. Some large percent of the population will feel cheated. They came there to see the original Rembrandt, and you've shown them a fake. There's something about seeing the original, to them, that matters.  They feel there is more than just the dull information of the painting; it matters to them where the information came from.

In fact, if all of the art of the world were thrown in a pile and burned, we would not be deprived of the "art"; because there are digital copies everywhere of all the masterpieces, and we could reproduce it. Likewise, if famous buildings of Europe are demolished, we could rebuild them.



#19
starspawn0

    Member

  • Members
  • 1,961 posts

Here's a brand new result, by some researchers at MIT:
 
https://mobile.twitt...832575022137344
 
Paper:
 
https://www.biorxiv.....06.26.174482v1
 
The punchline is that OpenAI's GPT-2xl model (they didn't get the chance to test GPT-3) shows remarkable correlation with the brain when performing the same tasks.  They say:
 

We tested 43 language models spanning major current model classes on three neural datasets (including neuroimaging and intracranial recordings) and found that the most powerful generative transformer models (Radford et al., 2019) [OpenAI] accurately predict neural responses, in some cases achieving near-perfect predictivity relative to the noise ceiling.


Of course, they only tested the model's ability to predict brain responses in language areas of the brain. I wouldn't be surprised if they could also predict responses in other parts of the brain. It's a bit of a heresy, given that GPT-2 was only trained on text-prediction, to say that it could predict responses in auditory, somatosensory, motor, visual, and olfactory areas, but I wouldn't doubt it -- in fact, I think this is what Alex Huth et al's work might show.
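
Since "predictivity relative to the noise ceiling" may be unfamiliar, here is a deliberately simplified sketch of the idea (hypothetical repeated-trial data, not the datasets from the paper; real studies use more careful split-half estimates): the ceiling is how well the data predicts itself across repeats, and the model's score is reported as a fraction of that.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-ins: two repeats of the same 200-sentence experiment at one
    # recording site, plus a model's prediction for those same sentences.
    true_signal = rng.standard_normal(200)
    repeat_1 = true_signal + 0.5 * rng.standard_normal(200)   # measurement noise
    repeat_2 = true_signal + 0.5 * rng.standard_normal(200)
    model_prediction = true_signal + 0.7 * rng.standard_normal(200)

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    # Noise ceiling: how well one repeat predicts the other -- roughly the best any model
    # could hope to do given the measurement noise.
    ceiling = corr(repeat_1, repeat_2)
    raw = corr(model_prediction, repeat_1)
    print(f"raw r = {raw:.2f}, ceiling = {ceiling:.2f}, normalized predictivity = {raw / ceiling:.2f}")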

GPT-3 probably mostly closes the remaining gap in predictability of the language areas of the brain, though they didn't get to test it (as it was too new). Using brain data, directly, to improve these models might close the gap in all the other areas.

What this says to me is that we are making a lot more progress on AI than you might think, given pessimistic takes in the media.  More compute and data might be all -- or almost all -- we need!



#20
R8Z

    Member

  • Members
  • 15 posts
  • Location: Germany (soon to be uploaded to a Matrioshka brain)

[...]

What this says to me is that we are making a lot more progress on AI than you might think, given pessimistic takes in the media.  More compute and data might be all -- or almost all -- we need!

So, what this sort of means is that we could plug the Neuralink output into the training model and voila, we have a modeled, sort of predictable human brain?

Given the fact it wasn't even GPT-3, if this is the case, this will be big, very big.

The training scenarios would be interesting to try to imagine, by the way: how would we tell the models what the person was doing, so the patterns can be matched to the outside world? Would we include additional input from cameras, microphones and sensors in the training data? Interesting to think about.





