What 2029 will look like


133 replies to this topic

#21
funkervogt

    Member

  • Members
  • 769 posts

If it weren't for the ridiculous topic, I wouldn't have suspected a machine wrote that article; I would have thought it was a somewhat awkward English translation of an article originally written in Spanish.


  • Yuli Ban, Erowind and starspawn0 like this

#22
starspawn0

    Member

  • Members
  • 1,234 posts

As I said, human-level media-synthesis is almost here.

 

This system is actually even more impressive.  Without any task-specific training, it can translate, answer reading-comprehension questions, and score 7 points higher than the previous state of the art on the Winograd Schema Challenge (a commonsense-reasoning test), reaching over 70% accuracy.
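The way these zero-shot Winograd numbers are computed is easy to sketch. Here's a minimal Python example, assuming a pretrained GPT-2 via the Hugging Face transformers package (just an illustration of the idea, not how OpenAI ran it): substitute each candidate referent for the ambiguous pronoun, and keep whichever reading the language model assigns higher probability.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pretrained GPT-2 (the small public checkpoint, for illustration).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_logprob(text: str) -> float:
    """Approximate total log-probability the LM assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # out.loss = mean NLL per token
    return -out.loss.item() * ids.size(1)

# One Winograd-style item: substitute each candidate referent for the
# ambiguous pronoun, keep the reading the model finds more likely.
template = "The trophy doesn't fit in the suitcase because the {} is too big."
candidates = ["trophy", "suitcase"]
best = max(candidates, key=lambda c: total_logprob(template.format(c)))
print(best)  # a model with commonsense prefers "trophy"
```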

 

And they want to train it with even more data!...

 

And, as @gwern pointed out on my forum, they didn't even use state-of-the-art Transformer neural net technology.  What if they did, and magnified the data 100x?


  • Casey, Yuli Ban, funkervogt and 1 other like this

#23
starspawn0

    Member

  • Members
  • 1,234 posts
Given how well we now know text-synthesis "language models" perform when scaled up, we might see something like a smart conversational system demoed at Google I/O 2019 in May. Last year the main draw was Duplex -- a super-good conversational system is one of the few things I could see them demoing that would top it. If they do, maybe they will even give it a voice, so that you can talk to it as you would to a normal person, and it will talk back in a very natural voice.

Imagine Sundar Pichai unveiling this year's version of Duplex: the video rolls, and it shows a guy talking to a Google Home, that responds in a Duplex-like, super-realistic voice. They have a little chat about politics, film, and sports. And it's all so human-like, and natural; and the thing seems to have a decent grasp of context, remembering what the guy said a few minutes ago.

The whole thing is a good 5 minutes long -- many rounds of conversation.

The audience gasps all through the video, and then when it ends, they nervously clap. The room fills with noise as people talk to their neighbors about what they had just seen. Pichai says, "We are still working the bugs out. It's not yet ready for the public, but we plan to make it available to our Pixel 3 and soon-to-be-released Pixel 4 phones."

....

Who at Google would build it?

I could see someone like Quoc Le attempting to scale up OpenAI's work, using a lot more data taken from online conversations (that Google has access to), and using better-performing neural nets. He was one of the guys behind "A Neural Conversational Model" four years ago, which used orders of magnitude less data than OpenAI's project, yet still got eyebrow-raising results. And he recently wrote a paper on what kind of commonsense and world knowledge large neural nets trained on massive text data absorb. He likes to scale models up until they break.

I could definitely see a Neural Conversational Model 2.0 absorbing lots of world knowledge: commonsense at the level of 75% to 80% accuracy on the Winograd Schema Challenge, the ability to write little stories and poems in response to a comment (where appropriate), the ability to detect and respond to sarcasm and emotion, translation between languages (given enough examples), basic question-answering, reading comprehension, simple logic puzzles, trivia questions, and basic analogical reasoning -- all with high levels of coherence and a consistent personality.

It would not necessarily be super-accurate on all of these, but it would be much better than random guessing, and better than many baseline models for individual tasks. On some, like the Winograd Schema Challenge, it would be state-of-the-art.

One thing I have pointed out before (on other forums) is that there are a lot of educational apps and chatbots whose logs could be used to train LMs in additional skills. A large company (like Google) with access to a large number of chat logs from people interacting with educational chatbots -- learning history, biology, math, physics, etc. -- could add those logs to the corpus used to train the language model. There's a good chance it would learn some of the logic behind many of these skills, not just imitate the language superficially. There are probably gigabytes of potential educational-app chat logs out there.

Furthermore, we will see replications of OpenAI's work from teams in China. They like to wait and see what U.S. companies come up with, and then replicate it and improve, using more data and compute, together with more fine-tuning.
  • Yuli Ban and Jakob like this

#24
Yuli Ban

    Born Again Singularitarian

  • Moderators
  • 20,662 posts
  • Location: New Orleans, LA

^ If this is true, it means the Turing Test can conceivably be passed well before 2029, possibly even before 2025. Which also means we will need a modified test if we're still trying to test for general intelligence.


  • Casey and starspawn0 like this

And remember my friend, future events such as these will affect you in the future.


#25
funkervogt

    Member

  • Members
  • 769 posts

^ If this is true, it means the Turing Test can conceivably be passed well before 2029, possibly even before 2025.

Don't even whisper it, friend. 



#26
starspawn0

    Member

  • Members
  • 1,234 posts
There are lots of opinions about what is possible for a language model to learn from pure text. I've said before that people who think, "It's just using text. NO WAY it can learn perceptual knowledge about the world!" are wrong. There is "embodied world knowledge" to be found in pure text, in the form of word co-occurrence statistics. It's just that you need a lot of text to extract it, as text is a low-quality data source for that kind of information. Here is a related Ph.D. thesis:

https://pure.uvt.nl/..._30_06_2016.pdf

Another framework that incorporates linguistic and perceptual processing is the symbol interdependency hypothesis (Louwerse, 2007). This hypothesis states that conceptual processing can be explained by both embodied and symbolic mechanisms, although it focuses on different aspects than Mahon and Caramazza (2008). There are three components of the Symbol Interdependency Hypothesis. First, perceptual information is encoded in language. This aspect differentiates the symbol interdependency hypothesis from the previous “hybrid” approaches by assuming that many of the benefits previously found to support embodiment theory are actually already encoded in the language itself. By using language analysis tools, such as latent semantic analysis, language can be used to predict semantic relationships, as well as temporal and spatial relationships (Louwerse, Cai, Hu, Ventura, & Jeuniaux, 2006). Therefore, the facilitated activations that have previously been attributed to perceptual simulation can be attributed to language itself, at least in a significant number of cases.

Second, language users rely on language statistics and perceptual simulation during cognitive processes. Zwaan and Yaxley (2003) found that iconic orientation does facilitate judgment (e.g., when a participant sees attic presented above basement rather than the reverse orientation). However, it was more recently found that the order of word co-occurrences can also facilitate this judgment (Louwerse & Jeuniaux, 2008). Louwerse and Jeuniaux used the same paradigm as Zwaan and Yaxley (2003), using iconic and reverse-iconic relationships with word pairs. In one experiment, it was again found that iconicity facilitated judgment. However, this facilitation was also found for the words that occurred more frequently together (i.e., as determined by Latent Semantic Analysis). This demonstrated that language use can also impact cognition, alongside embodied cognition. In a second experiment, the materials and procedure were similar; however, the instructions differed in that participants were instructed to make a lexical judgment. For the second experiment, there was again a significant effect of iconicity and semantic relationship. In a third experiment, the same items were presented horizontally, and half of the participants were instructed to make a semantic judgment while the other half were instructed to make a lexical judgment. Support was found for semantic relatedness, however not for iconicity. These findings were explained in terms of depth of processing. Semantic relatedness requires deeper processing than a lexical judgment. Therefore, the situation, such as whether quick or deep processing is more necessary, can influence which kind of processing takes place.

Finally, the dominance of either the embodied or symbolic system is dependent on the type of task and stimulus. The symbol interdependency hypothesis posits that there is an interdependence between the (presumably amodal) linguistic symbols and the perceptual referents that those symbols represent. Furthermore, it has also been shown that there are situations that can influence whether more symbolic or more perceptual cognition will be used. Louwerse and Jeuniaux (2008; 2010) were able to demonstrate that symbolic cognition dominates in the early stages of cognition, whereas when deeper cognition is necessary or more time is available, perceptual cognition is used more. Participants relied on whichever system was most efficient: the evaluation of an unusual orientation prompted a switch to the system, statistical linguistic frequencies, that was more efficient for processing distance judgments. In summary, grounded cognition has been supported in many domains, but certainly not in all circumstances.

In short, language can be used as a shortcut to more efficiently process cognition in some situations. We use the symbolic system to garner a fuzzy, good-enough representation that can facilitate cognition. This still accommodates the perceptual approach when more thorough processing is required. Therefore, the Symbol Interdependency Hypothesis takes into account previous embodied cognition findings, while also providing a fuller account of how language processing occurs. Now that it has been demonstrated that language itself can influence cognition beyond perceptual experiences, it is necessary to test this possibility further by showing the impact of language systems.
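To make the word co-occurrence point concrete, here's a toy sketch in Python (the corpus is made up, just to illustrate; real extraction needs billions of words): spatial facts like "the attic is above the basement" leave a trace in the left-to-right order in which words tend to appear.

```python
from collections import Counter
from itertools import combinations

# Tiny made-up corpus; real extraction needs billions of words.
corpus = [
    "the attic above the basement",
    "an attic over the basement stairs",
    "stairs from the basement to the attic",
]

pair_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    # Count ordered (left-to-right) word pairs within each sentence.
    for a, b in combinations(words, 2):
        pair_counts[(a, b)] += 1

# "attic" tends to precede "basement" in text, mirroring the iconic
# (spatial) relationship Louwerse describes.
print(pair_counts[("attic", "basement")],   # 2
      pair_counts[("basement", "attic")])   # 1
```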


  • Yuli Ban likes this

#27
funkervogt

    Member

  • Members
  • 769 posts

 

I predict the following will be true by 2029:

  • "Foldable" smartphones will be commercially available. Folded up, they will be the same size as today's smartphones, but you'll be able to open them like a manila folder, exposing a larger inner screen. They will obsolete mini tablets. 
  • Augmented reality glasses that have fixed Google Glass' shortcomings will be on the market. The device category will come back. 
  • It will be cheaper to buy an electric version of a particular model of car than it will be to buy the gas-powered version. 
  • China's GDP will be higher than America's. 

 

 

I'm two months away from being right:

 

Samsung’s foldable now has a name, the Samsung Galaxy Fold, and the company is revealing more about what this unique smartphone can do. Samsung is planning to launch the Galaxy Fold on April 26th, starting at $1,980. 

 

https://www.theverge...ze-announcement


  • Yuli Ban likes this

#28
starspawn0

    Member

  • Members
  • 1,234 posts
I highly recommend this Slate Star Codex piece:
 
https://slatestarcod...l-intelligence/
 

The point is, GPT-2 has faculties. It has specific skills, that require a certain precision of thought, like counting from one to five, or mapping a word to its acronym, or writing poetry. These faculties are untaught; they arise naturally from its pattern-recognition and word-prediction ability. All these deep understanding things that humans have, like Reason and so on, those are faculties. AIs don’t have them yet. But they can learn.

....

Nobody told the model to learn Chinese history, cell biology, or gun laws either. It learned them in the process of trying to predict what word would come after what other word. It needed to know Sun Tzu wrote The Art Of War in order to predict when the words “Sun Tzu” would come up (often in contexts like “The Art of War, written by famous Chinese general…). For the same reason, it had to learn what an author was, what a gun permit was, etc.

Imagine you prompted the model with “What is one plus one?” I actually don’t know how it would do on this problem. I’m guessing it would answer “two”, just because the question probably appeared a bunch of times in its training data.

Now imagine you prompted it with “What is four thousand and eight plus two thousand and six?” or some other long problem that probably didn’t occur exactly in its training data. I predict it would fail, because this model can’t count past five without making mistakes. But I imagine a very similar program, given a thousand times more training data and computational resources, would succeed. It would notice a pattern in sentences including the word “plus” or otherwise describing sums of numbers, it would figure out that pattern, and it would end up able to do simple math. I don’t think this is too much of a stretch given that GPT-2 learned to count to five and acronymize words and so on.


One class of patterns it should be able to learn is how to hold a conversation. The dance of turn-taking is complicated, but not so complicated that it can't be learned.

I think the system will have to be told, though, the context behind the documents in its training data -- e.g. when they were written. It will have to know, for example, that the term "president" refers to different things depending on when a document was written, and who it was written for. This context can be added as metadata, so I don't think it will be a problem.
  • Yuli Ban and funkervogt like this

#29
starspawn0

    Member

  • Members
  • 1,234 posts
Let's say that Google did try to build a smart conversational agent, where would it get the data?

I think they could pull it from YouTube videos: over a year ago, it was announced that Google had auto-captioned over 1 billion videos! And, given that people upload about 300 hours of video each minute, I would guess there are probably something like 50 billion or more videos; and perhaps billions of hours of video that could potentially be auto-captioned.

Some fraction of these videos will not be of sufficient quality to do good auto-captioning, even if professional human transcribers tried to do it. But I would guess something like 1 billion hours of data would be of sufficiently high quality to make highly-accurate transcriptions.

Imitating natural conversations -- not conversations on text chat or social media, where people have time to think before they write -- will require only a small subset of the skills needed to write stories and news articles; so a dataset the size of OpenAI's, devoted exclusively to natural conversations, would probably be enough to build a system capable of passing a Turing Test over 5-minute conversations. Even 100 million hours of conversations would be enough: people speak at about 150 words per minute, so that adds up to (100 million) * 150 * 60 = 900 billion words, which is over a terabyte of text (each word takes more than one byte). Even if you throw away 95% of that data (maybe only 5% of the words are part of a conversation, versus a monologue or something), it would still amount to nearly 50 billion words.
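Checking that arithmetic, with all inputs being my own assumptions rather than measured figures:

```python
# All inputs are the post's own assumptions, not measured figures.
hours = 100_000_000             # usable hours of conversational video
words_per_minute = 150          # typical speaking rate
total_words = hours * 60 * words_per_minute
print(f"{total_words:,} words")                          # 900,000,000,000

bytes_per_word = 6              # ~5 letters plus a space, rough estimate
print(f"{total_words * bytes_per_word / 1e12:.1f} TB")   # 5.4 TB

kept = int(total_words * 0.05)  # keep the ~5% that is real dialogue
print(f"{kept:,} words retained")                        # 45,000,000,000
```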

Then, using some algorithmic improvements (better Transformer neural nets), that data would go a lot further, resulting in an even more accurate conversational system.

....

I should point out, though, that Google might not be able to productize such a system, since there might be privacy issues involved. There is also the problem that all those conversations might contain foul language or racially or gender-insensitive comments, and you wouldn't want the conversational system to echo that.

Assuming the privacy issues could be dealt with, they could probably add multiple layers of filtering to screen out the bad comments. Even if 1 in 10,000 conversations resulted in a racist remark, it would be a disaster for Google. If they did release it to users, they would have to be very up-front about this possibility, and even give constant reminders so that people don't freak out about it.

Another way they could reduce these kinds of problems is to carefully curate the videos, restricting to a set of YouTube accounts of people "in good standing" -- e.g. the people who subscribe to their channel are "good"; all the comments they leave on YouTube are "good"; and, in general, there is no sign that they post racist or sexist stuff to any media account Google has access to, and they also post a lot of "good" comments.

Given that the average person is "good", I would imagine they would still be left with a lot of data to work with.

So, yeah, Google should have the data to build a really good conversational AI capable of natural human conversation, not just abbreviated social media or text chat. The outputs could be sent to a speech synthesis system like the one used in Duplex. Perhaps, in addition to the raw text, the conversational system could also be trained to predict the prosody of the text (the data to do this would be present in the YouTube videos).

The end-result would be a conversational system something like Samantha from the film Her. It wouldn't be able to send emails for you or anything, but over about a 5 or 10 minute conversation, it might be hard to tell you are talking to a machine.
  • Yuli Ban and funkervogt like this

#30
funkervogt

    Member

  • Members
  • 769 posts

Such a system could also make use of the tens of thousands of professionally made movies that feature ordered human dialog and have accurate captions. 


  • Yuli Ban and starspawn0 like this

#31
starspawn0

    Member

  • Members
  • 1,234 posts
Yes, you could use this to produce very good dialog for movie scripts. And perhaps another system could take these scripts and produce movies. Probably the near-term movie synthesis results would be pretty rudimentary, with blemishes, noise and things; but future versions might be as good as Hollywood films. I think it's going to be a while until we get to that stage (many years), but it will come, eventually.
  • Yuli Ban likes this

#32
funkervogt

    Member

  • Members
  • 769 posts

How could I have forgotten about this for so long?

 

https://youtu.be/RTq74Ae94T4



#33
starspawn0

    Member

  • Members
  • 1,234 posts
These lines from the film Ex Machina could be closer to the truth than we realize:

http://www.slguardia.../Ex-Machina.pdf
 

NATHAN (CONT’D)

If you knew the trouble I had
getting an AI to read and duplicate
facial expressions... Know how I
cracked it?

CALEB

I don’t know how you did any of
this.

NATHAN

Almost every cell phone has a
microphone, a camera, and a means
to transmit data. So I switched on
all the mikes and cameras, across
the entire fucking planet, and
redirected the data through Blue
Book. Boom. A limitless resource
of facial and vocal interaction.

CALEB

You hacked the world’s cell phones?

NATHAN laughs.

NATHAN

And all the manufacturers knew I
was doing it. But they couldn’t
accuse me without admitting they
were also doing it themselves.


Big companies do record their users in order to improve their speech recognition and other systems. Perhaps they could use it to improve emotion-recognition, as well, or even conversational AI.

A few days ago it was revealed that Google Nest devices contain a hidden mic:

https://www.wired.co...curity-roundup/

Somebody at Google ambitious enough and with enough power might find a way to "turn on all the mics", in order to build a very good conversational AI.

Though, as I said, they wouldn't really have to, because there should be enough data on YouTube that people have freely given; there would just be a lot more of it if they turned on the mics.
  • Yuli Ban and funkervogt like this

#34
tomasth

    Member

  • Members
  • 228 posts
Wouldn't it be better to have just the data needed to extract the generative substrate, to span the exact space the human generative substrate generates?

Heaping up any data like so much manure just to improve by a bit, and with wasteful computation, seems not very advanced.

#35
waitingforthe2020s

    Member

  • Members
  • 18 posts
  • Location: Orion Arm, Milky Way Galaxy.

By 2029, do you think we will have the ability to make movies by "feeding" a movie script and other details into a program? Will Media Synthesis be that advanced by then?


I'm a radical demo-publiacrat.

 

This is Scatman's world, and we're living in it.


#36
starspawn0

    Member

  • Members
  • 1,234 posts

It will certainly be possible.  The only real question is how it will be built: there are already programs that can take a script as input and output a rudimentary cartoon-like video:

 

https://youtu.be/Nqc9jcgPNmE

 

However, it uses a lot of hand-crafting + multiple NLP algorithms + computer graphics.  The output of this system could be fed into another system to produce more realistic videos.  

 

Ideally, though, you'd want the whole process to be end-to-end, as that would enable a greater diversity of productions.  It might be possible by 2029, including the soundtrack and realistic dialog.  10 years is a long time, and much could happen between now and then!


  • Casey and Yuli Ban like this

#37
funkervogt

    Member

  • Members
  • 769 posts

 

 

Let's say that Google did try to build a smart conversational agent, where would it get the data?

I think they could pull it from YouTube videos: over a year ago, it was announced that Google had auto-captioned over 1 billion videos! And, given that people upload about 300 hours of video each minute, I would guess there are probably something like 50 billion or more videos; and perhaps billions of hours of video that could potentially be auto-captioned.

It just occurred to me that Google could also get the necessary training data from Gmail. How people use "Smart Compose" in particular could help train a machine to pass the Turing Test. If millions of humans per day are using Smart Compose, that's a huge amount of human dialog. 

 

https://gsuiteupdate...ose-gsuite.html


  • Yuli Ban and starspawn0 like this

#38
starspawn0

    Member

  • Members
  • 1,234 posts
That would probably work reasonably well for what's called "short text conversation", where you have a single conversation round.  Some emails are part of a thread with multiple rounds -- that is more like what you want.  Basically, email threads that resemble text chat.
 
Natural conversations with voice are a little different, in that people don't have as much time to think about what they are going to say -- and that should make it easier on the ML algorithms, requiring less training data.  But, multi-round emails could also work, yes.
 
....
 
A few nice things about multi-round email:
 
* The "speakers" are nicely separated, and you don't have to worry about transcription errors.
 
* Emails are generally of high quality, with not too many grammatical errors.
 
* Emails are nicely time-stamped.  That's important, for reasons I've mentioned before (e.g. the meaning of "president" or "the Super Bowl" depends on the year).  Emails can also be assigned a location / geotagged.
 
* Some emails contain comments about an attached image, video, and/or audio/music.  That could be useful for training the system to understand the contents of images, audio,  and videos; and could also improve its commonsense reasoning and world knowledge.  For example, you might see conversations like this:
 
Person1:  This is my baby. [Image attachment]
 
Person 2:  Oh, that's a beautiful dog.  What breed is it?
 
Person 1:  It's a poodle mix.  I don't know the other breed.
 
 
Or, maybe:
 
Person 1:  What do you think about this?  [Video attachment]
 
Person 2:  Wow! [Car collision] Did anybody get hurt?
 
Person 1:  No.  Everybody got out with minor injuries.  I always tell people to SLOW DOWN near that exit when it rains.  

or:

Person 1: I was listening to this the other day. [Music attachment]

Person 2: Ugh... rap. I don't like rap.

Person 1: I wouldn't call it rap. It's closer to Industrial, but it's certainly also "rappy".

Person 2: Call it what you will. It's not my thing, bro.
 
I would guess you could include video, image, and audio features as input to the prediction model, and it would learn how to use them (see the sketch just after this list).
 
* There will probably be a lot more technical conversations; so, the AI could learn about medicine, biology, psychology, and maybe even some physics, chemistry, computer science, programming, and math.  For example, the system might learn to write little programs to solve a problem -- as there are a lot of emails with code.  I'd say there are even enough math proofs in emails that the system could write short proofs of mathematical statements -- nothing too complicated, though.  Still, it would be amazing if it could solve basic math problems.  OpenAI's system can already write buggy code; and it would probably do a lot better with more data.
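Here's the sketch I promised above: one hypothetical shape for a training record built from a multi-round email thread, with speaker separation, timestamps, and optional attachment features. All of the names and tags are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Turn:
    speaker: str                       # e.g. "person1", "person2"
    timestamp: str                     # ISO date; pins down "president" etc.
    text: str
    attachment_features: Optional[List[float]] = None  # image/video/audio embedding

@dataclass
class Thread:
    turns: List[Turn] = field(default_factory=list)

    def to_training_string(self) -> str:
        """Flatten the thread into one tagged string for a language model."""
        lines = []
        for t in self.turns:
            tag = " <ATTACHMENT>" if t.attachment_features else ""
            lines.append(f"[{t.timestamp}] {t.speaker}:{tag} {t.text}")
        return "\n".join(lines)

thread = Thread([
    Turn("person1", "2019-02-20", "This is my baby.", attachment_features=[0.12, 0.98]),
    Turn("person2", "2019-02-20", "Oh, that's a beautiful dog. What breed is it?"),
])
print(thread.to_training_string())
```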
 
 
I'd say it's entirely possible there could be a few terabytes of email text to train models, maybe 10% of which is in the form of multi-round conversations.  That might be enough to build a system capable of holding a conversation about a very broad range of topics, including technical ones, and even about images, videos, and audio.  
 
For example, you could ask the system:  "I've got a patient with 102 fever, 23 hematocrit, pale skin, watery eyes, cough.  What do you think they have?"  And the system might offer a suggestion, or ask followup questions, like, "Have they come in contact with wildlife?  [And ticks]" or something.  The system might pick up this talent from millions of conversations between doctors, or between health workers and patients.
 
You could also ask it for tech help, like, "I'm having trouble uploading videos to YouTube.  It gets stuck at 0% processing."  And then it might offer a suggestion on how to fix it.

Perhaps it could also dispense legal advice, as there would surely be millions of emails between lawyers, and between lawyers and clients. "I'm thinking of suing this guy whose dog attacked my cat the other day. What should I do?" And it would offer some instructions on how to get started.

It might also be able to read and answer questions about spreadsheets -- again, there will be millions of emails with spreadsheets attached. "Here are the expense reports. Notice how much housing costs have risen over the years." Perhaps a few modules could be added to the neural net to facilitate learning, like modules for doing basic arithmetic (addition, subtraction, multiplication, division, logs, exponentiation, trig functions).
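As a rough sketch of that arithmetic-module idea (the CALC(...) marker convention is invented here, not anything any lab actually uses): the net could learn to emit a marker, and a small exact module fills in the answer, instead of the net computing sums in its weights.

```python
import math
import re

def apply_arithmetic_module(text: str) -> str:
    """Replace CALC(a op b) markers with exactly computed results."""
    def evaluate(match: "re.Match") -> str:
        a = float(match.group(1))
        op = match.group(2)
        b = float(match.group(3))
        ops = {"+": a + b, "-": a - b, "*": a * b,
               "/": a / b if b else math.nan}
        result = ops[op]
        if math.isnan(result):
            return "undefined"
        return str(int(result)) if result == int(result) else f"{result:.6g}"
    return re.sub(r"CALC\((\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)\)",
                  evaluate, text)

print(apply_arithmetic_module("The expense total is CALC(4008 + 2006)."))
# -> "The expense total is 6014."
```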

....
 
The talents it might acquire from so much data are hard to estimate; with terabytes of data and a well-designed architecture, anything is possible.  The system might serve as a general-purpose "Assistant" far better than anything on the market today. The privacy restrictions and the possibility of racist, sexist, foul, or mean / insensitive comments might greatly restrict what Google could offer, even if it's technically possible (and I suspect it is).

Chinese companies might be able to go where Google can't. I think fewer emails in China have the kind of issues I described. Criticism of the Communist state might be the only thing they have to worry about -- but with enough filtering, perhaps they could all but eliminate it. I think some of the chatbots in China are trained on user data, and they are good about tiptoeing around criticism of the Communist state. They didn't have the same problems as Microsoft's Tay.

The U.S. Government could build their own in-house version, incidentally... and it would probably do even more. They have access to even more data than Google. In fact, they have so much of it, they have a hard time storing it -- they'd have to throw away 99% of it to build models.
  • Yuli Ban and funkervogt like this

#39
starspawn0

    Member

  • Members
  • 1,234 posts
I can see a way Google might yet be able to get around the privacy issues of using Gmail to train a giant model (the issues being that such a model could spit out personal details from someone's Gmail by accident): they could use the Gmail data to build an in-house system, and then use that to locate high-quality training data that people have publicly posted on the web somewhere.

For example, they could search through images and videos having tags, and look to see if their model predicts the tag or caption has high probability of occurrence, relative to a baseline expectation, given the image. If it doesn't, then the image-tag pair is probably of "low quality", and can be discarded or used as a "negative example" of some kind.

They could also look to see if individual Reddit and other online discussion threads (which have been used as training data in the past) are of "high quality" -- again, if the model predicts they have high probability of occurrence, relative to some baseline expectation.

They could even do the same thing for YouTube autocaptions.
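A minimal sketch of that filtering rule, assuming you have two scorers that each return the total log-probability a model assigns to a piece of text (the scorers below are dummy stand-ins for trained models):

```python
def keep_example(text: str, model_logprob, baseline_logprob,
                 threshold: float = 2.0) -> bool:
    """Keep `text` only if the strong model assigns it `threshold` nats
    more total log-probability than the baseline does."""
    return model_logprob(text) - baseline_logprob(text) > threshold

# Dummy scorers for demonstration; real ones would be trained models.
strong = lambda t: -0.5 * len(t.split())
weak = lambda t: -1.0 * len(t.split())

caption = "a golden retriever catching a frisbee in the park"
print(keep_example(caption, strong, weak))  # True: 9 words -> ratio 4.5
```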

While this would get Google around the problem of using private data, much of the data still may be subject to copyright restrictions. Apparently this, too, isn't a problem, if it's done right:

https://www.reddit.c...al_as_training/

So... maybe Google and other companies have a way around privacy restrictions, as far as building super-good AI models that they can sell.

It could be just a matter of time before we see conversational AI systems considerably better than anything on the market -- ones that can even converse about audio, video, and images. They'll handle legal questions and medical questions (users will be warned that they shouldn't accept its advice, that it's only for "entertainment purposes"; though they probably will listen to it nonetheless), answer basic homework problems in a variety of areas, entertain bored children and senior citizens, write poetry, write little short stories, summarize text, answer trivia questions, and so on. And all at a much higher level than Alexa and its 50,000+ skills; Alexa won't come anywhere near what such a system can deliver, at least not on its current trajectory, for another 5 years or more.
  • Yuli Ban likes this

#40
starspawn0

    Member

  • Members
  • 1,234 posts
Humans that are not concentrating are not general intelligences:
 
https://srconstantin...-intelligences/

I partly agree, and partly disagree. There are roughly two extremes of thinking and conversing about a topic. There's the information-rich variety, where you throw together a lot of facts and ideas but use relatively few inference steps in a chain. The reasoning steps have lots of conjunctions, disjunctions, and negations of terms, but relatively few "therefores". You might say that "the circuit depth is low"; the logic can be diagrammed with a fat-but-not-deep inference graph. I think this is the kind of reasoning you see in fields like medicine a lot of the time, which explains why machine learning algorithms are good at diagnosis.

Then there is the kind that is skinny-but-deep. The individual claims are short, but there are a lot of "therefores". This is more like what you see in math; and ML algorithms will need a very large number of examples to go very far at proving theorems, unlike with medical diagnosis, as I have hinted at.

GPT-2 is good at the former, but probably not the latter. Though, probably with more and more training data it can learn skills requiring greater and greater depth of inference.
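To make the fat-vs.-deep distinction concrete, here's a toy Python computation of inference depth over two made-up rule graphs:

```python
# Hypothetical rule graphs: each conclusion lists its premises.
fat_graph = {"diagnosis": ["fever", "cough", "rash", "travel"],
             "fever": [], "cough": [], "rash": [], "travel": []}
deep_graph = {"e": ["d"], "d": ["c"], "c": ["b"], "b": ["a"], "a": []}

def depth(graph: dict, node: str) -> int:
    """Longest chain of 'therefores' needed to derive `node`."""
    if not graph[node]:
        return 0
    return 1 + max(depth(graph, premise) for premise in graph[node])

print(depth(fat_graph, "diagnosis"))  # 1: wide fan-in, shallow chain
print(depth(deep_graph, "e"))         # 4: narrow fan-in, deep chain
```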

....

People keep using the term "pattern recognition". But I would not call everything language models do merely "pattern recognition". They do genuine reasoning; it's just that the chains are short, though they can potentially involve a lot of terms.

You can do a lot with short chains. Commonsense reasoning involves short chains, for example.
  • Yuli Ban likes this



