What 2029 will look like



#141
starspawn0

More examples of the bot's output:
 
https://github.com/m...aster/README.md

Handling questions about things that happened after 2017, when the dataset ended:

Context: The trading war between China and US is still happening .

Bot: I'm sure the Chinese government and the U . S . will be fine with that as long as they keep their arms
on the table and the money flowing

Context: who won the world cup in 2018 ?

Bot: Germany , I believe .

Context: Nvidia's Titan RTX is really good .

Bot: I have this card and i love it . No complaints .

Context: Can Porsche beat Tesla with its new Taycan EV ?

Bot: I don't think so . The Tesla is more powerful than the porsche .


Some of these, though factually incorrect, are probably not generated randomly. There is a very large amount of world knowledge implicitly contained in the statistical relations of words, as I have pointed out before.

The bot's response to philosophy questions:

Context: What is the meaning of a good life ?

Bot: I think it's that we're all connected to our past lives and the meaning of life is to live the way you want to and how you want to .

Context: Which is more important -- being successful or being happy ?

Bot: Being successful is just endless . Being happy is the only thing that matters .

Context: Is peace the only way to stop war ?

Bot: Not if the world is not willing to work with you . Peace is the ultimate goal , but peace requires more than that .

Context: How much freedom should people have ?

Bot: The right amount of freedom is the freedom to do as you please , as long as you don't hurt people or property .


Those are pretty good! A few imperfections, but correct under the right interpretation!

Finally:

Context: What do you think of the DialoGPT repo ?

Bot: It's pretty good , but it's very easy to find a bug .



#142
starspawn0

I thought I would mention a few things about the seemingly really good responses this model gives, and how they might be generated:

 

It could be that some of them are just cobbled together from Reddit posts; but it could also be that the system is doing something closer to actual "reasoning" -- though maybe not what we would normally think of as reasoning; more like "intuition".  Let me explain:

 

Take the example, "Which is larger, the sun or the moon?"  It got the answer correct, saying "sun".  Maybe if you give it 100 similar questions, it will answer correctly at least 80% of the time -- that test would have to be performed to know for sure.  My suspicion is that it would be correct more than 50% of the time; and probably more than 70% of the time. 

 

How?

 

In learning statistical relationships between words, it encounters combinations like "the sun above...", "the large sun", "father sun, mother moon", and so on.  These combinations connect the word "sun" with words like "above", "large", "father", and so on, all of which signal the sun is something big and important.  The moon probably also has some of these associations, but they aren't as strong.  Then, among the millions of dialogs that involve asking "Which is bigger?", it learns a little computational gadget, which is that the bigger of the two objects is the one with the strongest association with big-sounding words.  Hence, it outputs "The sun is bigger." 
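
To make the idea concrete, here is a toy sketch in Python of the kind of "gadget" I mean: compare how strongly each object is associated with size-related words, and answer with the one that scores higher. The association numbers are invented purely for illustration; they aren't extracted from any real model.

# Toy illustration of the "association gadget" idea -- the numbers below are
# made up for the example, not taken from any actual language model.

# Hypothetical co-occurrence strengths between each object and some
# "big-sounding" words the model might have picked up from text.
associations = {
    "sun":  {"large": 0.9, "above": 0.8, "giant": 0.7},
    "moon": {"large": 0.4, "above": 0.6, "giant": 0.2},
}

def bigger(a, b, assoc=associations):
    """Guess which of two objects is bigger by summing its
    associations with size-related words."""
    score = lambda x: sum(assoc[x].values())
    return a if score(a) > score(b) else b

print(bigger("sun", "moon"))  # -> "sun"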

 

That doesn't sound very "smart"; but, actually, there is some evidence that humans learn a lot of intuitive world knowledge this way. 

 

Many of the other responses it generates are probably similar.  You might think it's just regurgitating text from Reddit -- but it actually could be applying a little gadget it has learned to all the statistical relationships among words and word patterns.

 

Take the question about the boiling point of water: it says that it is 212 F, which is correct.  Again, it could be getting that from a sentence or two from Reddit.  But it could also learn associations between the words "water" and "boil" and the number 212.  It could be that there are thousands of posts that mention water and boiling, with a long list of numbers, where the most common numbers are 212 and 100, for the Fahrenheit and centigrade temperatures.  It may actually learn more than just this weak statistical linkage.  For example, one of the little gadgets it learns may be a formula to map centigrade to Fahrenheit and vice versa -- it's a simple linear relation that it ought to be able to learn, given enough examples; though it also has to learn how to map the numerical digits of a number to a variable value, which is certainly within its capacity.
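
Just to show how easy that linear relation is to recover from examples, here is a small sketch (my own toy example, nothing to do with the model's internals) that fits the centigrade-to-Fahrenheit map from a handful of (C, F) pairs:

import numpy as np

# A few (centigrade, Fahrenheit) pairs, as might be scraped from text.
c = np.array([0.0, 37.0, 100.0, -40.0])
f = np.array([32.0, 98.6, 212.0, -40.0])

# Fit f = a*c + b by least squares.
A = np.vstack([c, np.ones_like(c)]).T
(a, b), *_ = np.linalg.lstsq(A, f, rcond=None)

print(a, b)         # ~1.8 and ~32.0, i.e. F = 9/5 * C + 32
print(a * 100 + b)  # ~212: water boils at 212 F / 100 C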

 

If you iterate these relationships, looking for second-order correlations, third-order correlations, and so on, you can get deeper and deeper knowledge about the world.  It's not easy to appreciate just how deep it can be, until you test it out!



#143
tomasth

Can it do explanation and transfer learning?

 

I just hope people won't get carried away with future versions of this that can do some commonsense reasoning and give pages-long responses, and confuse their abilities with the strong-AI kind.

 

The better AI gets, the more people should remember the long road still ahead.



#144
starspawn0

I don't know what its limitations are, because I haven't seen it. But since it is built on top of GPT-2, and adds even more data and a reranker, I imagine its outputs can be pretty complicated. It can probably do some long, but not-too-long, explanations, yes; just like how GPT-2 can write long blocks of text. And because conversational outputs are usually shorter, and don't require as much deep inference, they will probably be more accurate -- that's my guess, anyhow. Given that the model can beat humans on the three categories of relevance, contentfulness, and human-likeness, it has to be producing good outputs more than 90% of the time. Humans, after all, produce good outputs 90% of the time; so, if you had a bot that only outputted good stuff 85% of the time, say, it would lose a head-to-head competition with a human for single-round conversations.
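
To spell out that back-of-the-envelope argument, here is the calculation under a simplifying assumption of mine (not from the paper): the judge prefers a good response over a bad one, and flips a coin when both or neither are good.

# Simplified single-round comparison: each side independently produces a
# "good" response with some probability; the judge picks the good one when
# exactly one is good, and flips a coin otherwise.
def win_prob(p_bot, p_human):
    win = p_bot * (1 - p_human)                           # only the bot is good
    tie = p_bot * p_human + (1 - p_bot) * (1 - p_human)   # both good or both bad
    return win + 0.5 * tie

print(win_prob(0.85, 0.90))  # ~0.475 -> the 85% bot loses, narrowly
print(win_prob(0.90, 0.90))  # 0.5    -> parity once it matches humans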

Now, there is a lot of stuff it certainly won't be able to do. This isn't an AGI. It's a very, very good socialbot -- better than any you've ever seen before, by a mile. Better than Cleverbot; better than Xiaoice; better than those that have come before in ways it's hard to find adjectives to describe. If you entered it in a Loebner Prize competition, it would win, hands-down:

https://en.wikipedia...i/Loebner_Prize

And it won't be limited to just giving simple responses like, "How are you doing?" The examples show it can generate good philosophical text responses; can take into account context; can do question-answering; and can even answer some amount of commonsense-type questions. I'm guessing it has some other skills, too, e.g. maybe it can write short poems or tell jokes -- those are the kinds of skills that GPT-2 has demonstrated. It might even be able to generate short arguments for positions; again, some of GPT-2's output suggests it has learned how to do this at least some of the time.

Would it pass a Turing Test? It might if you gave it to unsuspecting humans with low expectations. I doubt it would pass an official, 30-minute test with a skeptical judge.

So why is the public not getting to try it? It seems that the safety issues aren't the researchers' main concern. Their main concern, as they say in the paper, is "toxicity". They are trying to come up with ways to stop it from producing racist, sexist, lewd, rude, and other kinds of toxic output. (e.g. What if a kid tells the bot he wants to clean out his insides, because he's constipated, and the bot tells the kid to use bleach?)

If they can get this problem solved, and if they can add a few more things (long-term memory, a consistent personality), then it would make a great companion for seniors in retirement villages all over the world. They could talk to it for hours, and it would patiently listen and make comments that make it seem like it really understands them and cares for them.

 

This is what exists in 2019.  Just imagine how much better it will get on the march to 2029... or even 2025.



#145
starspawn0

Facebook has also made an advance on dialog systems / chatbots:
 
https://parl.ai/proj...e_unlikelihood/
 

Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases.


This will make the conversations even more accurate and coherent.

Now, if only they could take care of the sexist, racist, lewd, unethical, mean, etc. outputs, the public would get to have some incredible conversations with this technology. That's coming... but it may be a while before the problem is mitigated sufficiently for large companies to be willing to put their brands behind it.

#146
starspawn0

This is a really fascinating post about testing GPT-2's ability to solve "zero-shot Natural Language Understanding (NLU)" tasks:
 
https://rakeshchada....Shot-GPT-2.html
 
The setup here is that the author took GPT-2, fed it some prompts to let it know which task to perform, and then, without any additional training, it was able to perform the tasks on the spot!

The name extraction one was incredible, as I doubt that was a skill it learned from the data (there weren't examples of this in the training data for GPT-2 to learn it) -- it had to "improvise" or do something like "reasoning" to solve it (or fake it).

Take, for example, the fact that "Southern Railways" was capitalized, and is a proper noun, but not a person's name. GPT-2 had to figure out not to put that on the list of names; and it's amazing it was able to do that out-of-the-box!

I'm guessing that it has learned lots of little skills, and then when presented with a situation like that, it interpolates "between skills" (or takes some convex-combination of skills), which looks like "improvising" -- like how Google's translation system can learn to translate between language pairs it hasn't seen before. In other words, it not only has representations of terms and basic concepts, but even the more abstract "operators" that act on concepts, that it can flexibly manipulate.

The one about the colors and breeds was equally impressive. Here, it has to know the difference between the two, as well as pick up what the task at hand is to perform. Just incredible!

It's a little surprising that it had trouble learning how to convert numbers into their word-equivalents, as it's a pretty straightforward mapping, with not too many exceptions to the rule; and there probably are several examples in the training data to learn from. It did get at least a few right, though. I'd guess that a larger model, with more training data from free text, would solve it pretty handily -- e.g. I wouldn't be surprised if something like Megatron could solve that one without additional training.

Problems about multiplication are a little tricky, as multiplication is a hard rule to learn (it's a "deep circuit"). Probably, the way it would learn multiplication is as a mixture of memorization and combination rules for number patterns -- just like how people do! (We learn our times-tables, and then mix together what we memorized with a simple algorithm.)

It seemed to knock the mapping between singular and plural out of the park; and note there is no way it learned all possible mappings like that from the text -- it had to generalize.

Overall, I was very impressed! It's a little scary, when you think about it. Much larger language models might learn enough little skills, and how to combine them, to do just about any little task like that you throw at them -- even ones that appear to require some improvisation. Think about what that means for future iterations of DialoGPT, Microsoft's super-chatbot built on top of GPT-2. You will be hard-pressed to find any simple word puzzle that reveals it isn't intelligent. You'll have to try harder, and ask it some tricky problems that require many steps of reasoning.

An example problem one of these large language models might already be able to solve without any additional task-specific training, or soon will: imagine you feed it a spreadsheet (a text file with columns of numbers) depicting statistics about countries. Each row represents a country, and the different columns represent stats. And then suppose you ask it a question like, "Which country had the lowest birth rate?", and it correctly answers -- and you don't even have to say that the entries in each row are actually countries; it has learned that all on its own.

It wouldn't surprise me if a sufficiently deeply trained model could do that -- given what else these models appear able to do already.



#147
Kynareth

I wrote my 2029 predictions months ago but I am very unsure about them. I was too optimistic for 2019 and I don't want to be wrong again.



#148
starspawn0

Some more GPT-2 amusements. First:

https://mobile.twitt...589059928866820

Trying to get GPT-2 to complete 'A is for apple, B is for Banana' etc.


Results (prompt in bold):

A is for Apple
B is for Banana
C is for Carrot
D is for
Cookie
E is for Egg
F is for...
G is for Ginger
H is for Hedgehog
I am a closed mouth bitch
J is for Jar
K is for Key
L is for Liquid
M is for Maggie
N is for Needle


Well, it mostly got the pattern right -- and bear in mind that it wasn't even explicitly trained to know the sequence of letters in the alphabet. But there was a little *whoopsie* when it got down to "I".

And the second amusement is "AI dungeon 2":

http://www.aidungeon.io/

What's the idea?:

Imagine an infinitely generated world that you could explore endlessly, continually finding entirely new content and adventures. What if you could also choose any action you can think of instead of being limited by the imagination of the developers who created the game?


An example:

Leading Turkeykind to Freedom and World Domination


http://www.aidungeon...-and-world.html

#149
starspawn0

Google has a new experimental chatbot called "Meena" (not for public use at the moment), designed to converse in a general open-domain setting, and it is even better than Microsoft's DialoGPT.  Meena used more training data and parameters, for one thing; and, while DialoGPT is pretty good for single-turn responses, Meena can handle multi-round conversations a large percentage of the time, generating very human-like responses.  Here is a link to the paper:

 

https://arxiv.org/abs/2001.09977

 

An example successful conversation:

 


Human: Hi!

 

Meena: Hi! How are you this fine evening?

 

Human: I’m doing well. it’s morning my time!

 

Meena:  Good morning! What are your plans for today?

 

Human:  Get my work done then watch a movie at home in the night

 

Meena: Sounds fun! What movie?

 

Human: I haven’t decided yet, perhaps an Oscar-nominated one. any suggestion?

 

Meena:  I’m not too familiar with recent Oscar winners, but I do know I wanted to see The Grand Budapest Hotel.

 

Human: ah, tell me about that movie!

 

Meena:  It’s from the same director and writer as Moonrise Kingdom. Wes Anderson is pretty well known for making funny, colorful films. I’ve heard it’s pretty good.

 

Human: ha! I’ve searched on the internet and both movies are very colorful!

 

Meena: I’m looking forward to it. I love colorful movies.

 

That's really good!

 

They claim in the paper that ordinary language model "perplexity" (there's a technical definition -- but it's completely classical; no new stuff required) correlates very strongly with "sensibleness" (which measures if the response even makes sense) and "specificity" (being specific), two attributes you want a good chatbot to have.  What this suggests is that it may well be that hammering away even more at improving perplexity is all you need to build a chatbot so good it passes some kind of Turing Test for long conversations, in a general, open-domain setting.
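
For anyone unfamiliar with the term: perplexity really is completely classical -- it's just the exponential of the average negative log-probability the model assigns to the tokens it sees. A minimal sketch:

import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the language model
    assigned to each token (standard definition: exp of the average
    negative log-probability)."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model that assigns probability 0.1 to every token has perplexity 10:
print(perplexity([0.1] * 20))  # -> 10.0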

 

They say in the paper:

 

The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity.

 

SSA here means "sensibleness and specificity average".



#150
funkervogt

A roundup of the predictions I made on this thread about what 2029 will be like: 

 

  • "Foldable" smartphones will be commercially available. Folded up, they will be the same size as today's smartphones, but you'll be able to open them like a manila folder, exposing a larger inner screen. They will obsolete mini tablets. [Already came true]
  • Augmented reality glasses that have fixed Google Glass' shortcomings will be on the market. The device category will come back. They will find success as niche products. 
  • At the end of 2029, we will look back on the 2020s as the decade in which virtual reality went mainstream. Judging by where the technology is now, I think we're less than 10 years from price, visual quality/immersiveness, and content (e.g. the number of available games, movies, and other media) getting past the thresholds for mainstream acceptance. In 10 years, VR gaming won't be the domain of hardcore gamers who are willing to spend $1,000+ on a system. 
  • By the end of 2029, I predict a machine will pass the Turing Test, but will fail some of the subsequent Tests where it is asked harder questions designed to probe its intelligence. Nevertheless, the milestone will be big news, though it will only highlight the limited value of the Turing Test, which is something AI experts have been talking about for years. 
  • It will be cheaper to buy an electric version of a particular model of car than it will be to buy the gas-powered version. 
  • By the end of 2029, I think there will be vastly better autonomous cars than we have now, but they will still be the playthings of technophiles and upper-income people. Maybe 5% of the new private passenger vehicles sold to Americans in 2029 will be capable of fully autonomous driving. However, by the end of 2029, much more progress will have been made turning over the commercial truck fleet to autonomous vehicles. Machines will be able to drive big rig trucks across long but simple highway routes with the same efficiency and safety as human drivers. This category of vehicle will turn over faster because of the profit motivation and because it will be easier from a technological standpoint. About 25% of the new big rig trucks sold in America in 2025 will be capable of fully autonomous driving. It will be a bleak time for human truck drivers, but very profitable for rich guys who own big truck fleets. 
  • China's GDP will be higher than America's. 


#151
starspawn0

A guy combined GPT-2 with Google's BERT to produce a chatbot that writes posts to Reddit that are of much higher quality than those from GPT-2 models alone.  In fact, they are so good, they fool a lot of humans:
 
https://www.bonkerfi...gpt-2-and-bert/
 
There are probably still a lot of flaws in the output; and the same goes for other recent chatbots (like Google's Meena).  These will get ironed out in time.  
 
People seem skeptical that these chatbot models do any kind of world-modelling and interlocutor-modelling.  I think the skeptics are wrong about that; and the reason I say that is that there are papers showing that many of these models develop things like "sentiment neurons" that keep track of the tone of the text:
 
https://openai.com/b...ntiment-neuron/
 
They aren't trained to do this -- the neural nets acquire the ability in a completely unsupervised way, just from trying to do a better job at predicting text.  Likewise, it's probably the case there are neurons sensitive to things like the gender of the interlocutor, where they are located (roughly), their personality ("quirky", "logical" etc.), context, and so on.  It's hard to really say what the limitations are, unless you carefully test it.  Cherry-picked examples where the model fails prove nothing.  
 
There is also strong evidence that Transformer language models learn features that match up pretty well with the human brain:
 
https://www.abstract...sentation/66073
 
That's a pretty good reason to believe that these models are learning to do at least some user and world-modelling.
 
Another method for improving these chatbots is to add some brain data -- as I described here:
 
https://www.reddit.c...ual_assistants/
 
I had a conversation with a little bird a while back, and we briefly discussed the work of Leila Wehbe, who is working on using brain data to improve language models:
 
https://arxiv.org/abs/1911.03268
 
I believe they were in contact with some of these scientists, and asked them something like, "What is your fantasy dataset?"  And I think one said, "1 million words".  The recording estimate for that was 142 hours -- my estimate in that Reddit piece is that if you are careful, you can bring that down to 111 hours.  At any rate, it's sub-150 hours.  
 
Current mass-produced BCIs -- like EEG -- don't give you enough data to do much.  fMRI might, however -- and, as I wrote recently, Neuromod is putting out data at a rate of 15 hours per week, only some of it on language tasks.  They should have the data we would need to improve chatbots within a few months:
 
https://www.futureti...rning/?p=274131
 
But, of course, BCI companies working behind the scenes may get there sooner (while also producing better-quality and more useful data than you can get with either fMRI or MEG).  I'm not sure all that they are working on, but I strongly suspect they have already generated the data and are not yet sharing it, except maybe with select scientists like Wehbe.
 
....
 
In other news, Elemental Cognition's Nasrin Mostafazadeh apparently has some showing-and-telling to do today at AAAI 2020:
 

 

https://mobile.twitt...836873442041856
 

Tomorrow at #AAAI2020 #RCQA20 I'll be talking about some exciting work in reasoning that we've been up to during the past year:Building AI systems that learn to make explicit mental models of narratives,moving another tiny step towards giving #AI the commonsense of a 5-year-old!!


Elemental Cognition was founded by David Ferrucci, the lead scientist behind IBM's Watson, which won at Jeopardy!. He left IBM, then worked for a while at Bridgewater, before leaving to found his own AI company, Elemental Cognition. Elemental Cognition's primary goal is to train AI systems to read and understand text at a fairly high level, using a mixture of Deep Learning and traditional "knowledge representation" and logical-reasoning methods. Given Ferrucci's track record of success at large projects, I wouldn't be surprised if he succeeds.



#152
starspawn0

One more thing I'd like to point out from that first piece on combining GPT-2 with BERT:
 

It's a very intuitive and clever concept, and one that I personally feel mirrors the internal dialog that I constantly have in my own brain's decision making system. The critic in my head feels almost like a discriminator algorithm throwing shade on my internal generator algorithm. Anyway, if you're interested in how they work in detail, you can read more here.


This is a combination of a "generator" and a "critic", like I described in this post (where I also say that it's standard in chatbot design):

https://www.reddit.c...ual_assistants/

The difference is: I propose building the critic using brain data, not merely another model trained only on text. It may even be possible to use brain data to improve the generator -- though, I think it would be easier to start with just using it to build the critic (or even tweak an existing critic), and leave the generator part to GPT-2 or some other large language model.
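
For concreteness, here is a minimal sketch of the generator-plus-critic (reranking) pattern. The generate_candidates and critic_score functions are placeholders standing in for a GPT-2-style generator and a BERT-style (or brain-data-trained) critic -- they are hypothetical names, not real APIs from any of these projects.

from typing import Callable, List

def respond(context: str,
            generate_candidates: Callable[[str, int], List[str]],
            critic_score: Callable[[str, str], float],
            n_candidates: int = 16) -> str:
    """Generator + critic: sample several candidate replies, then let the
    critic pick the one it scores highest in this context."""
    candidates = generate_candidates(context, n_candidates)
    return max(candidates, key=lambda reply: critic_score(context, reply))

# Dummy stand-ins, just to show the shape of the interfaces:
demo_gen = lambda ctx, n: [("very " * i) + "interesting point!" for i in range(n)]
demo_critic = lambda ctx, reply: -len(reply)   # silly demo score: prefer terser replies
print(respond("Tell me about Meena.", demo_gen, demo_critic, n_candidates=4))
# -> "interesting point!"

The point is that the generator only has to put a decent reply somewhere in its top-n samples; the critic's job is just to recognize it.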

Using BERT as the critic probably does reasonably well at filtering out bad responses; but if you had a system that operates a lot more like the human brain, it should do even better. For example, BERT contains some commonsense world knowledge, but it's still a bit limited -- you can build tests to ferret out its weaknesses, and will find that it still makes mistakes that humans don't. However, if you trained a critic with a sufficient amount of human brain data, it should acquire more brain-like features (as Wehbe has already started to demonstrate using limited amounts of data), which I conjecture will translate into much greater commonsense reasoning ability.

And the beautiful thing is that the critic doesn't even have to be perfect. If it can eliminate 80% of the bad or sub-par responses, leaving reasonable ones, that's probably going to produce a state-of-the-art chatbot (better than Meena). That 80% figure comes from the best-in-5 responses in an "interview" The Economist magazine did with GPT-2, where the responses were all pretty good. Crank that 80% up to 90%, and you're probably getting really good responses for long conversations -- with very few dumb errors like contradicting oneself, or talking about the "deserts of Toronto" or "lighting a fire on the bottom of the ocean".

Obviously none of this is proven yet -- it's just a conjecture on my part. But there is some evidence that it should work, as indicated in several of my posts.



#153
starspawn0

This is a really interesting paper from the Allen Institute for AI:

Transformers as Soft Reasoners over Language
 
https://arxiv.org/abs/2002.05867
 

AI has long pursued the goal of having systems reason over *explicitly provided* knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a formal representation. We provide the first demonstration that this is possible, and characterize the extent of this capability. To do this, we use a collection of synthetic datasets that test increasing levels of reasoning complexity (number of rules, presence of negation, and depth of chaining). We find transformers appear to learn rule-based reasoning with high (99%) accuracy on these datasets, and in a way that generalizes to test data requiring substantially deeper chaining than in the training data (95%+ scores). We also demonstrate that the models transfer well to two hand-authored rulebases, and to rulebases paraphrased into more natural language. These findings are significant as it suggests a new role for transformers, namely as a limited "soft theorem prover" operating over explicit theories in language. This in turn suggests new possibilities for explainability, correctability, and counterfactual reasoning in question-answering. All datasets and a live demo are available at

https://rule-reasoni...ps.allenai.org/


How is this different from Facebook's bAbI dataset and task?:
 

Although our core datasets may seem similar to the bAbI dataset [Weston et al., 2015] in using synthetic data, our probes are qualitatively different. Specifically, apart from bAbI Task 15 (above), the underlying rules needed to infer an answer in the bAbI tasks are implicit. For example, answering Task 1 questions such as "Mary went to the hallway. John went to the office. Where is Mary? A: hallway" requires inducing state-change rules such as "X moves to Y → X at Y". In other words, the bAbI tasks test whether a system can learn and apply these underlying rules from examples, while our concern here is reasoning with explicit rule sets, potentially different for each example (Figure 1).


In other words, Allen AI's dataset probes the system's ability to do a limited kind of "theorem proving", while the bAbI tasks are more like learning a single decision tree (actually, something more complicated) to solve different versions of the same kind of problem. The former is in many ways a more difficult kind of problem.
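
To make "reasoning with explicit rule sets" concrete, here is a tiny reference implementation of the kind of chaining the Transformer is being asked to emulate. This is my own toy example in the spirit of the paper's synthetic data, not their actual format; the Transformer sees a natural-language rendering of the facts and rules (e.g. "Alan is young. Alan is kind. If someone is young and kind then they are nice.") rather than tuples.

# Facts and rules are stated explicitly per example; the question is whether
# the query follows from them by chaining the rules.
facts = {("alan", "is", "young"), ("alan", "is", "kind")}
rules = [
    # (premises, conclusion): if all premises hold, the conclusion holds
    ({("?x", "is", "young"), ("?x", "is", "kind")}, ("?x", "is", "nice")),
    ({("?x", "is", "nice")}, ("?x", "is", "popular")),
]

def entails(facts, rules, query, entity):
    """Naive forward chaining over a single entity until a fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            ground = lambda t: tuple(entity if x == "?x" else x for x in t)
            if all(ground(p) in known for p in premises) and ground(conclusion) not in known:
                known.add(ground(conclusion))
                changed = True
    return query in known

print(entails(facts, rules, ("alan", "is", "popular"), "alan"))  # True (a depth-2 chain)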

Furthermore, as they say in the paper (and abstract), the system can learn to do even deeper reasoning than it was trained on -- the dataset just needs a few sufficiently deep examples to begin with, and then it will learn to extend that even deeper.

It's amazing that Transformers are able to do well on this task -- and it really shows they have the potential to learn to do very complicated logical reasoning. And this isn't a statistical trick, as the dataset is synthetic, meaning they can test all variations of problems.

 

I should point out that what this also suggests is that large language models -- like the recent 17-billion-parameter one released by Microsoft, Nvidia's Megatron, and OpenAI's GPT-2 -- are likely capable of much deeper reasoning than we've seen to date.  What's needed is the right dataset -- perhaps even just a sufficiently large text corpus will do.



#154
starspawn0

Recall the article the other day in Tech Review about OpenAI's "messy, secretive" new reality:
 
https://www.technolo...retive-reality/
 
One detail appearing in the article was their work on a massive AI project they have been keeping secret (and have probably been working on for several months; someone on Reddit said they thought it was started even before the work on GPT-2).  It involves using super-massive amounts of compute and data, trained on text, images, and other media (which, by the way, appear on Reddit; so maybe Reddit + external links will again be the source of the training data) -- perhaps it will be called GPT-3?  Anyway, buried in the article is also a reference to another secret that OpenAI let out of the bag only recently, one that is part of what is driving that other secretive project they haven't released any details about.  This other secret was revealed in the form of a paper:
 
https://arxiv.org/abs/2001.08361
 
The secret is not a piece of software, or a brilliant new algorithm.  Rather, it's a piece of empirical-theoretical work showing how the power of language models scales with the number of parameters and type of architecture. 
 
Why is that so important? 
 
It's important, because if you know exactly how much smarter the model becomes when you increase the number of parameters and compute by a certain amount, then you have greatly reduced your perceived risk of failure.  I keep harping on this over and over again -- that reducing perceived risk of failure is very important; it makes all the difference in the world, in terms of keeping people motivated on a project and securing funding.
 
Suppose I told you exactly what you need to do in order to cure cancer, and provided a cost estimate for the upper limit of what you are going to have to spend, and let's say it totals $50 billion -- maybe I tell you, "You're going to have to test the efficacy of between 1 million and 3 million compounds in a certain class; and you're going to have to run between 10^10 and 5.0 x 10^10 simulations using this exact method, with this exact distribution, when you feed in the results from the test of the compounds; and based on the cost of coding this up, and running it in the cloud, the price I come up with is between $30 billion and $70 billion, with 95% probability."  Suppose that the method I provide to you is airtight:  it is almost at the level of a mathematical proof that can be checked by computer, but has a few biological assumptions that nobody denies, and that are considered "settled science".  If we had that -- theoretical guarantees on the upper limit of the amount of work needed -- then cancer would be cured by next year; wealthy individuals, companies, and governments would pool their resources together to hit that $50 billion target, and would get to work on it pronto.
 
OpenAI has something like that in the above paper.  They don't have predictions about when we will reach AGI; but they do have predictions about how large the "test loss" (size of error) is as a function of model size, dataset size, and other parameters.  This test loss is strongly linked with something called "perplexity"; and perplexity, in turn, is strongly linked to chatbot human-likeness:
 
https://arxiv.org/abs/2001.09977
 

We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity.


OpenAI knows what it needs to do to reach that target. Each task you want to test the model on will probably have a similar correlation between test loss and performance; so, they know how to solve those, too. (There's probably more empirical-theoretical work they didn't release in that paper; e.g. how much does adding x gigabytes of image data to the mix improve performance?)
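
For reference, the headline result of the scaling-law paper has roughly the following form (I'm quoting the constants from memory, so treat them as approximate): test loss falls as a smooth power law in the non-embedding parameter count N and the dataset size D (in tokens):

% Approximate form from arXiv:2001.08361 (constants from memory, not exact):
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}

L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D \approx 0.095, \quad D_c \approx 5.4 \times 10^{13}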

....

On a related matter, here is a paper released just a few days ago by some people at Google:

https://arxiv.org/abs/2002.08910

They show that a large language model fine-tuned to answer questions does as well as or better than certain question-answering systems with access to external resources (e.g. Wikipedia). In other words, no knowledge graphs or internet access are needed. That's really incredible, when you think about it!

One wonders what kind of scaling laws there are for performance on reasoning tasks... Has OpenAI researched how reasoning depth increases as a function of dataset size and model size and shape? Given that teaching machines to reason is their stated next step, probably they have.



#155
starspawn0

Another thing worth pointing out, regarding what I wrote about OpenAI's research on scaling up language models to get improvements in text generation:  if you look back at the history of NLP (Natural Language Processing), this is not at all the kind of situation you usually see.  There are lots of cases where the improvements as a function of model size -- using older methods, other than Deep Learning -- barely grow at all beyond a certain point.  E.g. phrase-based Statistical Machine Translation (which was really hot prior to about 5 years ago, before Deep Learning supplanted it) saw improvements like this:  the BLEU score one year was around 15.0; then the next year it was 15 again; then the next it was 15.1; and so on.  With Deep Learning, the improvements were more like:  20 in year 1; 25 in year 2; 30 in year 3; 35 in year 4; and so on.  

 

And if you could predict how the performance would improve as a function of the dataset size, model parameters, and compute used, and you could also predict your expected revenues from the improved model, then you could afford to throw large amounts of money behind the effort, knowing that you will get it back in revenue.

 

One problem, though, is that we live in "post-modern times", where people are a little bit anti-rationalist, and disbelieve in what the performance curves tell them, and invent reasons to discount them -- they think, "There's no way we can predict this thing that accurately.  The universe isn't so well ordered for it to be possible to know -- too many variables; too many ways to be wrong."  
 



#156
starspawn0

Interesting Tweet thread and empirical work by Berkeley AI Research:

https://mobile.twitt...616760595791872

Not everyone can afford to train huge neural models. So, we typically *reduce* model size to train/test faster.

However, you should actually *increase* model size to speed up training and inference for transformers.
Why? [1/6]


Using more compute often leads to higher accuracy. However, since large-scale training is expensive, the goal is typically to maximize accuracy under your budget.

For most people, the go-to strategy is to train small models because they run fast and use little memory. [2/6]


However, we show that using small models is actually the opposite of what you should do.

Instead, when training Transformer models, you should drastically increase model size. So, big model, few epochs. [3/6]


This occurs because, for BERT pretraining and machine translation, larger models converge in far fewer gradient steps. This outweighs their additional computational overhead.

Thus, bigger models achieve better results with respect to wall-clock time. [4/6]


Unfortunately, larger models increase inference costs. However, this trade-off can be reconciled with model compression -- we show that larger models are *more* compressible.

e.g., for a given test budget, the best models are trained big and pruned/quantized.


See all of this in:

"Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers"

By @zhuohan123, @Eric_Wallace, @nlpkevinl, @shengs1123, Kurt Keutzer, Dan Klein, mejoeyg



Not only that, but larger models tend to learn more general representations, which e.g. leads to smarter question-answering systems capable of deeper inference.

So, what researchers are finding is that the larger the models they work with, the easier they become to train and the smarter the AI that is produced.
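
A minimal sketch of the "train large, then compress" recipe, assuming PyTorch and its built-in pruning and dynamic-quantization utilities -- my own illustration of the workflow, not the authors' code, and with a toy model standing in for a real Transformer:

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a "large" model: in practice this would be something like a
# big BERT; a toy MLP keeps the sketch self-contained.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

# 1. Train big, for few epochs (training loop omitted here).

# 2. Prune: drop, say, 60% of the smallest-magnitude weights in each Linear.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # make the pruning permanent

# 3. Quantize the remaining weights to int8 for cheap inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)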

#157
Koolah

In my humble opinion:

- Robots won't be so widespread and autonomous; a "typical" person won't pay $10,000 for a high-tech janitor/post-it/coffee machine. I'd personally rather put that $10,000 into a better car, my child's education, or some work on my house -- and that's assuming such a robot assistant even exists at all in 2029.

- Drones won't be so widespread either. There may be more of them, but nothing like swarms of drones in the sky for police duty, deliveries, etc. It'll still be marginal.

- The Moon may have been visited again; as for Mars, I'm doubtful. Though I'd love to see a manned mission land on Mars before 2030, I want to be prudent.

#158
starspawn0

This is a really interesting paper:
 
https://arxiv.org/abs/2003.03384

Our goal is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available. We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction for the field.


That's amazing! Using evolutionary algorithms (randomly trying stuff and tweaking, like how evolution works), the computer rediscovered many core algorithms used in Deep Learning, including backpropagation. This not only shows that DL researchers and engineers are on the right track, but also suggests that biological systems may use something like DL to train brains.

There is actually a whole group of researchers at the intersection of machine learning and neuroscience trying to verify that the brain uses DL. They have about 10 different approaches for this, some more convincing than others. But you might not know about this work if you read all the skeptical pop-science articles that say "deep learning is just pattern recognition; we need to move to more sophisticated methods to do reasoning" or "deep learning doesn't have common sense" or "deep learning isn't robust", etc.

#159
starspawn0

This is some enlightening work by researchers at OpenAI:
 
https://distill.pub/...rcuits/zoom-in/

Basically, they figure out what the individual neurons in an image classification neural net are doing, and also what the neurons do in combination. What they discover is that Deep Learning endows these neural nets with the ability to identify individual parts of objects, and then to figure out how the parts fit together to build larger objects. These networks are also sensitive to precisely how the parts fit together -- e.g. a "face" isn't just a collection of 2 eyes, a nose and a mouth thrown together in a region on the screen; the eyes should probably be aligned, with the nose between and below them, and with the mouth below the nose -- that kind of thing.

This runs counter to criticisms that have been lobbed at Deep Learning in the past, where it has been claimed "they are just texture-detectors", or "they are just doing simple statistics", etc. There are actually certain kinds of "adversarial examples" that seem to suggest the networks haven't learned anything anywhere near as sophisticated as the OpenAI work is claiming -- but things are not as they seem! Just because you can fool a network, doesn't mean it isn't doing something fairly involved!

Another line of reasoning to perhaps doubt the networks are doing fancy stuff is the fact that they aren't explicitly trained to do "parts-recognition", and then to accurately assemble the parts, respecting the geometry and orientation of those parts. You might think this would have to be inserted as a "prior" into the network ("How in the world could a network, trained discriminatively, possibly learn that?!") -- but it turns out that, no, all this stuff appears in an emergent fashion in the network when Deep Learning is applied.



The most interesting aspect of this work is what they have discovered about the so-called "polysemantic neurons":

Why would it do such a thing? We believe superposition allows the model to use fewer neurons, conserving them for more important tasks. As long as cars and dogs don’t co-occur, the model can accurately retrieve the dog feature in a later layer, allowing it to store the feature without dedicating a neuron.


One thing this probably means is that networks with too many of these polysemantic units won't generalize as well as networks with fewer of them: if you have complex feature detectors, built up from many levels of progressively smaller and smaller features, and if at each level those feature detectors are polysemantic, then errors can creep in at multiple levels, whenever there is the right kind of conflict. This might box in the network, so that it can only classify accurately when the distribution of images being presented to it is confined to a very narrow set (relatively speaking), in which the feature conflicts are unlikely to occur.

It would seem that training networks that have a lot more parameters than needed might reduce the prevalence of these polysemantic neurons, and thereby increase the network's ability to generalize. It is known that networks with more parameters than needed tend to generalize better, when trained in just the right way (e.g. small number of epochs).

Maybe we'll find out more about this. They plan to write more posts on "circuits" in the future, and might also address generalization performance.

#160
starspawn0

This is a hopeful-looking paper:

https://arxiv.org/abs/2004.06100

They show that certain large, pre-trained, Transformer-based neural nets do a decent job in generalizing "out of distribution" (OOD), but earlier neural nets do not. Basically, what they did is this:

* They took several different datasets / tasks, and split them into different types. For example, movie reviews might be split into "expert reviews" and "amateur reviews". Obviously, the distribution of features you find in one class is going to be different from the distribution in the other class. Neural nets have, in the past, had a hard time generalizing high performance on one type of task to the other; they were brittle, and they choked on even slightly different data.

* Next, they took large, pre-trained Transformer-based neural nets, like BERT or RoBERTa, and then fine-tuned them to solve tasks in one of the two classes.

* They then checked to see how well it did on the other class. Amazingly, performance barely dropped at all. But other neural nets, like LSTMs, saw a huge drop in performance.

What this suggests is that Transformer neural nets are showing one of the hallmarks of human intelligence, the ability to fluidly apply what is learned in one specific context to do well in another.

Furthermore, they show that:
 

Pretrained transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.


Perhaps this means Transformers even learn meta-reasoning strategies from all the data they are trained on. Perhaps they learn "we're not in Kansas anymore", and "know" to adapt their language processing to cope with it. It wouldn't surprise me if a meta-level skill like that was learned from the gargantuan amounts of data they are trained on. It's a good sign that chatbots of the near future are going to be a lot more intelligent, able to handle totally new contexts on the fly.



