Speculation on the future of language models with long-term memory
Given that people are now starting to give language models the ability to "ponder", as in this recent work (using a scratchpad / inner-voice), and are seeing success, perhaps the next major obstacle on the path towards AGI systems is the need for long-term memory. Currently, language models are limited to a context window of a few thousand tokens, which is too short to hold "memories" of anything from an appreciably long time in the past. There have been proposals for building language models with much longer context windows; but unless that means hundreds of thousands to millions of tokens, it probably won't be enough for AGI.
One solution, perhaps, is to build in a separate "memory module". Ideally, one wouldn't have to fiddle much with existing language model architectures, so that all the training already invested in GPT-3-scale models can be reused. Furthermore, at least for modeling working memory, current context window lengths seem adequate; so it's probably not a good idea to replace them with some more general type of "memory".
I could see machine learning engineers keeping language models mostly as they are and making only minimal changes to greatly expand their memory, without needing to extend the context window much, or at all. One path they might try is something like this: split the context window into an initial segment of, say, 200 vectors, and let the rest hold the text stream. Those 200 vectors would represent the section of memory currently under consideration.

Initially, the vectors might represent a lossy-compressed version of all the tokens that have ever passed through the model, in chronological order (e.g. the first vector represents a compressed version of the first 1,000 tokens the system ever saw; the second represents tokens 501 through 1,500; the third, tokens 1,001 through 2,000; and so on). Rather than compressing at the token level, it would probably work better to use some kind of average over the embeddings of those tokens -- something the model could learn to use with less additional training, since it should be easier for it to pick out that a memory block is relevant using features rather than raw tokens. When the system sees the vector in each of those first 200 slots, it gets some vague idea of what happened in the corresponding window of time. When it needs greater precision about a memory, it might write
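A minimal numpy sketch of that compression step, to make the scheme concrete. Mean-pooling here is just a stand-in for "some kind of average over embeddings" (a learned compressor could replace it); the window and stride numbers come from the 1-1,000 / 501-1,500 / 1,001-2,000 example above, and all names are hypothetical:

```python
import numpy as np

def build_memory_vectors(token_embeddings, window=1000, stride=500):
    """Lossy-compress a token stream into memory vectors.

    Each memory vector is the mean of the embeddings in one window of
    `window` tokens; consecutive windows overlap by `window - stride`
    tokens, mirroring the 1-1,000, 501-1,500, 1,001-2,000, ... scheme.
    """
    n = len(token_embeddings)
    vectors = []
    for start in range(0, max(n - window + 1, 1), stride):
        chunk = token_embeddings[start:start + window]
        vectors.append(chunk.mean(axis=0))
    return np.stack(vectors)

# Toy stream: 5,000 tokens with 64-dim embeddings.
embs = np.random.randn(5000, 64)
memory = build_memory_vectors(embs)
print(memory.shape)  # one memory vector per 500-token stride
```

In a real system the 200 slots would fill up over the model's lifetime, so the compression ratio per vector would have to grow with the length of the history.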
<scratch> Zoom in on the vector 11.</scratch>
That would then cause the "memory manager" to replace the entire set of 200 memory vectors with a finer-grained compression of tokens 5,001 through 6,000 -- the span that vector 11 covered under the 500-token stride described above. At this point, the model might have pinpointed a relevant memory to help it solve some problem it was asked about.
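A sketch of what that "memory manager" step might look like, under the same assumptions as before (mean-pooled embeddings, 1,000-token windows with a 500-token stride, 200 slots; the `<scratch>` parsing syntax and all function names are hypothetical):

```python
import re
import numpy as np

def parse_zoom(scratch_text):
    """Extract the target slot index from a scratchpad command like
    '<scratch> Zoom in on the vector 11.</scratch>'."""
    m = re.search(r"Zoom in on the vector (\d+)", scratch_text)
    return int(m.group(1)) if m else None

def zoom(token_embeddings, slot, n_slots=200, window=1000, stride=500):
    """Re-tile one coarse memory slot across all 200 slots.

    Slot i (1-indexed) covers tokens starting at (i-1)*stride, so
    slot 11 covers tokens 5,001-6,000 as in the example above. That
    span is split into n_slots equal sub-windows, each mean-pooled,
    giving the model a 5-tokens-per-vector view of that period.
    """
    start = (slot - 1) * stride
    span = token_embeddings[start:start + window]
    sub = len(span) // n_slots  # 1,000 tokens / 200 slots = 5 tokens each
    fine = [span[i * sub:(i + 1) * sub].mean(axis=0) for i in range(n_slots)]
    return np.stack(fine)

embs = np.random.randn(10000, 64)
slot = parse_zoom("<scratch> Zoom in on the vector 11.</scratch>")
fine_memory = zoom(embs, slot)
print(slot, fine_memory.shape)
```

The same operation could recurse (zoom in again on one of the 200 fine slots) or run in reverse to zoom back out to the full chronological view.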
Fine-tuning might be used periodically to update the model's skills (arithmetic, theorem-proving, physics reasoning, etc.), and also to train it to use the scratchpad / inner-voice: to zero in on past memories, to think through problems in greater depth, and to plan ahead (exploring possibilities exhaustively via backtracking). Fine-tuning would act as a kind of procedural memory update, operating at various levels.
Thus, perhaps, one doesn't need to wait for breakthroughs in extending the context window, or for fancy new Transformer models (or even post-Transformer models). As with adding a scratchpad, maybe a few minor tweaks are all that's needed. Just imagine what these language models would be capable of if all the stars line up and what I have described happens...
And remember my friend, future events such as these will affect you in the future