We don't have one just yet, but we are damn close. So close that I feel the need to create this thread to document our rapid approach.
This is an extension of the AI & Robotics thread, but it focuses specifically on the increasingly generalized neural networks currently in development.
These early posts will document networks that aren't AGI (in fact, they're closer to AXI) but are definitely steps on the way there.
OpenAI will almost certainly be the first, but someone else may leapfrog them.
By Scott Alexander
I would be failing my brand if I didn’t write something about GPT-3, but I’m not an expert and discussion is still in its early stages. Consider this a summary of some of the interesting questions I’ve heard posed elsewhere, especially comments by gwern and nostalgebraist. Both of them are smart people who I broadly trust on AI issues, and both have done great work with GPT-2. Gwern has gotten it to write poetry, compose music, and even sort of play some chess; nostalgebraist has created nostalgebraist-autoresponder (a Tumblr written by GPT-2 trained on nostalgebraist’s own Tumblr output). Both of them disagree pretty strongly on the implications of GPT-3. I don’t know enough to resolve that disagreement, so this will be a kind of incoherent post, and hopefully stimulate some more productive comments. So:
OpenAI has released a new paper, Language Models Are Few-Shot Learners, introducing GPT-3, the successor to the wildly successful language-processing AI GPT-2.
GPT-3 doesn’t have any revolutionary new advances over its predecessor. It’s just much bigger. GPT-2 had 1.5 billion parameters. GPT-3 has 175 billion. The researchers involved are very open about how it’s the same thing but bigger. Their research goal was to test how GPT-like neural networks scale.
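A quick aside from me before continuing the quote: "testing how GPT-like networks scale" has a very concrete meaning. You plot validation loss against parameter count, check that it follows a power law, and extrapolate (OpenAI published a whole paper on these scaling laws). Here's a minimal sketch of that kind of fit; the numbers are made up purely for illustration, not taken from the paper:

```python
import numpy as np

# Illustrative (made-up) points: parameter counts and validation losses.
# The real curves are in OpenAI's scaling-laws work; these just show the method.
params = np.array([1.5e8, 1.5e9, 1.75e11])
losses = np.array([3.9, 3.2, 2.2])

# A power law L(N) = c * N**(-alpha) is a straight line in log-log space,
# so fit log(L) = log(c) - alpha * log(N) with a degree-1 polynomial.
slope, log_c = np.polyfit(np.log(params), np.log(losses), 1)
alpha = -slope

print(f"fitted exponent alpha ~= {alpha:.3f}")
# Extrapolate to a hypothetical 1.75-trillion-parameter model:
print(f"predicted loss at 1.75e12 params: {np.exp(log_c) * (1.75e12)**(-alpha):.2f}")
```

Anyway, back to the post: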
Before we get into the weeds, let’s get a quick gestalt impression of how GPT-3 does compared to GPT-2....
This isn't a very technical piece, so it's easy for a layman to understand. However, I'll skip to the really good part:
What would much more powerful GPT-like things look like? They can already write some forms of text at a near-human level (in the paper above, the researchers asked humans to identify whether a given news article had been written by a human reporter or by GPT-3; the humans got it right only 52% of the time, barely better than chance).
So one very conservative assumption would be that a smarter GPT would do better at various arcane language benchmarks, but otherwise not be much more interesting – once it can write text at a human level, that’s it.
Could it do more radical things like write proofs or generate scientific advances? After all, if you feed it thousands of proofs, and then prompt it with a theorem to be proven, that’s a text prediction task. If you feed it physics textbooks, and prompt it with “and the Theory of Everything is…”, that’s also a text prediction task. I realize these are wild conjectures, but the last time I made a wild conjecture, it was “maybe you can learn addition, because that’s a text prediction task” and that one came true within two years. But my guess is still that this won’t happen in a meaningful way anytime soon. GPT-3 is much better at writing coherent-sounding text than it is at any kind of logical reasoning; remember it still can’t add 5-digit numbers very well, get its Methodist history right, or consistently figure out that a plus sign means “add things”. Yes, it can do simple addition, but it has to use supercomputer-level resources to do so – it’s so inefficient that it’s hard to imagine even very large scaling getting it anywhere useful. At most, maybe a high-level GPT could write a plausible-sounding Theory Of Everything that uses physics terms in a vaguely coherent way, but that falls apart when a real physicist examines it.
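Interjecting here: "that's a text prediction task" is worth seeing in code. Below is my own sketch of few-shot prompting using the publicly available GPT-2 via Hugging Face's transformers library, not the GPT-3 setup from the paper, but the mechanism is the same:

```python
from transformers import pipeline

# Small GPT-2 as a stand-in; the paper does the same trick with GPT-3.
generator = pipeline("text-generation", model="gpt2")

# "Teaching" addition purely through the prompt: no gradient updates,
# just text prediction conditioned on a few worked examples.
prompt = (
    "Q: What is 12 + 7?\nA: 19\n"
    "Q: What is 33 + 54?\nA: 87\n"
    "Q: What is 21 + 40?\nA:"
)

result = generator(prompt, max_new_tokens=4, do_sample=False)
print(result[0]["generated_text"])
# The 124M-parameter GPT-2 will almost certainly get this wrong; the
# paper's point is that accuracy on exactly this kind of prompt climbs
# with scale.
```

Anyway, back to Scott: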
Probably we can be pretty sure it won’t take over the world? I have a hard time figuring out how to turn world conquest into a text prediction task. It could probably imitate a human writing a plausible-sounding plan to take over the world, but it couldn’t implement such a plan (and would have no desire to do so).
For me the scary part isn’t the much larger GPT we’ll probably have in a few years. It’s the discovery that even very complicated AIs get smarter as they get bigger. If someone ever invented an AI that did do more than text prediction, it would have a pretty fast takeoff, going from toy to superintelligence in just a few years.
Speaking of which – can anything based on GPT-like principles ever produce superintelligent output? How would this happen? If it’s trying to mimic what a human can write, then no matter how intelligent it is “under the hood”, all that intelligence will only get applied to becoming better and better at predicting what kind of dumb stuff a normal-intelligence human would say. In a sense, solving the Theory of Everything would be a failure at its primary task. No human writer would end the sentence “the Theory of Everything is…” with anything other than “currently unknown and very hard to figure out”.
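One more note from me on that last point: it falls straight out of the training objective. The model is scored purely on how well it predicts the next token of human-written text, so an answer "better" than what the corpus contains is, by definition, a worse prediction. Here's a minimal sketch of that objective (standard next-token cross-entropy; my illustration, not OpenAI's code):

```python
import torch
import torch.nn.functional as F

def language_modeling_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Standard next-token prediction loss for a GPT-like model.

    logits: (batch, seq_len, vocab_size) scores from the model
    tokens: (batch, seq_len) the human-written text being imitated
    """
    # Position t must predict token t+1, so drop the last logit
    # and the first token to line the two up.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    # Low loss means "sounds like what a human would write next".
    # No term anywhere rewards being *right*, only being typical.
    return F.cross_entropy(pred, target)
```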
It's clear that GPT-3 is an absolutely astounding piece of software, but it's just a primordial form of what's coming. Indeed, as I mentioned elsewhere, it ought not be that difficult to increase the number of parameters by another order of magnitude. After all, GPT-1 was ~100 million parameters, and GPT-3 is 175 billion; that's three orders of magnitude in just two years. Even assuming a slowdown due to the difficulty of scaling up training and energy consumption, at least one more order of magnitude should be possible by next year, and perhaps more than that: upwards of 4 or 5 trillion parameters.
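Back-of-the-envelope, the growth rate I'm gesturing at looks like this (simple extrapolation on my part, obviously not a forecast from any paper):

```python
import math

# Published parameter counts.
gpt1 = 117e6   # June 2018
gpt3 = 175e9   # May 2020

years = 2
orders_of_magnitude = math.log10(gpt3 / gpt1)   # ~3.17 total
per_year = orders_of_magnitude / years          # ~1.6 per year
print(f"{orders_of_magnitude:.2f} orders of magnitude over {years} years")

# Naive extrapolation one more year out:
naive_2021 = gpt3 * 10 ** per_year              # ~6.8 trillion
print(f"naive 2021 projection: {naive_2021 / 1e12:.1f}T parameters")
# Shave that down for training and energy bottlenecks and you land
# right in the 4-5 trillion range guessed at above.
```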
This would be easily feasible if a GPT-3-esque network were trained on richer data, such as images and audio broken down into raw byte sequences.
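For what I mean by "broken down": you can serialize almost any modality into a token stream that a GPT-style model treats exactly like text. A toy sketch of my own, loosely in the spirit of OpenAI's Image GPT (which really did train a GPT on pixel sequences):

```python
import numpy as np

def image_to_tokens(img: np.ndarray) -> list[int]:
    """Flatten a uint8 RGB image (H, W, 3) into a 1-D sequence of
    byte tokens (vocabulary size 256) in raster-scan order."""
    assert img.dtype == np.uint8
    return img.flatten().tolist()

def audio_to_tokens(samples: np.ndarray) -> list[int]:
    """Quantize float audio in [-1, 1] down to 8-bit byte tokens."""
    quantized = np.clip((samples + 1.0) * 127.5, 0, 255).astype(np.uint8)
    return quantized.tolist()

# Once everything is "just tokens", the next-token objective above
# applies unchanged; only the sequences get much, much longer.
tokens = image_to_tokens(np.zeros((32, 32, 3), dtype=np.uint8))
print(len(tokens))  # 3072 tokens for one tiny 32x32 image
```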
And gee, what was that "big secret project"?
One of the biggest secrets is the project OpenAI is working on next. Sources described it to me as the culmination of its previous four years of research: an AI system trained on images, text, and other data using massive computational resources. A small team has been assigned to the initial effort, with an expectation that other teams, along with their work, will eventually fold in. On the day it was announced at an all-company meeting, interns weren’t allowed to attend. People familiar with the plan offer an explanation: the leadership thinks this is the most promising way to reach AGI.
It's clear something very explosive is coming, and it's coming very, very soon.
My prediction is that, sometime next year, we'll hear that a team (almost certainly OpenAI, though there's a tiny possibility it's someone else) has unveiled an operational AI capable of tackling a theoretically unlimited range of problems, of retaining context far beyond any fixed window, and of proving mathematical theorems and attacking scientific questions through spontaneous, humanlike reasoning. That includes programming from natural language (e.g. "Build a game of Pong") and generating coherent text of essentially arbitrary length. Its abilities as a cognitive agent would make it the perfect digital assistant, able to easily pass a difficult rendition of the Turing Test and the Winograd Schema challenge alike. It will be, in essence and in some limited form, the first artificial general intelligence. Perhaps not the all-powerful god or even the sapient artificial lifeform we wanted, but certainly something far beyond what anyone expected this soon into the decade, before we even have neurofeedback to further empower it (and that's definitely still coming). It'll be the dawn of a new day.