Language "entropy" model predicts Turing Test will be passed before 2025

funkervogt
Posts: 1171
Joined: Mon May 17, 2021 3:03 pm

Post by funkervogt »

Benchmark | Regression method 1 | Regression method 2
Penn Treebank (Word Level) | 2024-07-06 | 2024-09-14
Penn Treebank (Character Level) | 2034-11-12 | 2035-04-17
WikiText-103 | 2021-11-11 | 2022-09-27
enwiki8 | 2022-06-15 | 2023-02-06
One Billion Word | 2029-01-06 | 2027-04-22
Text8 | 2022-03-17 | 2025-03-21
Hutter Prize | 2022-06-22 | 2025-03-17
WikiText-2 | 2022-06-10 | 2024-10-11

For regression method 1, the median predicted date is 2022-06-18 and the mean predicted date is 2024-12-23. However, the uncertainty here is quite high: the Penn Treebank (Character Level) benchmark yields a date in 2034, well after all the others!
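The two summary figures can be reproduced from the method 1 column of the table. This is a minimal sketch that treats each date as a day count; the dictionary name and the choice to round a half-day median down are my own, not the notebook's.

```python
from datetime import date
from statistics import median

# Regression method 1 dates, copied from the table.
method1 = {
    "Penn Treebank (Word Level)": date(2024, 7, 6),
    "Penn Treebank (Character Level)": date(2034, 11, 12),
    "WikiText-103": date(2021, 11, 11),
    "enwiki8": date(2022, 6, 15),
    "One Billion Word": date(2029, 1, 6),
    "Text8": date(2022, 3, 17),
    "Hutter Prize": date(2022, 6, 22),
    "WikiText-2": date(2022, 6, 10),
}

# Work in day ordinals so dates can be averaged like numbers.
ordinals = [d.toordinal() for d in method1.values()]

# With 8 benchmarks the median falls halfway between two days; int() rounds it down.
median_date = date.fromordinal(int(median(ordinals)))
mean_date = date.fromordinal(round(sum(ordinals) / len(ordinals)))

print(median_date)  # 2022-06-18
print(mean_date)    # 2024-12-23
```

The gap between the two numbers comes almost entirely from the 2029 and 2034 outliers: the median ignores how far out they are, while the mean is pulled later by them.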

It's worth noting a few methodological issues with my simple analysis, which future research could correct. First, I did not directly compute the entropy of each dataset; instead, I tested values near Shannon's 70-year-old estimate. Second, the data on Papers With Code relies on user submissions, which are unreliable and incomplete. Third, I blindly extrapolated the results with a linear regression on entropy, and it is quite plausible that a better regression model would yield different results. Finally, many recent perplexity results were obtained using extra training data, which arguably should be considered cheating.
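Comparing word-level perplexity scores against an entropy threshold requires a conversion. A perplexity PPL corresponds to a cross-entropy of log2(PPL) bits per token, and spreading that over an average word length gives an approximate bits-per-character figure. The sketch below assumes this standard conversion; the average word length is a hypothetical parameter, and this is not necessarily how the notebook handled it.

```python
import math

def bits_per_word(perplexity: float) -> float:
    """Cross-entropy in bits per word implied by a word-level perplexity (PPL = 2**H)."""
    return math.log2(perplexity)

def bits_per_char(perplexity: float, chars_per_word: float) -> float:
    """Approximate bits per character by spreading per-word entropy over an
    assumed average word length (a hypothetical parameter, not a dataset statistic)."""
    return bits_per_word(perplexity) / chars_per_word

# Illustrative numbers only: a perplexity of 32 is 5 bits per word, which at an
# assumed 5 characters per word works out to 1 bit per character.
print(bits_per_word(32))       # 5.0
print(bits_per_char(32, 5.0))  # 1.0
```

The result is sensitive to the assumed word length, which is one more reason the word-level benchmarks may disagree with the character-level ones.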

Nonetheless, I would be surprised if a more robust analysis yielded a radically different estimate, such as one more than 20 years away from mine (the mean estimate of ~2025). Although I did not directly estimate the entropy of each dataset, I did test the threshold values [0.4, 0.5, 0.6, 0.7, 0.8], and found that moving between these values alters the final result by at most a few years in each case. Check out the Jupyter notebook to see these results.
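A threshold sweep like this can be sketched as an ordinary least-squares line of entropy against date, solved for the day it reaches each threshold. This is a minimal sketch that assumes entropy (in bits per character) is regressed linearly on calendar date. `crossing_date` and `toy_results` are hypothetical, and the points are invented so the answer is easy to check; they are not Papers With Code data.

```python
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def crossing_date(results, threshold):
    """Date at which the fitted line reaches `threshold` bits per character.

    `results` is a list of (date, bits) pairs. x is measured in days since the
    earliest result to keep the arithmetic well conditioned.
    """
    start = min(d for d, _ in results)
    xs = [(d - start).days for d, _ in results]
    ys = [bits for _, bits in results]
    slope, intercept = fit_line(xs, ys)
    if slope >= 0:
        raise ValueError("entropy is not falling, so the line never reaches the threshold")
    return start + timedelta(days=round((threshold - intercept) / slope))

# Made-up results that fall exactly 0.1 bits every 500 days (NOT real benchmark data).
base = date(2016, 1, 1)
toy_results = [(base + timedelta(days=500 * k), 1.3 - 0.1 * k) for k in range(4)]

for threshold in [0.4, 0.5, 0.6, 0.7, 0.8]:
    print(threshold, crossing_date(toy_results, threshold))
```

With a straight-line fit, shifting the threshold by some amount moves the predicted date by that amount divided by the rate of improvement; in the toy data every 0.1 bits is exactly 500 days. So how much the threshold matters depends on how fast each benchmark's scores have been falling.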

I conclude, therefore, that either current trends will break down soon, or human-level language models will likely arrive in the next decade or two.
https://www.metaculus.com/notebooks/832 ... ge-models/
Ozzie guy
Posts: 486
Joined: Sun May 16, 2021 4:40 pm

Post by Ozzie guy »

funkervogt wrote: Fri Oct 22, 2021 12:53 am
[…]
How good do you think the source is? This coincides with Yuli Ban's prediction of proto-AGI before 2025.
funkervogt
Posts: 1171
Joined: Mon May 17, 2021 3:03 pm

Post by funkervogt »

Set and Meet Goals wrote: Fri Oct 22, 2021 1:48 am
How good do you think the source is? This coincides with Yuli Ban's prediction of proto-AGI before 2025.
I don't know. I've never heard of the guy before and just thought I'd pass it along without comment in the hopes you folks would appreciate it.

I stand by my prediction that the Turing Test will be passed before 2030. However, I also predict that the first machine to pass it will fail some of the re-tests, leading many people to dismiss the significance of the achievement.

The existence of a machine that seems human 99.5% of the time but speaks jumbled nonsense the other 0.5%, reminding you it is not actually sentient, will be disturbing.
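The re-test point can be made concrete under a simplifying assumption that is mine, not funkervogt's: if each exchange independently has a 0.5% chance of producing nonsense, the chance of at least one slip over n exchanges is 1 - (1 - 0.005)^n. `chance_of_slip` is a hypothetical helper for illustration.

```python
def chance_of_slip(p_nonsense: float, exchanges: int) -> float:
    """Probability of at least one nonsense reply in `exchanges` independent turns."""
    return 1 - (1 - p_nonsense) ** exchanges

# Assumes a constant, independent 0.5% failure rate per exchange.
for n in (10, 100, 1000):
    print(n, round(chance_of_slip(0.005, n), 3))
```

Over 100 exchanges that is already about 39%, so a long enough re-test would very likely catch such a machine at least once.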
Redspector
Posts: 37
Joined: Sat Sep 11, 2021 4:57 am

Post by Redspector »

Set and Meet Goals wrote: Fri Oct 22, 2021 1:48 am
funkervogt wrote: Fri Oct 22, 2021 12:53 am
[…]
How good do you think the source is? This coincides with Yuli Ban's prediction of proto-AGI before 2025.
As good as the source funkervogt reads to prove China couldn't defeat Taiwan.
wjfox
Site Admin
Posts: 8733
Joined: Sat May 15, 2021 6:09 pm
Location: London, UK

Post by wjfox »

Redspector wrote: Fri Oct 22, 2021 4:42 am
As good as the source funkervogt reads to prove China couldn't defeat Taiwan.

If you keep up the personal insults, you won't last much longer here.

Maybe you're under the impression that this place is like 4chan, or some other cool and "edgy" place. Well, it ain't. This forum is for adults to discuss science, technology and the future.

We already lost one exceptional member last year due to trolling, and I won't lose another one. So please – stop.