(I think they will need to integrate certain "priors" into their setup if they want it to perform really well. They should aim to store and retrieve memories a little bit like how the human brain does it, not like how a Turing Machine does it [even Turing Machines that have a kind of blurred memory to make them amenable to gradient descent.])
Learning to Execute and Neural Turing Machines
I'd like to draw your attention to two papers that have been posted in the last few days from some of my colleagues at Google that I think are pretty interesting and exciting:
Learning to Execute: http://arxiv.org/abs/1410.4615
Neural Turing Machines: http://arxiv.org/abs/1410.5401
The first paper, "Learning to Execute", by +Wojciech Zaremba and +Ilya Sutskever attacks the problem of trying to train a neural network to take in a small Python program, one character at a time, and to predict its output. For example, as input, it might take:
print((c+8704) if 2641<8500 else 5308)
During training, the model is given that the desired output for this program is "12185". During inference, though, the model is able to generalize to completely new programs, and does a pretty good job of learning a simple Python interpreter from examples.
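To make the task concrete, here is a tiny sketch of how (program, output) training pairs for this kind of task could be generated. The particular program template below is my own assumption for illustration, not the paper's actual generator, which produces a richer class of programs:

```python
import random

def make_example(rng):
    """Generate one (program_text, expected_output) training pair in the
    spirit of the paper's examples. The exact program distribution here
    is an assumption made for illustration."""
    a, b, c, d = (rng.randint(0, 9999) for _ in range(4))
    program = f"print(({a}+{b}) if {c}<{d} else {a - b})"
    expected = str(a + b) if c < d else str(a - b)
    return program, expected

rng = random.Random(0)
prog, out = make_example(rng)
# The model reads `prog` one character at a time and must emit `out`.
print(prog, "->", out)
```

The hard part, of course, is that the network never sees an interpreter: it has to discover arithmetic and control flow purely from character-level input/output pairs like these.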
The second paper, "Neural Turing Machines", by +Alex Graves, Greg Wayne, and +Ivo Danihelka from Google's DeepMind group in London, couples an external memory ("the tape") with a neural network in a way that the whole system, including the memory access, is differentiable from end to end. This allows the system to be trained via gradient descent, and the system is able to learn a number of interesting algorithms, including copying, priority sorting, and associative recall.
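The key trick that makes the memory access differentiable is that reads are "blurry": instead of fetching one memory slot, the controller attends softly over all slots. A minimal sketch of the content-based read (one of the paper's addressing mechanisms), with a made-up toy memory:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1e-8
    nv = math.sqrt(sum(b * b for b in v)) or 1e-8
    return dot / (nu * nv)

def content_read(memory, key, beta=10.0):
    """Blurry read: weight every memory row by its similarity to `key`
    (sharpened by `beta`), then return the weighted average. Every step
    is smooth in its inputs, which is what lets gradients flow through
    the memory access."""
    weights = softmax([beta * cosine(row, key) for row in memory])
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r = content_read(memory, key=[1.0, 0.0])  # reads mostly from row 0
```

With a large `beta` the attention concentrates on the best-matching row, so the read approximates a discrete lookup while staying differentiable.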
Both of these are interesting steps along the way toward systems that learn more complex behavior, such as entire algorithms, rather than just simple functions.
Furthermore, just a few weeks ago, Sutskever and Mikolov (both at Google) released a paper, to appear at NIPS 2014, that shows how to use a neural network to do language translation. The neat thing about it is that it's an end-to-end system -- you just feed your sentence into the network, one word at a time, and it outputs the translation, one word at a time. There are no phrase tables. No grammars. Nothing. It figures all that out on its own!
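The "one word at a time" framing can be sketched as a data-layout question: each training example is just the source words, an end marker, then the target words the model must predict. The sentence pair below is made up for illustration (and this omits details from the paper, such as reversing the source sentence, which the authors found helped optimization):

```python
def to_training_stream(source, target, eos="<eos>"):
    """Lay out one translation pair the way an end-to-end model consumes
    it: source words, an end-of-sentence marker, then the target words
    to be predicted one at a time. A simplified sketch, not the paper's
    exact preprocessing."""
    return source + [eos] + target + [eos]

stream = to_training_stream(["the", "cat", "sat"], ["le", "chat", "assis"])
print(stream)
```

Everything a phrase table or grammar would normally supply has to be absorbed into the network's weights from streams like this one.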
And just today there was an announcement that DeepMind will be partnering with some Oxford academics, after having acquired two Oxford AI startup companies. Here is a link to an FT.com article about it:
It's behind a paywall; but here are two little portions that give some nice details:
DeepMind, the UK artificial intelligence group purchased by Google for £400m earlier this year, has revealed plans to create a broad alliance with the University of Oxford after acquiring two companies spun out of computer science projects at the elite academic institution.
The London-based group, now a department within the US technology group, said it had acquired Dark Blue Labs and Vision Factory, enterprises founded by Oxford professors. DeepMind did not disclose the size of the transactions, but people familiar with the matter said the deals are worth tens of millions of pounds.
I've heard of all these people and their work. Really impressive stuff!
Dark Blue Labs, led by Nando de Freitas and Phil Blunsom, is creating systems to understand “natural language” that would allow computers to comprehend the meaning of sentences and how people react to them.
Vision Factory, headed by Andrew Zisserman, is developing systems capable of the visual recognition of objects in the real world. This means, for example, giving robots three-dimensional awareness that can allow them to understand how a cup sits on a table.
Blunsom has written several papers on using neural nets to understand language. One of his papers even describes a system that learns to convert sentences into a form a computer can work with -- a powerful, general-purpose convolutional neural net.
Zisserman, Simonyan, et al. are known for their image and video recognition work. Among other things, they entered a system in the recent ImageNet competition and placed second, beaten only by Google and the GoogLeNet system. In addition, their work on video recognition achieves a dramatic improvement over the work of Karpathy et al.; unlike that work, though, they don't operate on the raw pixels, but compute a few low-level features first -- still, it's a top-performing system. It will be interesting to see how much better they can make it!
DeepMind now has the expertise and the technology to take their work to the next level. Could we perhaps see complex systems with multiple modules that integrate language understanding with video and audio perception, as well as long-term memory (as in the above papers that Jeff Dean mentioned)? It's entirely possible. Hard to say what we will see in 5 years; but I'm optimistic.