starspawn0
https://arxiv.org/abs/2002.06177
Systematicity might be one area where language models still aren't perfect. More general tests than the ones in these tweets would reveal how systematic they really are. I have a theory about why large neural nets should do OK at systematicity. It rests on the fact that if you take different kinds of "objects" (words, phrases, paragraphs, events, objects in images, etc.) and feed them through a neural net, then at each layer (higher layers learning more abstract representations) the net maps them into a common representation space. So a paragraph about vampires will activate some of the same high-level "neurons" as an image of a vampire, and those neurons will also activate when the model hears the word "vampire" spoken (audio input to the model). There will even be neurons that represent generic "objects" the same way: when you describe some initial object and how it relates to some other, secondary object, certain neurons will light up for "first object" versus "second object", and they will do so regardless of the modality, the context, and what the object actually is (a vampire, a werewolf, President Lincoln, the Blue Danube waltz).
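The shared-space idea can be sketched in a few lines. This is a toy illustration under my own assumptions, not anything from the paper: the "encoders" and concept vectors are made up rather than learned, but they show the geometry being claimed, i.e. that two modality-specific encoders map matched concepts to nearby points in one embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # size of the hypothetical shared representation space

# Pretend these top-layer concept vectors were learned during training;
# here they are just fixed random vectors shared by both "encoders".
shared_concepts = {"vampire": rng.normal(size=DIM),
                   "werewolf": rng.normal(size=DIM)}

def encode_text(concept):
    # stand-in for the top-layer activation of a text encoder
    return shared_concepts[concept] + 0.01 * rng.normal(size=DIM)

def encode_image(concept):
    # stand-in for the top-layer activation of an image encoder
    return shared_concepts[concept] + 0.01 * rng.normal(size=DIM)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

text_vamp = encode_text("vampire")
img_vamp = encode_image("vampire")
img_wolf = encode_image("werewolf")

# The text "vampire" sits much closer to the vampire image than to the
# werewolf image in the shared space.
print(cosine(text_vamp, img_vamp) > cosine(text_vamp, img_wolf))  # True
```

In a real multimodal model the alignment would emerge from training rather than being built in by hand; the point is only what "common representation space" means geometrically.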
But now the model may learn to apply "variables" at any level of this abstraction hierarchy (the higher up the hierarchy, the higher the layer in the neural net); the network has the same topology at every layer, so anything that can happen at the lower levels, which usually encode low-level features of the input, can happen at the higher levels as well. Maybe when you first train the model on very specific problems involving variable binding, it learns to apply a rule at a low level and can't transfer it to another modality, or even to objects of a different type within the same modality. But train it long enough and it will keep making errors unless it learns to use rules at the more abstract levels; eventually, in order to reduce the training loss further, it develops the right "circuits" at those higher levels and starts applying the same rule across multiple modalities. Maybe it was only trained to apply new rules in a few hundred contexts, but as a consequence of learning to apply them at those higher levels, it shows a capacity to apply them in all or almost all contexts. That's one of the hallmarks of systematicity!