Transformers are a neural network architecture. They are behind some of the most successful large language models (LLMs) we see today, like GPT-3, PaLM, BARD, and GPT-4. I have seen several papers claiming that transformers are Turing-complete, meaning that they can be used to simulate any computer program.
But transformer architectures are not Turing-complete. They cannot simulate computer programs. The papers that claim otherwise are making a conceptual error. Transformers have been impressive and extraordinary as tools but we need to be honest about what they can do and about the challenges that lie ahead of us on the path toward true artificial general intelligence.
Every so often a new neural network makes headlines for solving a computation problem. It is sometimes hard for me to judge how impressive these achievements are without diving into the details of the models. But my criteria are always the same and it should be easy for those who are familiar with their models to evaluate based on these criteria. For this purpose I have made a flowchart for how impressed I would be at a neural network. If you know of a new neural net that reaches “wow” please let me know about it, and if it reaches “mind-blown” you have permission to wake me up in the middle of the night – since I know no examples.
Computation is usually conceptualized in terms of input/output functions. For instance, to compute the square root of a number is to somehow solve the function that takes a number as an input and outputs its square root. It is commonly stated that feed-forward neural networks are “universal approximators” meaning that, in theory, any function can be approximated by a feed-forward neural network. Here are some examples of this idea being articulated:
“One of the most striking facts about [feed-forward] neural networks is that they can compute any function at all.”
“In summary, a feed-forward network with a single layer is sufficient to represent any function [to an arbitrary degree of accuracy],…”
– Deep Learning (Section 6.4.1 Universal Approximation Properties and Depth), Ian Goodfellow
“…but can we solve anything? Can we stave off another neural winter coming from there being certain functions that we cannot approximate? Actually, yes. The problem of not being able to approximate some function is not going to come back.”