Transformers are a neural network architecture. They are behind some of the most successful large language models (LLMs) we see today, like GPT-3, PaLM, Bard, and GPT-4. I have seen several papers claiming that transformers are Turing-complete, meaning that they can simulate any computer program.
But transformer architectures are not Turing-complete. They cannot simulate arbitrary computer programs. The papers that claim otherwise are making a conceptual error. Transformers have been impressive, even extraordinary, as tools, but we need to be honest about what they can do and about the challenges that lie ahead of us on the path toward true artificial general intelligence.