Youssef Ait Alama

Blog

2026-02-01

What Does It Mean for a Machine to Discover an Algorithm?

There's a difference between running an algorithm and inventing one. Here's why that distinction matters, and why current AI systems mostly can't do the second thing.

Most people's mental model of AI is: show the model examples, hope it learns the underlying pattern, then have it apply that pattern to new inputs. That is, in fact, what a lot of ML is. But there's another problem that I think deserves more attention: can we build a system that doesn't just use algorithms, but actually comes up with new ones?

Implementing vs. Inventing

When a neural network learns to sort a list, what it's really doing is learning a mapping that approximates what a sorting algorithm would output. Feed it enough pairs of lists and their sorted versions and it gets decent at this. But it's not inventing mergesort. It has no concept of why mergesort beats bubble sort on large inputs. It's just producing things that look sorted.
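The "why" here is concrete: bubble sort does on the order of n² comparisons while mergesort does on the order of n log n. A quick sketch that counts them (written from scratch for this post, not from any particular system):

```python
import random

def bubble_sort(xs):
    """Bubble sort, counting comparisons: always n(n-1)/2 of them."""
    xs = list(xs)
    comps = 0
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            comps += 1
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, comps

def merge_sort(xs):
    """Mergesort, counting comparisons: at most about n * log2(n)."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, cl = merge_sort(xs[:mid])
    right, cr = merge_sort(xs[mid:])
    merged, comps = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):
        comps += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comps

data = [random.randrange(10_000) for _ in range(1_000)]
_, bc = bubble_sort(data)
_, mc = merge_sort(data)
print(f"bubble sort: {bc} comparisons")  # 499,500 (exactly n(n-1)/2)
print(f"merge sort:  {mc} comparisons")  # under 10,000
```

A model trained purely on sorted outputs has no access to this asymmetry; it would need to represent something about the procedures themselves, not just their results.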

Inventing an algorithm is a fundamentally different problem. You need to find a procedure that solves a whole class of inputs, that generalizes correctly, and that ideally does something nobody has done before. AlphaGo playing world-class Go is impressive. A system that discovers a new game-theoretic insight about Go is a different animal.

Two examples that actually worked

FunSearch (DeepMind, 2023) is probably the cleanest example so far. It uses an LLM to write programs, scores them against a mathematical objective, and evolves the population toward better solutions. It ended up finding new lower bounds for the cap set problem, a combinatorics result that human mathematicians had missed. That's not pattern matching. That's a machine finding a novel procedure that actually works.

AlphaTensor (DeepMind, 2022) pulled off something similar for matrix multiplication. They framed the search for fast multiplication algorithms as a single-player game, trained an RL agent to play it, and the agent found algorithms that beat Strassen's in specific settings, such as 4×4 matrices over finite fields.
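For context on what's being searched for: Strassen's classic 2×2 scheme multiplies two matrices with 7 scalar multiplications instead of the naive 8, and a multiplication scheme like this is exactly the kind of object AlphaTensor's game moves construct. Written out by hand:

```python
def strassen_2x2(A, B):
    """Strassen's 2x2 matrix multiply: 7 multiplications, not 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4,           m1 - m2 + m3 + m6))

def naive_2x2(A, B):
    """Textbook 2x2 multiply: 8 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return ((a11*b11 + a12*b21, a11*b12 + a12*b22),
            (a21*b11 + a22*b21, a21*b12 + a22*b22))

A, B = ((1, 2), (3, 4)), ((5, 6), (7, 8))
assert strassen_2x2(A, B) == naive_2x2(A, B)  # ((19, 22), (43, 50))
```

Applied recursively to blocks, saving one multiplication per level is what drops the asymptotic cost below n³, which is why shaving even a single multiplication in a small base case is a real discovery.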

Both results are real. Both are also narrow. FunSearch needs a precise mathematical objective you can evaluate automatically. AlphaTensor works because matrix multiplication has a very particular kind of structure to search over. Neither tells you how to do algorithm discovery for arbitrary problems, which is the part that actually matters.

Why this lives rent-free in my head

There's a benchmark called AlgoTune (NeurIPS 2025) that tries to measure whether LLMs can produce genuine algorithmic improvements. The short answer is: mostly no. They'll restructure code, apply known optimizations, occasionally get lucky. But they don't discover. They remix.

I think a big part of the reason is that these systems only know that something works on the examples they've seen, not why it works. If you understand that a sorting algorithm is correct because of an invariant it maintains at each step, you have real leverage on the problem. Pure example-based learning doesn't give you that.
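Here's what that leverage looks like in code, using insertion sort as the example (my choice, purely illustrative): the algorithm is correct because after each pass the prefix it has processed is sorted, and you can state and check that invariant directly. A learner that only ever sees input-output pairs has no handle on this.

```python
def insertion_sort(xs):
    """Insertion sort with its loop invariant checked at every pass."""
    xs = list(xs)
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
        # Invariant: after processing index i, the prefix xs[:i+1] is sorted.
        # Correctness of the whole algorithm follows by induction on i.
        assert all(xs[k] <= xs[k + 1] for k in range(i))
    return xs

print(insertion_sort([4, 2, 5, 1, 3]))  # [1, 2, 3, 4, 5]
```

The invariant is the compressed explanation of *why* the output is sorted, and it's exactly the kind of thing that never appears in a dataset of (list, sorted list) pairs.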

Whether you can close this gap by giving a learning system a richer description of the task (not just input-output pairs, but something that captures the structure of what the task actually is) is the question I keep coming back to. I don't have a satisfying answer yet. But I think most current approaches are leaving information on the table by stripping tasks down to examples, and I want to know what happens when you stop doing that.

If you work on anything adjacent to this, or you just want to tell me I'm missing something, ask the chatbot to the right for my email (it's also linked at the bottom of the page).