Generalist AI doesn't scale

There has been a lot of talk about AI recently, and one particular point has received significant attention in the tech industry: The cost of training models. According to some insiders — and the market capitalization of NVIDIA — the computing power needed for AI training threatens to upend the entire semiconductor industry. This should not be a surprise: Generalist AI doesn't scale.

Reduced to its essentials, the task of training a size-N model is one of hill-climbing in N-dimensional space. You take O(N) inputs, run them through your model, and after each one you nudge the model slightly uphill, towards the desired response. You need O(N) inputs because with any fewer the model will overfit — essentially memorizing the specific set of inputs rather than generalizing from them — and for each of these inputs you need to perform O(N) computation, since you have N parameters in the model to tune. End result: O(N^2) computation.
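
To make the arithmetic concrete, here's a minimal back-of-the-envelope sketch in Python. The constants c and d are hypothetical, standing in for "inputs per parameter" and "operations per parameter per input"; only the shape of the curve matters.

```python
def training_cost(n_params: int, c: float = 10.0, d: float = 2.0) -> float:
    """Rough operation count to train a size-N model, per the argument above.

    You need ~c*N training inputs to avoid overfitting, and each input
    touches all N parameters, costing ~d*N operations.
    """
    n_examples = c * n_params        # O(N) inputs
    per_example = d * n_params       # O(N) work per input
    return n_examples * per_example  # O(N^2) total

for n in (10**6, 10**7, 10**8):
    print(f"N = {n:>11,}: ~{training_cost(n):.1e} operations")
# Every 10x increase in model size costs ~100x in training compute.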

Now, there are plenty of other problems in AI — one of the largest is generating enough training data (easy enough for Chess or Go where you can have the AI play games against itself, but for general knowledge you eventually run out of textbooks) — and you can push against scaling laws for a while simply by throwing more money at them; but in the end you can't defeat scaling. You'll end up boiling the oceans.

So what's the solution? Don't do Generalist AI; instead, use a pool of expert AIs. Rather than training a single size-N model, split the model into k parts, each trained on N/k of the inputs. One sub-model learns all about medicine; another learns all about modern art. You still have N inputs in total, but since each of them is only used to optimize a set of N/k parameters, your training cost is now O(N^2 / k).
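
Here's how the savings works out, a sketch under the same hypothetical constants as before (generalist_cost and experts_cost are illustrative names, not anyone's real API):

```python
def generalist_cost(n_params: int, c: float = 10.0, d: float = 2.0) -> float:
    # O(N) inputs times O(N) work per input: the O(N^2) bill from above.
    return (c * n_params) * (d * n_params)

def experts_cost(n_params: int, k: int) -> float:
    # k specialists of size N/k, each trained on its N/k share of the
    # inputs: k * O((N/k)^2) = O(N^2 / k).
    return k * generalist_cost(n_params // k)

n = 10**8
for k in (1, 10, 100):
    saving = generalist_cost(n) / experts_cost(n, k)
    print(f"k = {k:>3}: ~{experts_cost(n, k):.1e} operations ({saving:.0f}x cheaper)")
```

Splitting into k = 100 specialists cuts the training bill by a factor of 100, even though the total parameter count and the total number of inputs are unchanged.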

And yes, you lose something by doing this — you probably won't get hallucinations of modern artwork depicting polypeptides. But, as with humans, most queries can be answered by the appropriate specialist; and it's better to have a collection of experts than a generalist that is too expensive to train effectively. (One could even have a "dispatcher" sub-model which knows just enough to identify which specialist to refer a query to.) And by reducing the training cost, you become able to build a collection of models which is larger — and smarter — than a generalist model could ever be.
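
As a minimal sketch of the dispatcher idea (every name here is a hypothetical stand-in; the keyword match is a placeholder for what would really be a small learned classifier, and each specialist would really be a model trained on its own domain's slice of the data):

```python
from typing import Callable, Dict

Specialist = Callable[[str], str]

# Hypothetical specialists, one per domain.
specialists: Dict[str, Specialist] = {
    "medicine":   lambda q: f"[medicine model] answer to: {q}",
    "modern-art": lambda q: f"[modern-art model] answer to: {q}",
}

def dispatch(query: str) -> str:
    """Stand-in for a small, cheap dispatcher model: pick a domain."""
    return "medicine" if "polypeptide" in query.lower() else "modern-art"

def answer(query: str) -> str:
    # Route the query to the appropriate specialist and return its answer.
    return specialists[dispatch(query)](query)

print(answer("What is a polypeptide made of?"))
print(answer("Who painted Composition VIII?"))
```

The dispatcher only needs to solve a k-way classification problem, so it can be far smaller than any of the specialists it routes to.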

Specialization isn't just for insects. It's for AIs too.

Posted at 2024-04-06 15:30