Many researchers, companies, and investors are confident that AI will keep making progress. To understand why, you need to understand one concept: scaling laws!
The background is that when you build AI, you need 3 ingredients: Models. Data. Computation. To scale your AI, you need more of all 3 of them: Larger models, more data, more computing power.
You can slightly change their ratio, and you can improve models, but you cannot scale one without the others.
Now when you scale, something interesting happens: The error decreases, and it does so in a predictable way! (“Error” here refers to how badly the model predicts the next word of a text, because predicting the next word is, at its core, what Large Language Models do.)
As shown in the chart in the title, created by OpenAI in 2020: If you increase compute, then the error (“Validation Loss”) decreases consistently. One nice thing shown in the top left corner: If you use a small model (few parameters), then at a certain point more compute does not decrease the error anymore. The 3 ingredients must scale together.
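To make the shape of such a scaling law concrete, here is a minimal Python sketch of the power-law relationship between compute and loss. The constants are illustrative placeholders I chose for the example, not OpenAI's actual fitted values:

```python
# Minimal sketch of a compute scaling law of the form loss = (c_c / compute) ** alpha.
# The constants c_c and alpha are illustrative placeholders, not fitted values.

def validation_loss(compute: float, c_c: float = 2.3e8, alpha: float = 0.05) -> float:
    """Predicted validation loss for a given compute budget (arbitrary units)."""
    return (c_c / compute) ** alpha

# More compute -> lower predicted loss, smoothly and predictably.
for compute in (1e3, 1e6, 1e9):
    print(f"compute={compute:.0e}  predicted loss={validation_loss(compute):.3f}")
```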
The gigantic bet of OpenAI and others is that this pattern will hold in the future: To reduce AI’s errors and make models even more powerful, you just need to scale up the 3 ingredients and the scaling laws will do their job.
While this seems plausible, there are several reasons why this bet may not pay off:
- Scaling laws are not laws of nature. They work today, but they may no longer hold for larger future models. Progress may just stall.
- Data may become scarce. At some point, there may simply not be enough high-quality data available to scale up the 3 ingredients together.
- We do not know what “lower errors” will mean in practice. With lower error rates, models predict the next word better. But does this mean the AI will then revise contracts better? Or will it become perfect at legal translation? Or will there be no practical implications at all? We do not know!
- Further scaling may just not be worth the effort. Scaling laws are not linear: you need to scale up the 3 ingredients by orders of magnitude to get ever smaller decreases in the error rate (see the rough sketch below). At some point, the cost-benefit ratio may simply become unfavourable.
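As a rough illustration of that last point, here is a sketch using the same assumed power-law form and placeholder constants as above, showing how each additional 10x of compute buys a smaller and smaller absolute improvement:

```python
# Rough illustration of diminishing returns: under a power-law scaling law,
# every additional 10x of compute reduces the loss by a smaller absolute amount.
# Constants are illustrative placeholders, not fitted values.

def validation_loss(compute: float, c_c: float = 2.3e8, alpha: float = 0.05) -> float:
    return (c_c / compute) ** alpha

budgets = [1e4, 1e5, 1e6, 1e7, 1e8]
losses = [validation_loss(c) for c in budgets]
for budget, prev, curr in zip(budgets[1:], losses, losses[1:]):
    print(f"10x more compute (now {budget:.0e}): loss {prev:.3f} -> {curr:.3f} "
          f"(gain {prev - curr:.3f})")
```

The loss keeps falling, but every extra 10x of compute, and the cost that comes with it, buys a smaller improvement than the previous one.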