Since the publication of the scaling laws, which demonstrated that model performance improves as a predictable power-law function of model size, dataset size, and total training compute, the dominant constraint on progress appeared to be compute itself (the sheer number and quality of GPUs that could be brought to bear on a single training run).
Yes. Let’s invest in energy stocks
Invest in my happiness