Three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. Algorithmic progress has traditionally been more difficult to quantify than compute and data. In this blog algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities. The number of floating-point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44x between 2012 and 2019. This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years. Notably, this outpaces the original Moore’s law rate of improvement in hardware efficiency (11x over this period). We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
Algorithmic improvement is a key factor driving the advance of AI. It’s important to search for measures that provide overall algorithmic progress, even though it’s harder than measuring such trends in compute.
Algorithmic efficiency can be defined as reducing the compute needed to train a specific capability. Efﬁciency is the primary way we measure algorithmic progress on classic computer science problems like sorting. Efficiency gains on traditional problems like sorting are more straightforward to measure than in ML because they have a clearer measure of task difficulty. However, we can apply the efficiency lens to machine learning by holding performance constant. Efficiency trends can be compared across domains like DNA sequencing (10-month doubling), solar energy (6-year doubling), and transistor density (2-year doubling).
1. Performance is often measured in different units (accuracy, BLEU, points, ELO, cross-entropy loss, etc) and gains on many of the metrics are hard to interpret. For instance going from 94.99% accuracy to 99.99% accuracy is much more impressive than going from 89% to 94%.
2.The problems are unique and their difficulties are not comparable quantitively, so assessment requires gaining an intuition for each problem.
3. Most research focuses on reporting overall performance improvements rather than efficiency improvements, so additional work is required to disentangle the gains due to algorithmic efficiency from the gains due to additional computation.
4. The benchmarks of interest are being solved more rapidly, which exacerbates 1) and 2). For instance it took 15 years to get to human-level performance on MNIST, 7 years on ImageNet, and GLUE only lasted 9 months.
Other measures of AI progress
In addition to efficiency, many other measures shed light on overall algorithmic progress in AI. Training cost in dollars is related, but less narrowly focused on algorithmic progress because it’s also affected by improvement in the underlying hardware, hardware utilization, and cloud infrastructure. Sample efficiency is key when we’re in a low data regime, which is the case for many tasks of interest. The ability to train models faster also speeds up research and can be thought of as a measure of the parallelizability of learning capabilities of interest. We also find increases in inference efficiency in terms of GPU time, parameters, and flops meaningful, but mostly as a result of their economic implications rather than their effect on future research progress. Shufflenet achieved AlexNet-level performance with an 18x inference efficiency increase in 5 years (15-month doubling time), which suggests that training efficiency and inference efficiency might improve at similar rates. The creation of datasets/environments/benchmarks is a powerful method of making specific AI capabilities of interest more measurable.
Tracking efficiency going forward
If large scale compute continues to be important to achieving state of the art (SOTA) overall performance in domains like language and games then it’s important to put effort into measuring notable progress achieved with smaller amounts of compute. Models that achieve training efficiency state of the arts on meaningful capabilities are promising candidates for scaling up and potentially achieving overall top performance. Additionally, figuring out the algorithmic efficiency improvements are straightforward since they are just a particularly meaningful slice of the learning curves that all experiments generate.
Measuring long run trends in efficiency SOTAs will help paint a quantitative picture of overall algorithmic progress. Hardware and algorithmic efficiency gains are multiplicative and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
AI tasks with high levels of investment algorithmic efficiency might outpace gains from hardware efficiency. Industry leaders, policymakers, economists, and potential researchers are all trying to better understand AI progress and decide how much attention they should invest and where to direct it. Measurement efforts can help ground such decisions.
In fact, this work was primarily done by training PyTorch examples models, with tweaks to improve early learning.