The Hidden Environmental Cost of Machine Learning


I believe it is important for researchers to be aware of the impact of our work beyond just its research value. We should consider its wider effect on society, as well as the cost incurred in carrying out the research itself. In the discipline of Machine Learning (ML), we are often told to think about the ethical implications of our work. However, one aspect which I think is often overlooked is the environmental impact. Those of us who are lucky enough to have access to powerful servers forget that this computational capacity is driven by real energy, which does not necessarily come from a sustainable source. So the natural question is: what is the environmental impact of machine learning, and is it worth it? This is the question that Diedrick Vink, Aditya Rajagopal and I explored in a discussion session we held at the IEEE conference on Advances in Communications, Devices and Systems (ACDS), 2021. A copy of the slides from the presentation can be found here.

Background

To answer this question, let's start with some background. Deploying a deep learning model has two stages: training and inference. At the training stage, you take a black-box model and update the parameters inside it to improve its accuracy on a given task. This is usually done by iterating over a large dataset of example inputs and outputs, and propagating the error between the model's output and the expected output backwards through the model, such that the accuracy improves. This process can take extremely long (many weeks) on general-purpose hardware such as a CPU, so it is common to use GPUs or other specialised hardware in parallel to speed up the training. Once a model is trained, it is deployed to perform inference for its given task. This stage is typically run in a datacenter, using similar hardware to what it was trained on.
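
As a concrete (if toy) illustration of these two stages, here is a minimal sketch in Python, using a single-parameter linear model and plain NumPy rather than any of the real frameworks or networks discussed below:

```python
import numpy as np

# Toy dataset: example inputs x and expected outputs y for the task y = 3x + 1.
x = np.random.randn(256, 1)
y = 3.0 * x + 1.0

# "Black-box" model: a single weight and bias, initialised randomly.
w, b = np.random.randn(), np.random.randn()
lr = 0.1  # learning rate

# Training stage: iterate over the dataset and propagate the error
# between the model's output and the expected output backwards.
for epoch in range(100):
    pred = w * x + b              # forward pass
    err = pred - y                # error against the expected output
    w -= lr * np.mean(err * x)    # gradient step on the weight
    b -= lr * np.mean(err)        # gradient step on the bias

# Inference stage: the trained parameters are frozen and reused per request.
print(w * 2.0 + b)  # should be close to 3*2 + 1 = 7
```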

Training

Let's start by looking at the environmental impact of the training side. Training can be thought of as a one-time cost; however, ML companies will typically retrain their models as their datasets grow. Patterson et al. [1], who work on Google's ML computing platform, have given some insight into how much energy is needed to train new ML models. Modern neural networks are becoming increasingly large as both their accuracy and the hardware to train them on improve, so networks such as GPT-3 now have on the order of billions of parameters. Training these massive networks commonly involves three stages: Neural Architecture Search (NAS), prototyping, and final training. In the NAS stage, the model architecture is generated by trying out different networks in a given design space and evaluating their accuracy. Once an architecture is found, the hyperparameters are optimised during the prototyping stage. Finally, the network is trained to achieve its best accuracy. You can imagine that the first two stages alone can require a significant amount of energy, as the network is trained many times over. In fact, Strubell et al. [2] estimate that performing the whole training pipeline for a large Transformer model can be equivalent to the lifetime emissions of 5 cars, in terms of CO2. However, the Google paper argues that accounting for true datacenter efficiency and hardware efficiency can dramatically reduce such estimates. In their paper, they focused on the final training step, and gave CO2e estimates for their application case. They found that the final training run for GPT-3 emits roughly 550 tonnes of CO2e, equivalent to a plane flying back and forth between San Francisco and New York 3 times. However, they suggest that using their hardware (Tensor Processing Units) in their datacenters, training a similar model, T5, emits the equivalent of only around 45 tonnes of CO2e.
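
To make the link between training compute and emissions concrete, the arithmetic is simple enough to sketch in a few lines of Python. All the input figures here (accelerator count, power draw, training time, carbon intensity) are illustrative assumptions of mine, not numbers from [1]:

```python
# Back-of-envelope CO2e for a single training run. All inputs are
# illustrative assumptions; see Patterson et al. [1] for measured figures.
n_accelerators = 512          # accelerators used in parallel (assumed)
power_per_accel_kw = 0.3      # average draw per accelerator, kW (assumed)
training_days = 14            # wall-clock training time (assumed)
pue = 1.1                     # datacenter Power Usage Effectiveness
carbon_kg_per_kwh = 0.4       # grid carbon intensity, kg CO2e/kWh (assumed)

energy_kwh = n_accelerators * power_per_accel_kw * training_days * 24 * pue
co2e_tonnes = energy_kwh * carbon_kg_per_kwh / 1000

print(f"{energy_kwh:,.0f} kWh -> {co2e_tonnes:,.1f} tonnes CO2e")
```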

Inference

These staggering numbers for training put into perspective how much CO2 we actually emit to create these models. And the figures given are just for one company; there are many companies and universities researching and retraining ML models every day. More importantly, these models are also being used every day: training is one aspect, but what is the impact once we have the network up and running? To try to answer this question, let's look at an example of an ML application: Google Translate.

Google Translate is a popular service for translating text between many different languages. To do this, Google use the GNMT model, a type of Long Short-Term Memory (LSTM) network designed for automated translation. Google have stated in a blog post that around 500 million people use Google Translate worldwide, and that they handle around 100 billion words a day. We also have an idea of the sort of hardware Google use, as they have published details of, and even offer, their Tensor Processing Unit (TPU) hardware for deploying ML networks. Using this knowledge, we can estimate the usage of this ML model, but how do we relate this to CO2 emissions? To do so, I proposed the following model for power consumption at the ACDS conference:

\[ P_{total} = N_{TPU} \cdot \left( \frac{T_{req}}{T_{TPU}\cdot N_{TPU}} \cdot P_{TPU}^{busy} + \left( 1 - \frac{T_{req}}{T_{TPU}\cdot N_{TPU}} \right) \cdot P_{TPU}^{idle} \right) \cdot PUE \]

In this model, we have the following parameters:

  • \( N_{TPU}\): The number of processors (TPUs) in the datacenter
  • \( T_{req}\): The rate of inference requests, in queries per second
  • \( T_{TPU}\): The throughput a single TPU can handle, in queries per second
  • \( P_{TPU}^{busy}\): The power consumed per TPU whilst it is processing the model
  • \( P_{TPU}^{idle}\): The power consumed per TPU whilst it is idle
  • \( PUE\): The Power Usage Effectiveness of the datacenter (the ratio of total facility power to IT equipment power)

From what we have found out about Google Translate, we can populate some of the parameters of the model, shown in the table below. The main question that remains is: how many TPUs do Google use to perform inference?

Parameter Unit Value
\( T_{req}\) queries/s 1,200,000
\( T_{TPU}\) queries/s 175
\( P_{TPU}^{busy}\) W 96
\( P_{TPU}^{idle}\) W 72.5
\( PUE\)* N/A 1.1

* Google publish their datacenter Power Usage Effectiveness (PUE) in sustainability reports.
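
Putting the equation and these figures together, we can express the power model as a short Python function. This is just my model from the slides translated into code, with \( N_{TPU}\) left as the unknown we still need to estimate:

```python
def total_power_w(n_tpu, t_req=1.2e6, t_tpu=175,
                  p_busy=96.0, p_idle=72.5, pue=1.1):
    """Datacenter power draw (W) for the model above.

    Each TPU is busy for a fraction t_req / (t_tpu * n_tpu) of the time,
    drawing p_busy, and is idle the rest of the time, drawing p_idle.
    The PUE factor accounts for datacenter overheads such as cooling.
    """
    busy_fraction = t_req / (t_tpu * n_tpu)
    per_tpu_w = busy_fraction * p_busy + (1 - busy_fraction) * p_idle
    return n_tpu * per_tpu_w * pue

# T_req is derived from ~100 billion words a day:
# 100e9 / (24 * 60 * 60) is roughly 1.2 million queries per second.
print(f"{total_power_w(n_tpu=7000) / 1e6:.2f} MW")  # ~0.74 MW
```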

To produce an estimate of the CO2 emissions of Google Translate, we first have to estimate how many TPUs are actually used. To do this, we must understand that the workload is not constant. Google quote the average number of words handled per day, but at any given moment they may be handling 10 times more or 10 times fewer than that. Therefore, they must build in some redundancy in order to be able to serve their users. It is worth noting that one way of handling this is through virtualisation, where a computing unit can run a varying number of tasks at any given time; however, TPU hardware is very specialised and is unlikely to support virtualisation. We will look at both an efficient and a redundant datacenter setup to get an idea of the range of environmental impact inference may have, as sketched below.
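
As a sketch of where the TPU counts in the table below come from: the efficient setup provisions just enough TPUs to meet the average request rate, while the redundant setup assumes roughly an order of magnitude of over-provisioning for peak load. The 100,000 figure for the redundant case is an assumption of mine, consistent with the efficiency and energy figures in the table:

```python
import math

T_REQ, T_TPU = 1.2e6, 175  # queries/s, from the table above

# Efficient setup: just enough TPUs to serve the average request rate.
n_efficient = math.ceil(T_REQ / T_TPU)   # ~6,858, rounded up to 7,000
# Redundant setup: assume roughly an order of magnitude of headroom
# for peak load, since the hardware cannot be virtualised.
n_redundant = 100_000

for n in (7_000, n_redundant):
    efficiency = 100 * T_REQ / (T_TPU * n)  # % of time each TPU is busy
    print(f"{n:>7} TPUs -> {efficiency:.0f}% busy")
# 7000 TPUs -> 98% busy; 100000 TPUs -> 7% busy
```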

Setup Efficient Redundant
No. TPUs 7,000 100,000
Efficiency (%) 98 7
Energy per Year (MWh) 5,780 71,000
CO2e (tonnes) 4,100 50,000
Equivalent Humans per Year (CO2e) 820 10,000
Equivalent Car Lifetimes (CO2e) 72 880
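
The CO2e rows follow from the annual energy figures via a few conversion factors. The values below are the ones implied by the table rather than official statistics, so treat them as my working assumptions: roughly 0.7 kg CO2e per kWh of grid energy, 5 tonnes of CO2e per person per year, and 57 tonnes of CO2e over a car's lifetime.

```python
# Conversion factors implied by the table above; these are my working
# assumptions, not official statistics.
KG_CO2E_PER_KWH = 0.7         # grid carbon intensity
TONNES_PER_HUMAN_YEAR = 5.0   # average person's annual footprint
TONNES_PER_CAR_LIFETIME = 57.0

for setup, mwh in (("Efficient", 5_780), ("Redundant", 71_000)):
    tonnes = mwh * 1_000 * KG_CO2E_PER_KWH / 1_000  # MWh -> kWh -> tonnes
    print(f"{setup}: {tonnes:,.0f} t CO2e "
          f"(~{tonnes / TONNES_PER_HUMAN_YEAR:,.0f} humans/year, "
          f"~{tonnes / TONNES_PER_CAR_LIFETIME:,.0f} car lifetimes)")
```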

We can see that the impact of inference is potentially a lot larger than that of training, with orders of magnitude more CO2e, and that is for only a single year. However, there are some interesting discussions to be had from these results. For instance, I estimate the CO2e of Google Translate to be equivalent to that of roughly 800 to 10,000 people per year. This is a lot, but imagine 10,000 people trying to handle 500 million people's translation requests every day. For this application in particular, I do see the benefit we are getting from the service, and although it impacts the planet, it does so far less than a human-based alternative would.

But this is one example, and there are many machine learning applications out there with different impacts on both society and the environment; this example only touches the surface. Amazon Web Services (AWS), which accounts for a large share of cloud datacenter capacity worldwide, report that 90% of their ML computing capacity is used for inference. The estimates I have done account for only a small fraction of the total impact of ML inference.

Conclusion

All in all, it is quite evident that we cannot ignore the impact ML has on the environment, particularly considering how prominent it will be in the future of computing. All of us, even PhD students, should at least consider the environmental impact of our work and think about ways to reduce it. What is even more important, though, is for ML companies to be more transparent about this aspect of their work. We are all in the dark about what datacenters, and ML applications in particular, are doing to the environment, and so cannot weigh this up against the social benefits they may have. And to reduce the impact of ML, we should look towards the inference side, as this is where the majority of the energy is consumed. This is a great motivation for me to continue looking into the low-power side of ML acceleration, as there can be positive real-world outcomes from research in this area.

Alexander Montgomerie-Corcoran, 3rd September 2021


[1] David A. Patterson, Joseph Gonzalez, Quoc V. Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David R. So, Maud Texier & Jeff Dean. Carbon Emissions and Large Neural Network Training, Apr 2021.
[2] Emma Strubell, Ananya Ganesh & Andrew McCallum. Energy and Policy Considerations for Deep Learning in NLP, Jun 2019.