llpilla/energy-optimal-federated-learning

1. Artifact Identification

  • Article’s title: Scheduling Algorithms for Federated Learning With Minimal Energy Consumption
  • Author’s name and affiliation: Laércio Lima Pilla - University of Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, UMR 5800, F-33400 Talence, France
  • DOI: 10.1109/TPDS.2023.3240833
  • Abstract: Federated Learning (FL) has opened the opportunity for collaboratively training machine learning models on heterogeneous mobile or Edge devices while keeping local data private. With an increase in its adoption, a growing concern is related to its economic and environmental cost (as is also the case for other machine learning techniques). Unfortunately, little work has been done to optimize its energy consumption or emissions of carbon dioxide or equivalents, as energy minimization is usually left as a secondary objective. In this paper, we investigate the problem of minimizing the energy consumption of FL training on heterogeneous devices by controlling the workload distribution. We model this as the Minimal Cost FL Schedule problem, a total cost minimization problem with identical, independent, and atomic tasks that have to be assigned to heterogeneous resources with arbitrary cost functions. We propose a pseudo-polynomial optimal solution to the problem based on the previously unexplored Multiple-Choice Minimum-Cost Maximal Knapsack Packing Problem. We also provide four algorithms for scenarios where cost functions are monotonically increasing and follow the same behavior. This artifact contains implementations of the aforementioned algorithms and the means to compare the quality of their solutions and their execution times. They enable the reproduction of the article’s results, and they can be adapted for use in real systems or new experiments.

2. Artifact Dependencies and Requirements

This code has been tested on systems with the following basic characteristics:

  • Hardware resources: an x86_64 processor, a few GB of RAM, and a few MB of storage. The running code never consumed more than 1 GB of memory, and its results never occupied more than 10 MB of storage. Initial results indicate that the code also runs on 64-bit Arm processors (more specifically, the A64FX).
  • Operating systems: a Linux system such as Ubuntu. The code should also be compatible with macOS, and with Windows systems using the Windows Subsystem for Linux.
  • Software libraries needed: Python3 for the base code and shell for some simple scripts. We use numpy in the base code, and matplotlib, pandas, seaborn, and scipy for the analysis of the results and figure generation. Jupyter can also be used for the analysis, but it is not required.
  • Input datasets: all inputs are generated by Python3 scripts already available in the artifact. Two sets of results generated in different experimental campaigns are also available to enable the replication of the analysis without requiring new runs.

The specific characteristics of previously employed environments are provided in Section 4 (Reproducibility of Experiments).

3. Artifact Installation and Deployment Process

Installation

If you download this artifact from Zenodo (directly from the website or using wget), you will have to decompress the .zip file (for instance, using unzip [file]). You can skip this step if you are downloading this artifact from GitHub using git clone. You will find the following organization in the base folder:

├── chameleon_requirements.txt
├── chameleon_results
│   ├── Analysis of execution times.ipynb
│   ├── Analysis of execution times.py
│   ├── Analysis of total costs.ipynb
│   ├── Analysis of total costs.py
│   ├── results_of_timing_with_fixed_resources.csv
│   ├── results_of_timing_with_fixed_tasks.csv
│   ├── results_with_constant_marginal_costs.csv
│   ├── results_with_constant_marginal_costs_no_upper_limit.csv
│   ├── results_with_decreasing_marginal_costs.csv
│   ├── results_with_increasing_marginal_costs.csv
│   ├── results_with_random_costs.csv
│   └── run_all_analysis.sh
├── code
│   ├── devices.py
│   ├── __init__.py
│   ├── schedulers.py
│   └── support.py
├── experiment_with_constant_marginal_costs_no_upper_limit.py
├── experiment_with_constant_marginal_costs.py
├── experiment_with_decreasing_marginal_costs.py
├── experiment_with_increasing_marginal_costs.py
├── experiment_with_random_costs.py
├── LICENSE
├── original_requirements.txt
├── original_results
│   ├── Analysis of execution times.ipynb
│   ├── Analysis of execution times.py
│   ├── Analysis of total costs.ipynb
│   ├── Analysis of total costs.py
│   ├── results_of_timing_with_fixed_resources.csv
│   ├── results_of_timing_with_fixed_tasks.csv
│   ├── results_with_constant_marginal_costs.csv
│   ├── results_with_constant_marginal_costs_no_upper_limit.csv
│   ├── results_with_decreasing_marginal_costs.csv
│   ├── results_with_increasing_marginal_costs.csv
│   ├── results_with_random_costs.csv
│   └── run_all_analysis.sh
├── README.md
├── run_all_timing_experiments.sh
├── run_all_total_cost_experiments.sh
├── run_analysis_on_new_results.sh
├── timing_with_fixed_resources.py
├── timing_with_fixed_tasks.py
└── unitary_tests.py

You can install the necessary Python3 libraries with pip3 install -r [requirements file]. Use original_requirements.txt (more recent library versions) to reproduce the environment of the original experiments in the article, or chameleon_requirements.txt (older library versions) to reproduce the results obtained on the Chameleon platform.

Deployment

If you want to build your own experiments, please check the experiment_with_… files for examples on how to test the algorithms.
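
As a starting point, the sketch below shows the overall shape of such an experiment: build one cost function per resource, ask a scheduler for an assignment, and sum the costs. The greedy scheduler here is only a stand-in (it ignores the lower and upper task limits of the real problem); the actual algorithms and their signatures are in code/schedulers.py.

```python
# Sketch of a custom experiment. The greedy scheduler below is only a
# stand-in (it ignores lower/upper task limits); replace it with the
# algorithms from code/schedulers.py, whose signatures may differ.
import heapq

def make_linear_cost(weight):
    """Linear cost function: assigning t tasks costs weight * t."""
    return lambda t: weight * t

def greedy_schedule(num_tasks, costs):
    """Assign tasks one by one to the resource with the smallest marginal cost."""
    assignment = [0] * len(costs)
    heap = [(c(1) - c(0), i) for i, c in enumerate(costs)]  # (marginal cost, resource)
    heapq.heapify(heap)
    for _ in range(num_tasks):
        _, i = heapq.heappop(heap)
        assignment[i] += 1
        new_marginal = costs[i](assignment[i] + 1) - costs[i](assignment[i])
        heapq.heappush(heap, (new_marginal, i))
    return assignment

costs = [make_linear_cost(w) for w in (1.0, 1.5, 3.0)]
schedule = greedy_schedule(1000, costs)
print(schedule, sum(c(t) for c, t in zip(costs, schedule)))  # all tasks on the cheapest resource
```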

You can run the scripts directly with Python3 (example: python3 experiment_with_random_costs.py).

You can also use one of the following shell scripts to launch multiple experiments:

  • run_all_total_cost_experiments.sh: runs all total cost experiments using files starting with the name “experiment”.
  • run_all_timing_experiments.sh: runs all timing experiments using files starting with the name “timing”.

Running these scripts should take between 10 and 15 hours each on most platforms. More detailed timings are provided in Section 4.3 (Experimental platforms).

After running your experiments, you can use the script run_analysis_on_new_results.sh to apply the analysis scripts stored in the original_results folder (Analysis of total costs.py and Analysis of execution times.py) to your new results. The same Python3 scripts can also be used to generate the figures and analysis employed in the article from the provided data.

4. Reproducibility of Experiments

A detailed description of the experiments and their results is available in the supplemental material of the article “Scheduling Algorithms for Federated Learning With Minimal Energy Consumption” (DOI: 10.1109/TPDS.2023.3240833/mm1).

Experiments are conducted in two parts: one run to gather total cost results, and one run to time the algorithms. It is important not to run other software during the timing experiments, to avoid adding noise to the performance measurements. These experiments were first conducted on an ‘original’ machine (whose results are presented in the article and supplemental material) and then reproduced on a machine in the Chameleon testbed. Information about both machines is provided in Section 4.3.

4.1 Total cost experiments

These experiments test the five proposed algorithms ((MC)2MKP, MarIn, MarCo, MarDecUn, and MarDec) and FedAvg with resources following four different cost behaviors (random, increasing marginal, constant marginal, and decreasing marginal costs). The algorithms have to schedule from 1000 to 5000 tasks (in increments of 100) over 10 or 100 resources. The experiments can be launched with the command ./run_all_total_cost_experiments.sh.
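
Here, the marginal cost of a resource is the extra cost incurred by assigning it one more task. A small illustration of the three monotonic behaviors, with made-up cost functions:

```python
# Made-up cost functions illustrating the three monotonic marginal-cost behaviors.
increasing = lambda t: t * t          # marginal cost 2t - 1 grows with t
constant   = lambda t: 3 * t          # marginal cost is always 3
decreasing = lambda t: 10 * t ** 0.5  # marginal cost shrinks as t grows

for name, cost in [("increasing", increasing), ("constant", constant),
                   ("decreasing", decreasing)]:
    marginals = [round(cost(t) - cost(t - 1), 2) for t in range(1, 5)]
    print(f"{name}: {marginals}")
# increasing: [1, 3, 5, 7]
# constant:   [3, 3, 3, 3]
# decreasing: [10.0, 4.14, 3.18, 2.68]
```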

The results are organized in five CSV files, namely results_with_constant_marginal_costs.csv, results_with_constant_marginal_costs_no_upper_limit.csv, results_with_decreasing_marginal_costs.csv, results_with_increasing_marginal_costs.csv, and results_with_random_costs.csv. Each line in the CSV files contains the name of the algorithm employed, the number of tasks, the number of resources, and the total cost found in the scenario. Each file also contains a header with a description of the experiment.

These results emphasize the known properties of the algorithms:

  • (MC)2MKP always finds optimal solutions;
  • MarIn finds optimal solutions when marginal costs are increasing (or constant);
  • MarCo finds optimal solutions when marginal costs are constant;
  • MarDec and MarDecUn find optimal solutions when marginal costs are decreasing (or constant);
  • FedAvg provides no guarantees of optimality.

The original results and the results collected on the Chameleon testbed show minor differences for the experiments with increasing and decreasing marginal costs. The differences appear in the last digits of some floating-point values: the absolute differences are under 10^-10, and the relative differences (absolute difference divided by the original value) are under 10^-15. We attribute these differences to changes in hardware and software that lead to slightly different floating-point rounding. Nonetheless, these minor differences do not change any of the conclusions of the experiments.
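
Such a comparison can be reproduced with a few lines of pandas. The snippet below is a sketch that assumes the column order described above and that the descriptive header lines start with ‘#’; adjust it to the actual file format.

```python
# Sketch: compare the total costs in two result sets. The column names and the
# '#' comment prefix for header lines are assumptions about the file format.
import pandas as pd

cols = ["scheduler", "tasks", "resources", "cost"]
orig = pd.read_csv("original_results/results_with_increasing_marginal_costs.csv",
                   comment="#", names=cols)
cham = pd.read_csv("chameleon_results/results_with_increasing_marginal_costs.csv",
                   comment="#", names=cols)

abs_diff = (orig["cost"] - cham["cost"]).abs()
rel_diff = abs_diff / orig["cost"].abs()
print(abs_diff.max(), rel_diff.max())  # expected under 1e-10 and 1e-15
```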

4.2 Timing experiments

These experiments are split in two parts. We first fix the number of resources at 100 and vary the number of tasks from 200 to 2000 in increments of 200, showing how the number of tasks influences the execution time. We then fix the number of tasks at 2000 and vary the number of resources from 20 to 80 in increments of 20. All resources follow linear cost functions, as we assume that the cost functions should not have a major impact on the performance of the schedulers. For each triple <scheduler, tasks, resources>, we gather 20 samples. Each sample is composed of 5 runs of a scheduler measured using Python’s timeit module. The order in which the samples are collected is randomized to reduce issues with interference and system jitter. These experiments can be launched with the command ./run_all_timing_experiments.sh.
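
The measurement scheme can be sketched as follows (with a placeholder scheduler; the artifact’s actual measurement code lives in the timing_with_… scripts):

```python
# Sketch of the sampling scheme: 20 samples per configuration, each sample
# timing 5 scheduler runs with timeit, in a randomized collection order.
import random
import timeit

def dummy_scheduler(tasks, resources):
    # Placeholder for a real scheduler from code/schedulers.py.
    return [tasks // resources] * resources

configs = [(tasks, 100) for tasks in range(200, 2001, 200)]  # fixed resources
samples = [(t, r) for (t, r) in configs for _ in range(20)]  # 20 samples each
random.shuffle(samples)  # randomized order to spread out system jitter

for tasks, resources in samples:
    elapsed = timeit.timeit(lambda: dummy_scheduler(tasks, resources), number=5)
    print(f"dummy,{tasks},{resources},{elapsed}")
```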

The results are organized in two CSV files named results_of_timing_with_fixed_resources.csv and results_of_timing_with_fixed_tasks.csv. Each line in the CSV files contains the name of the algorithm employed, the number of tasks, the number of resources, and the time (in seconds) taken to run the five repetitions of the sample. Each file also contains a header with a description of the experiment.

These results reflect how the different time complexities of the algorithms lead to vastly different execution times. (MC)2MKP (O(T^2 n)) is visibly the slowest algorithm, with execution times ranging from hundreds of milliseconds to tens of seconds. Compared with MarDec (O(T n^2)), its times are similar when the numbers of tasks and resources are close. However, when the number of tasks increases by a factor of ten, (MC)2MKP’s time increases by a factor of a hundred, while MarDec’s time increases only by a factor of ten.

MarDecUn and FedAvg show a difference of about one order of magnitude in their execution times, even though both are linear in the number of resources. This happens because they perform very different operations: MarDecUn loops over the resources to assign the lower limits and to find the one with the smallest marginal cost, while FedAvg directly assigns the same number of tasks to all resources using an optimized numpy operation, leading to faster execution.
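
The contrast can be sketched as follows (simplified: toy lower limits and marginal costs stand in for the real cost model):

```python
# Simplified contrast between the two assignment patterns (toy values).
import numpy as np

T, n = 2000, 80
lower_limits = np.zeros(n, dtype=int)   # toy lower limits
marginal_costs = np.random.rand(n)      # toy marginal costs

# FedAvg-style: a single vectorized operation assigns the same share everywhere.
fedavg_assignment = np.full(n, T // n)

# MarDecUn-style: loop over the resources to apply the lower limits, then give
# all remaining tasks to the resource with the smallest marginal cost.
assignment = np.zeros(n, dtype=int)
best = 0
for i in range(n):
    assignment[i] = lower_limits[i]
    if marginal_costs[i] < marginal_costs[best]:
        best = i
assignment[best] += T - assignment.sum()
```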

4.3 Experimental platforms

  • Original
    • Hardware: experiments were executed on a Dell Latitude 7420 notebook with an 11th Gen Intel(R) Core(TM) i7-1185G7 processor, 32GB of LPDDR4X RAM (2133MHz), and a Western Digital PC SN530 NVMe WDC 512GB SSD. The computer was plugged into a power source at all times.
    • Software: the computer runs Ubuntu 20.04.5 LTS (kernel version 5.14.0-1054-oem). We used Python 3.8.10 with numpy version 1.22.2 for the experiments. Modules matplotlib (3.5.1), pandas (1.4.3), seaborn (0.11.2), and scipy (1.7.1) were used for the visualization and statistical analysis of results. During scheduling time experiments, no other applications were open besides a terminal.
    • Required time: total cost experiments take about 11 hours, while timing experiments take about 10 hours on this machine. Times were measured with the time command.
    • Previously generated results: all original results are available in folder original_results. Please run ./run_all_analysis.sh in the folder to repeat the result analysis.
  • Chameleon
    • Hardware: experiments were conducted on a compute_skylake node at TACC through the Chameleon testbed. It contains two Intel(R) Xeon(R) Gold 6126 processors, 192 GB of DDR4 RAM (2666MHz), and 240 GB of SSD storage.
    • Software: the computer runs an image of CC-Ubuntu18.04 (kernel version 4.15.0-191-generic), which comes with Python 3.6.9. Modules matplotlib (3.3.4), pandas (1.1.5), seaborn (0.11.2), scipy (1.5.4), and numpy (1.19.5) were installed for the experiments. The tool screen was used to keep the experiments running while disconnected.
    • Required time: total cost experiments take about 13.5 hours, while timing experiments take about 13 hours on this machine. Times were measured with the time command.
    • Previously generated results: all results collected on the Chameleon testbed are available in folder chameleon_results. Please run ./run_all_analysis.sh in the folder to repeat the result analysis.
