This repository contains the source code for the experimental tests in the following paper:
Hudson, N., Hayot-Sasson, V., Babuji, Y., Baughman, M., Pauloski, J. G., Chard, R., ... & Chard, K. (2024). Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning. arXiv preprint arXiv:2409.16495.
@article{hudson2024flight,
title={Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning},
author={Hudson, Nathaniel and Hayot-Sasson, Valerie and Babuji, Yadu and Baughman, Matt and Pauloski, J Gregory and Chard, Ryan and Foster, Ian and Chard, Kyle},
journal={arXiv preprint arXiv:2409.16495},
year={2024}
}
These scaling tests involve benchmarks using our own Flight federated learning framework and the Flower framework.
First, set up your Python environment using either venv or conda.
These tests were run with Python 3.11.8 specifically.
To set up the environment, run either
$ conda create -n=<env_name> python=3.11.8
$ conda activate <env_name>
or
$ python3.11 -m venv <env_name>
$ source <env_name>/bin/activate
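Once the environment is activated, it can be useful to confirm that the interpreter matches the version the tests were run with. The following is a minimal sketch only: the function name `version_status` and the constant `TESTED_VERSION` are illustrative assumptions and are not part of this repository.

```python
import sys

# Python version this README reports the tests were run with.
TESTED_VERSION = (3, 11, 8)


def version_status(version_info=None):
    """Compare a (major, minor, micro) version against TESTED_VERSION.

    Returns "exact" for an identical version, "same-minor" when only the
    micro version differs, and "different" otherwise.
    """
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:3])
    if current == TESTED_VERSION:
        return "exact"
    if current[:2] == TESTED_VERSION[:2]:
        return "same-minor"
    return "different"


print("Interpreter check:", version_status())
```

A result other than "exact" does not necessarily mean the tests will fail; it only signals that the environment differs from the one reported here.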
All model training in these tests uses the Fashion MNIST benchmark dataset. Before running the tests, download the data to a directory of your choosing and note where you saved it. You can do this by running the provided Python script:
$ python download_data.py --root .
This will download the dataset using torchvision.datasets.
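For readers who want to see roughly what such a download involves, here is a hedged sketch built on `torchvision.datasets.FashionMNIST`. It is not the contents of `download_data.py`, which may differ. The helper names `parse_args` and `download_fashion_mnist` are assumptions made for illustration; only the `--root` flag is taken from the command above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a --root argument like the one passed to download_data.py."""
    parser = argparse.ArgumentParser(description="Download Fashion MNIST.")
    parser.add_argument(
        "--root",
        type=Path,
        default=Path("."),
        help="Directory in which the dataset is stored.",
    )
    return parser.parse_args(argv)


def download_fashion_mnist(root):
    """Download the train and test splits into `root` (hypothetical helper)."""
    # Imported lazily so argument parsing works without torchvision installed.
    from torchvision.datasets import FashionMNIST

    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # download=True fetches the files only if they are not already in root.
    train = FashionMNIST(root=str(root), train=True, download=True)
    test = FashionMNIST(root=str(root), train=False, download=True)
    return len(train), len(test)
```

Calling `download_fashion_mnist(parse_args().root)` would fetch both splits into the chosen directory and requires network access on the first run. Keep track of the value passed as `--root`, since the tests need to know where the data was saved.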
Weak-scaling tests of our proposed Flight framework on HPC systems with Parsl. Separate tests are run with Parsl's default data transfer implementation and with Redis (via ProxyStore).
Weak-scaling tests for the Flower framework.
Tests that simulate hierarchical federated learning with Flight. Calculations of communication costs are also included.
Tests that compare synchronous and asynchronous federated learning with Flight.
Remote execution tests prepared for Amazon EC2 instances.