FedPS is a Python module for data preprocessing in Federated Learning (FL) that relies primarily on aggregated statistics. The preprocessing workflow involves the following five steps:
- Local Statistics Estimation: Clients estimate local statistics from their local data.
- Aggregation: The server receives the local statistics and performs aggregation.
- Global Parameter Calculation: The server calculates the global preprocessing parameters.
- Parameter Distribution: The global parameters are then sent back to the clients.
- Data Preprocessing: Clients apply the preprocessing to their local data.
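The five steps can be illustrated without any networking. The sketch below is a simplified, single-process simulation, not FedPS's implementation: it assumes a standard scaler whose global mean and standard deviation are derived from each client's row count, per-feature sums, and per-feature sums of squares. All data and function names are hypothetical.

```python
import numpy as np

# Hypothetical horizontally split data: same two features, different rows per client.
client_data = [
    np.array([[1.0, 10.0], [2.0, 20.0]]),
    np.array([[3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]),
]

def local_statistics(X):
    """Step 1 (client): row count, per-feature sum, per-feature sum of squares."""
    return X.shape[0], X.sum(axis=0), (X ** 2).sum(axis=0)

def aggregate(stats):
    """Step 2 (server): add up the statistics received from all clients."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    return n, total, total_sq

def global_parameters(n, total, total_sq):
    """Step 3 (server): global mean and population standard deviation."""
    mean = total / n
    var = total_sq / n - mean ** 2
    return mean, np.sqrt(var)

stats = [local_statistics(X) for X in client_data]    # step 1
mean, std = global_parameters(*aggregate(stats))       # steps 2 and 3
# Step 4 would send (mean, std) back to the clients; here they are simply reused.
scaled = [(X - mean) / std for X in client_data]       # step 5
```

Only aggregates leave each client, yet the result matches scaling the pooled data. The sum-of-squares formula is the simplest choice but can lose precision when the variance is small relative to the mean; a production implementation might prefer a numerically stabler merge.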
Requirements:
- Python (>= 3.9)
- Scikit-learn (>= 1.4)
- NumPy (>= 1.20)
- DataSketches
- PyZMQ
Installation:
- Create a Python environment
```bash
conda create --name fedps python=3.9
conda activate fedps
```
- Clone this project
```bash
git clone https://github.com/xuefeng-xu/fedps.git
```
- Build and install the project
```bash
cd fedps
pip install .
```
- Set up the communication channels (each party runs its own script)
```python
# Client1 channel (client1.py)
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5556,
    remote_ip="127.0.0.1", remote_port=5555,
)

# Client2 channel (client2.py)
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5557,
    remote_ip="127.0.0.1", remote_port=5555,
)

# Server channel (server.py): one remote address and port per client
from fedps.channel import ServerChannel

channel = ServerChannel(
    local_ip="127.0.0.1", local_port=5555,
    remote_ip=["127.0.0.1", "127.0.0.1"],
    remote_port=[5556, 5557],
)
```
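With more clients, the same port layout can be generated instead of written by hand. The sketch below is a hypothetical helper, not part of FedPS: it only builds keyword-argument dictionaries whose keys mirror the `ClientChannel` and `ServerChannel` arguments shown above, assigning client `i` (0-based) the port `server_port + 1 + i`, as in the two-client setup.

```python
def channel_configs(n_clients, host="127.0.0.1", server_port=5555):
    """Hypothetical helper: kwargs for one server channel and n client channels.

    Client i (0-based) listens on server_port + 1 + i; every client points
    at the server's port, and the server lists every client's address.
    """
    client_ports = [server_port + 1 + i for i in range(n_clients)]
    server_kwargs = {
        "local_ip": host,
        "local_port": server_port,
        "remote_ip": [host] * n_clients,
        "remote_port": client_ports,
    }
    client_kwargs = [
        {
            "local_ip": host,
            "local_port": port,
            "remote_ip": host,
            "remote_port": server_port,
        }
        for port in client_ports
    ]
    return server_kwargs, client_kwargs

server_kwargs, client_kwargs = channel_configs(3)
```

Each dictionary could then be unpacked into a constructor, e.g. `ServerChannel(**server_kwargs)` on the server and `ClientChannel(**client_kwargs[i])` on client `i`, assuming the constructors accept exactly the keyword arguments used in the setup above.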
- Specify `FL_type` and `role` in the preprocessor:
  - `FL_type`: "H" (Horizontal) or "V" (Vertical)
  - `role`: "client" or "server"
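In the usual FL terminology, "horizontal" means the clients share the same features but hold different samples, while "vertical" means they hold different features of the same samples. The NumPy sketch below shows both splits of one toy matrix; it assumes, for the vertical case, that rows are already aligned across clients (for example by a shared sample ID), which the toy example takes for granted.

```python
import numpy as np

# Toy dataset: 4 samples (rows) x 3 features (columns).
X = np.arange(12, dtype=float).reshape(4, 3)

# Horizontal FL ("H"): same feature columns, different rows per client.
h_client1, h_client2 = X[:2, :], X[2:, :]

# Vertical FL ("V"): same rows, different feature columns per client.
v_client1, v_client2 = X[:, :2], X[:, 2:]
```

In the horizontal case, a per-feature statistic such as a column minimum has to combine values held by several clients, which is where aggregation comes in.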
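```python
# Client1 code example (client1.py, after creating its channel)
from fedps.preprocessing import MinMaxScaler

X = [[-1, 2], [-0.5, 6]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)  # expected with the global min/max: [[0, 0], [0.25, 0.25]]

# Client2 code example (client2.py, after creating its channel)
from fedps.preprocessing import MinMaxScaler

X = [[0, 10], [1, 18]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)  # expected with the global min/max: [[0.5, 0.5], [1, 1]]

# Server code example (server.py, after creating its channel)
from fedps.preprocessing import MinMaxScaler

est = MinMaxScaler(FL_type="H", role="server", channel=channel)
est.fit()  # no data on the server: it only aggregates the clients' statistics
```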
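In the horizontal setting, the global minimum and maximum are simply the minimum and maximum of the clients' local values, so the federated `MinMaxScaler` should reproduce what a centralized scaler outputs on the pooled rows. A reasonable sanity check, assuming scikit-learn is available (it is a listed dependency), is to fit scikit-learn's `MinMaxScaler` on the union of both clients' data and compare. This runs centrally and is only a testing aid, not part of the federated workflow.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The two clients' data from the example, pooled in one place for testing only.
X_client1 = np.array([[-1, 2], [-0.5, 6]])
X_client2 = np.array([[0, 10], [1, 18]])

# Centralized reference: fit once on all rows, then transform each client's part.
reference = MinMaxScaler().fit(np.vstack([X_client1, X_client2]))
ref_client1 = reference.transform(X_client1)  # [[0, 0], [0.25, 0.25]]
ref_client2 = reference.transform(X_client2)  # [[0.5, 0.5], [1, 1]]
```

Each client's printed `Xt` can then be compared with `np.allclose` against the corresponding reference array.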
- Run the scripts
```bash
# Run each command in its own terminal
python client1.py
python client2.py
python server.py
```
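Instead of opening three terminals, the parties can also be started from one Python script. The sketch below uses only the standard library's `subprocess` module; `run_parties` is a hypothetical convenience helper, not part of FedPS, and it starts the processes in the listed order without waiting between them.

```python
import subprocess
import sys

def run_parties(scripts, timeout=120.0):
    """Hypothetical helper: start each script in its own Python process.

    Waits for the processes in order and returns their exit codes in the
    same order as `scripts`. If a wait exceeds `timeout` seconds,
    subprocess.TimeoutExpired propagates and every process that is still
    running is killed.
    """
    procs = [subprocess.Popen([sys.executable, str(path)]) for path in scripts]
    try:
        return [proc.wait(timeout=timeout) for proc in procs]
    finally:
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
                proc.wait()

# Usage, from the folder that contains the three scripts:
# codes = run_parties(["client1.py", "client2.py", "server.py"])
```

Any non-zero exit code in the returned list points at the party that failed; its traceback appears in the shared console output.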
See the example folder for more use cases.
Supported preprocessing modules:
- Discretization
- Encoding
- Scaling
- Transformation
- Imputation: `IterativeImputer` (experimental), `KNNImputer`, `SimpleImputer`
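As an illustration of how imputation fits the aggregated-statistics workflow, mean imputation (the default strategy of scikit-learn's `SimpleImputer`) only needs each client's per-feature sum and count of observed values. The NumPy sketch below simulates this in a single process; it is one possible protocol under that assumption, not FedPS's actual implementation, and the data is hypothetical.

```python
import numpy as np

# Hypothetical horizontally split data with missing entries (np.nan).
client_data = [
    np.array([[1.0, np.nan], [3.0, 4.0]]),
    np.array([[np.nan, 8.0], [5.0, 6.0]]),
]

# Client side: per-feature sum and count of the observed (non-missing) values.
local_stats = [
    (np.nansum(X, axis=0), np.sum(~np.isnan(X), axis=0)) for X in client_data
]

# Server side: add the sums and the counts, then divide to get global means.
total = sum(s for s, _ in local_stats)
count = sum(c for _, c in local_stats)
global_mean = total / count

# Client side: replace each missing entry with its feature's global mean.
imputed = [np.where(np.isnan(X), global_mean, X) for X in client_data]
```

A feature with no observed values on any client would produce a division by zero here, so a real implementation needs an explicit rule for that case.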
Known limitations:
- Currently, this library does not support sparse data.
- `KBinsDiscretizer`, `StandardScaler`, and `SplineTransformer` cannot take the `sample_weight` parameter in their `fit` methods.
- `IterativeImputer` does not support the `sample_posterior` and `n_nearest_features` parameters.
- `KNNImputer` does not support custom weight functions or distance metrics.
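Because sparse input is not supported, one workaround, assuming each client's data fits in memory once densified, is to convert SciPy sparse matrices to dense NumPy arrays before calling `fit_transform`. `densify` is a hypothetical helper; SciPy is not among the listed requirements, although scikit-learn itself depends on it.

```python
import numpy as np
from scipy import sparse

def densify(X):
    """Hypothetical helper: return X as a dense NumPy array."""
    if sparse.issparse(X):
        return X.toarray()  # materializes every zero explicitly
    return np.asarray(X)

X_sparse = sparse.csr_matrix([[0.0, 1.0], [2.0, 0.0]])
X_dense = densify(X_sparse)
```

Densifying can multiply memory use for high-dimensional, mostly-zero data, so this is practical only for moderate data sizes.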
This project is built on scikit-learn.