From 1d99ffdbeebe686ec24f4b2689a27f4ce8527a63 Mon Sep 17 00:00:00 2001 From: robbie228 Date: Thu, 14 Nov 2024 19:02:52 +0800 Subject: [PATCH 1/8] 1.add docker-compose deploy 2.update docker-compose deployment on readme 3.delete serving&spark on docker-compose Signed-off-by: robbie228 --- README.md | 3 + deploy/docker-compose/README_zh.md | 411 ++++++++++++++++++ .../docker-deploy/docker_deploy.sh | 272 ++++++++++++ .../docker-deploy/generate_config.sh | 408 +++++++++++++++++ .../docker-deploy/hosts-setup.sh | 43 ++ .../docker-compose/docker-deploy/parties.conf | 47 ++ deploy/docker-compose/docker-deploy/test.sh | 182 ++++++++ .../backends/eggroll/conf/README.md | 6 + .../eggroll/conf/applicationContext-proxy.xml | 45 ++ .../backends/eggroll/conf/eggroll.properties | 70 +++ .../backends/eggroll/conf/log4j2.properties | 108 +++++ .../eggroll/conf/node-extend-env.properties | 0 .../backends/eggroll/conf/route_table.json | 28 ++ .../backends/eggroll/conf/whitelist.json | 245 +++++++++++ .../docker-compose-eggroll.yml | 168 +++++++ .../fate_board/conf/application.properties | 29 ++ .../public/fate_board/conf/ssh.properties | 0 .../fate_flow/conf/pulsar_route_table.yaml | 17 + .../fate_flow/conf/rabbitmq_route_table.yaml | 6 + .../public/fate_flow/conf/service_conf.yaml | 127 ++++++ .../mysql/init/create-eggroll-meta-tables.sql | 205 +++++++++ .../public/osx/conf/broker.properties | 61 +++ .../public/osx/conf/route_table.json | 26 ++ 23 files changed, 2507 insertions(+) create mode 100644 deploy/docker-compose/README_zh.md create mode 100644 deploy/docker-compose/docker-deploy/docker_deploy.sh create mode 100644 deploy/docker-compose/docker-deploy/generate_config.sh create mode 100644 deploy/docker-compose/docker-deploy/hosts-setup.sh create mode 100644 deploy/docker-compose/docker-deploy/parties.conf create mode 100644 deploy/docker-compose/docker-deploy/test.sh create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/README.md create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/applicationContext-proxy.xml create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/eggroll.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/log4j2.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/node-extend-env.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/route_table.json create mode 100644 deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/whitelist.json create mode 100644 deploy/docker-compose/docker-deploy/training_template/docker-compose-eggroll.yml create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/application.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/ssh.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/pulsar_route_table.yaml create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/rabbitmq_route_table.yaml create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/service_conf.yaml create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/mysql/init/create-eggroll-meta-tables.sql create mode 100644 
deploy/docker-compose/docker-deploy/training_template/public/osx/conf/broker.properties create mode 100644 deploy/docker-compose/docker-deploy/training_template/public/osx/conf/route_table.json
diff --git a/README.md b/README.md
index 50be2d3e70..ee586ca8d6 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,9 @@ FATE can be deployed on a single node or on multiple nodes. Choose the deploymen
 Deploying FATE to multiple nodes to achieve scalability, reliability and manageability.
 - [Cluster deployment by CLI](./deploy/cluster-deploy): Using CLI to deploy a FATE cluster.
 
+### Docker-Compose deployment
+- [Docker-Compose deployment](./deploy/docker-compose): Using docker-compose to deploy FATE.
+
 ### Quick Start
 - [Training Demo with Only FATE Installed From Pypi](doc/2.0/fate/ml)
 - [Training Demo with Both FATE AND FATE-Flow Installed From Pypi](doc/2.0/fate/quick_start.md)
diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md
new file mode 100644
index 0000000000..14e6a1b7ca
--- /dev/null
+++ b/deploy/docker-compose/README_zh.md
@@ -0,0 +1,411 @@
+# 使用Docker Compose 部署 FATE
+
+## 前言
+
+[FATE](https://www.fedai.org/)是一个联邦学习框架,能有效帮助多个机构在满足用户隐私保护、数据安全和政府法规的要求下,进行数据使用和建模。本文档介绍使用Docker Compose部署FATE集群的方法。
+
+## Docker Compose 简介
+
+Compose是用于定义和运行多容器Docker应用程序的工具。通过Compose,您可以使用YAML文件来配置应用程序的服务,然后使用一个命令,就可以从配置中创建并启动所有服务。要了解有关Compose的所有功能的更多信息,请参阅[相关文档](https://docs.docker.com/compose/#features)。
+
+使用Docker Compose可以方便地部署FATE,下面是使用步骤。
+
+## 目标
+
+两个可以互通的FATE实例,每个实例均包括FATE所有组件。
+
+## 准备工作
+
+1. 两个主机(物理机或者虚拟机,均为CentOS 7系统);
+2. 所有主机安装Docker,版本:19.03.0+;
+3. 所有主机安装Docker Compose,版本:1.27.0+;
+4. 部署机可以联网,所有主机相互之间可以网络互通;
+5. 运行机已经下载FATE的各组件镜像,如果无法连接dockerhub,请考虑使用harbor([Harbor 作为本地镜像源](../registry/README.md))或者使用离线部署(离线构建镜像参考文档[构建镜像](https://github.com/FederatedAI/FATE-Builder/tree/main/docker-build));
+6. 运行FATE的主机推荐配置:8核CPU和16GB内存。
+
+### 下载部署脚本
+
+在任意机器上下载合适的KubeFATE版本,可参考 [releases pages](https://github.com/FederatedAI/KubeFATE/releases),然后解压。
+
+### 修改镜像配置文件(可选)
+
+在默认情况下,脚本在部署期间会从 [Docker Hub](https://hub.docker.com/search?q=federatedai&type=image)中下载镜像。
+
+对于中国的用户,可以使用国内镜像源:
+具体方法是编辑docker-deploy目录下的`.env`文件,给`RegistryURI`参数填入以下字段。
+
+```bash
+RegistryURI=hub.c.163.com
+```
+
+如果在运行机器上已经下载或导入了所需镜像,部署将会变得非常容易。
+
+### 手动下载镜像(可选)
+
+如果运行机没有FATE组件的镜像,可以通过以下命令从Docker Hub获取镜像。FATE镜像的版本可在[release页面](https://github.com/FederatedAI/FATE/releases)上查看:
+
+```bash
+docker pull federatedai/eggroll:3.2.0-release
+docker pull federatedai/fateflow:2.2.0-release
+docker pull federatedai/osx:2.2.0-release
+docker pull federatedai/fateboard:2.1.1-release
+docker pull mysql:8.0.28
+```
+
+检查所有镜像是否下载成功。
+
+```bash
+$ docker images
+REPOSITORY              TAG
+federatedai/fateflow    2.2.0-release
+federatedai/eggroll     3.2.0-release
+federatedai/osx         2.2.0-release
+federatedai/fateboard   2.1.1-release
+mysql                   8.0.28
+```
+
+### 离线部署(可选)
+
+当我们的运行机器处于无法连接外部网络的时候,就无法从Docker Hub下载镜像,建议使用[Harbor](https://goharbor.io/)作为本地镜像仓库。安装Harbor请参考[文档](https://github.com/FederatedAI/KubeFATE/blob/master/registry/install_harbor.md)。在`.env`文件中,将`RegistryURI`变量更改为Harbor的IP,如下例中192.168.10.1是Harbor的IP。
+
+```bash
+$ cd KubeFATE/
+$ vi .env
+
+...
+RegistryURI=192.168.10.1/federatedai
+...
+```
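+
+若完全离线,也可以先在一台能联网的机器上拉取镜像,再通过`docker save`/`docker load`导入运行机。下面是一个示意(镜像名和版本以上文列出的为准,tar包路径仅为示例):
+
+```bash
+# 在联网机器上将所需镜像导出为一个tar包
+docker save -o fate-images.tar \
+  federatedai/eggroll:3.2.0-release \
+  federatedai/fateflow:2.2.0-release \
+  federatedai/osx:2.2.0-release \
+  federatedai/fateboard:2.1.1-release \
+  mysql:8.0.28
+# 将tar包拷贝到运行机后导入本地Docker
+docker load -i fate-images.tar
+```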
+
+## 用Docker Compose部署FATE
+
+***如果之前已经部署过其他版本的FATE,请先删除清理之后再部署新的版本,参见[删除部署](#删除部署)。***
+
+### 配置需要部署的实例数目
+
+部署脚本提供了部署多个FATE实例的功能。下面的例子中,我们将FATE部署在两台机器上,每台机器运行一个FATE实例,这两台机器的IP分别为*192.168.7.1*和*192.168.7.2*。
+
+根据需求修改配置文件`kubeFATE\docker-deploy\parties.conf`。
+
+`parties.conf`配置文件各配置项的含义,参见[parties.conf文件介绍](../docs/configurations/Docker_compose_Partys_configuration.md)。
+
+下面是修改好的文件,`party 10000`的集群将部署在*192.168.7.1*上,而`party 9999`的集群将部署在*192.168.7.2*上。
+
+```bash
+user=fate
+dir=/data/projects/fate
+party_list=(10000 9999)
+party_ip_list=(192.168.7.1 192.168.7.2)
+
+# Engines:
+# Computing : Eggroll, Spark, Spark_local
+computing=Eggroll
+# Federation: OSX(computing: Eggroll/Spark/Spark_local), Pulsar/RabbitMQ(computing: Spark/Spark_local)
+federation=OSX
+# Storage: Eggroll(computing: Eggroll), HDFS(computing: Spark), LocalFS(computing: Spark_local)
+storage=Eggroll
+# Algorithm: Basic, NN, ALL
+algorithm=Basic
+# Device: CPU, IPCL, GPU
+device=CPU
+
+# spark and eggroll
+compute_core=16
+
+# You only need to configure this parameter when you want to use the GPU, the default value is 1
+gpu_count=0
+
+# modify if you are going to use an external db
+mysql_ip=mysql
+mysql_user=fate
+mysql_password=fate_dev
+mysql_db=fate_flow
+serverTimezone=UTC
+
+name_node=hdfs://namenode:9000
+
+# Define fateboard login information
+fateboard_username=admin
+fateboard_password=admin
+
+```
+
+使用Docker Compose部署FATE支持多种不同类型引擎的组合(即对computing、federation、storage的选择),关于不同类型FATE的更多细节,查看:[不同类型FATE的架构介绍](../docs/Introduction_to_Engine_Architecture_zh.md)。
+
+`algorithm`和`device`的配置可以查看这里[FATE_Algorithm_and_Computational_Acceleration_Selection.md](../docs/FATE_Algorithm_and_Computational_Acceleration_Selection.md)。
+
+**注意**: 默认情况下不会部署exchange组件。如需部署,用户可以把服务器IP填入上述配置文件的`exchangeip`中,该组件的默认监听端口为9371。
+
+在运行部署脚本之前,需要确保部署机器可以ssh免密登录到两个运行节点主机上,`user`代表免密登录的用户(免密登录的配置可参考本节末尾的示例)。
+
+在运行FATE的主机上,如果`user`是非root用户,需要有`/data/projects/fate`文件夹的权限和docker权限;如果是root用户则不需要任何其他操作。
+
+```bash
+# 创建一个组为docker的fate用户
+[user@localhost]$ sudo useradd -s /bin/bash -g docker -d /home/fate fate
+# 设置用户密码
+[user@localhost]$ sudo passwd fate
+# 创建docker-compose部署目录
+[user@localhost]$ sudo mkdir -p /data/projects/fate /home/fate
+# 修改docker-compose部署目录对应用户和组
+[user@localhost]$ sudo chown -R fate:docker /data/projects/fate /home/fate
+# 选择用户
+[user@localhost]$ sudo su fate
+# 查看是否拥有docker权限
+[fate@localhost]$ docker ps
+CONTAINER ID  IMAGE   COMMAND   CREATED   STATUS    PORTS   NAMES
+# 查看docker-compose部署目录
+[fate@localhost]$ ls -l /data/projects/
+total 0
+drwxr-xr-x. 2 fate docker 6 May 27 00:51 fate
+```
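+
+ssh免密登录可以用`ssh-keygen`/`ssh-copy-id`配置,下面是一个示意(以部署机上的fate用户和本例的两台运行机IP为例):
+
+```bash
+# 在部署机上生成密钥(已有密钥可跳过)
+ssh-keygen -t rsa -b 4096
+# 将公钥分发到两台运行节点
+ssh-copy-id fate@192.168.7.1
+ssh-copy-id fate@192.168.7.2
+# 验证可以免密登录
+ssh fate@192.168.7.1 exit
+```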
+
+### 执行部署脚本
+
+**注意:**在运行以下命令之前,所有目标主机必须
+
+* 允许使用 SSH 密钥进行无密码 SSH 访问(否则我们将需要为每个主机多次输入密码);
+* 满足 [准备工作](#准备工作) 中指定的要求。
+
+要将 FATE 部署到所有已配置的目标主机,请使用以下命令(可在任意机器上执行)。
+
+进入目录`kubeFATE\docker-deploy`,然后运行:
+
+```bash
+bash ./generate_config.sh   # 生成部署文件
+bash ./docker_deploy.sh all # 部署到所有已配置的目标主机
+```
+
+脚本将会生成10000、9999两个组织(Party)的部署文件,然后打包成tar文件。接着把tar文件`confs-<party_id>.tar`复制到party对应的主机上并解包,解包后的文件默认在`/data/projects/fate`目录下。然后脚本将远程登录到这些主机并使用docker compose命令启动FATE实例。
+
+默认情况下,脚本会启动训练集群。也可以如下所示,将`--training`等参数添加到`docker_deploy.sh`中显式指定。
+
+(可选)要部署各方训练集群,请使用以下命令:
+
+```bash
+bash ./docker_deploy.sh all --training
+```
+
+(可选)要将 FATE 部署到单个目标主机,请使用以下命令和参与方的 ID(下例中为 10000):
+
+```bash
+bash ./docker_deploy.sh 10000
+```
+
+命令完成后,登录到任一主机,并使用 `docker-compose ps` 验证集群的状态:
+
+```bash
+ssh fate@192.168.7.1
+```
+
+使用以下命令验证实例状态:
+
+```bash
+cd /data/projects/fate/confs-10000
+docker-compose ps
+```
+
+输出显示如下。若各个组件都是`Up`状态,并且fateflow的状态还是`(healthy)`,说明部署成功。
+
+```bash
+NAME                           IMAGE                                 COMMAND                  SERVICE          CREATED              STATUS                        PORTS
+confs-10000-clustermanager-1   federatedai/eggroll:3.2.0-release     "/tini -- bash -c 'j…"   clustermanager   About a minute ago   Up About a minute             4670/tcp
+confs-10000-fateflow-1         federatedai/fateflow:2.2.0-release    "/bin/bash -c 'set -…"   fateflow         About a minute ago   Up About a minute (healthy)   192.168.7.1:9360->9360/tcp, :::9360->9360/tcp, 192.168.7.1:9380->9380/tcp, :::9380->9380/tcp
+confs-10000-mysql-1            mysql:8.0.28                          "docker-entrypoint.s…"   mysql            About a minute ago   Up About a minute             3306/tcp, 33060/tcp
+confs-10000-nodemanager-1      federatedai/eggroll:3.2.0-release     "/tini -- bash -c 'j…"   nodemanager      About a minute ago   Up About a minute             4671/tcp
+confs-10000-osx-1              federatedai/osx:2.2.0-release         "/tini -- bash -c 'j…"   osx              About a minute ago   Up About a minute             192.168.7.1:9370->9370/tcp, :::9370->9370/tcp
+confs-10000-fateboard-1        federatedai/fateboard:2.1.1-release   "sh -c 'java -Dsprin…"   fateboard        About a minute ago   Up About a minute             192.168.7.1:8080->8080/tcp
+```
+
+### 验证部署
+
+FATE在docker-compose上启动成功之后,需要验证各个服务是否都正常运行,我们可以通过运行toy_example示例来检测。
+
+选择192.168.7.1这个节点验证,使用以下命令:
+
+```bash
+# 在192.168.7.1上执行下列命令
+
+# 进入fateflow组件容器内部
+$ docker-compose exec fateflow bash
+# toy验证
+$ flow test toy --guest-party-id 10000 --host-party-id 9999
+```
+
+如果测试通过,屏幕将显示类似如下消息:
+
+```bash
+toy test job xxxxx is success
+```
+
+### 上传数据,发起任务
+
+#### Host方操作
+
+##### 进入party10000 fateflow容器
+
+```bash
+cd /data/projects/fate/confs-10000
+docker-compose exec fateflow bash
+```
+
+##### 上传host数据
+
+在容器内进入Python交互环境,执行以下脚本上传数据:
+
+```python
+# 上传数据(单边的,双边需要在另一方再次执行)
+from fate_client.pipeline import FateFlowPipeline
+
+guest_data_path = "/data/projects/fate/examples/data/breast_hetero_guest.csv"
+host_data_path = "/data/projects/fate/examples/data/breast_hetero_host.csv"
+
+data_pipeline = FateFlowPipeline().set_parties(local="0")
+guest_meta = {
+    "delimiter": ",", "dtype": "float64", "label_type": "int64", "label_name": "y", "match_id_name": "id"
+}
+host_meta = {
+    "delimiter": ",", "input_format": "dense", "match_id_name": "id"
+}
+data_pipeline.transform_local_file_to_dataframe(file=guest_data_path, namespace="experiment", name="breast_hetero_guest",
+                                                meta=guest_meta, head=True, extend_sid=True)
+data_pipeline.transform_local_file_to_dataframe(file=host_data_path, namespace="experiment", name="breast_hetero_host",
+                                                meta=host_meta, head=True, extend_sid=True)
+```
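+
+上述脚本也可以保存为文件后一次性执行,下面是一个示意(文件路径仅为示例):
+
+```bash
+# 在fateflow容器内,将上述Python代码保存为 /tmp/upload_host.py 后执行
+python /tmp/upload_host.py
+```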
+
+#### Guest方操作
+
+##### 进入party9999 fateflow容器
+
+```bash
+cd /data/projects/fate/confs-9999
+docker-compose exec fateflow bash
+```
+
+##### 上传guest数据
+
+在容器内进入Python交互环境,执行以下脚本上传数据:
+
+```python
+# 上传数据(单边的,双边需要在另一方再次执行)
+from fate_client.pipeline import FateFlowPipeline
+
+guest_data_path = "/data/projects/fate/examples/data/breast_hetero_guest.csv"
+host_data_path = "/data/projects/fate/examples/data/breast_hetero_host.csv"
+
+data_pipeline = FateFlowPipeline().set_parties(local="0")
+guest_meta = {
+    "delimiter": ",", "dtype": "float64", "label_type": "int64", "label_name": "y", "match_id_name": "id"
+}
+host_meta = {
+    "delimiter": ",", "input_format": "dense", "match_id_name": "id"
+}
+data_pipeline.transform_local_file_to_dataframe(file=guest_data_path, namespace="experiment", name="breast_hetero_guest",
+                                                meta=guest_meta, head=True, extend_sid=True)
+data_pipeline.transform_local_file_to_dataframe(file=host_data_path, namespace="experiment", name="breast_hetero_host",
+                                                meta=host_meta, head=True, extend_sid=True)
+```
+
+##### 提交任务
+
+执行以下Python脚本,发起任务:
+
+```python
+# 发起任务
+from fate_client.pipeline.components.fate import (
+    HeteroSecureBoost,
+    Reader,
+    PSI,
+    Evaluation
+)
+from fate_client.pipeline import FateFlowPipeline
+
+
+# create pipeline for training
+pipeline = FateFlowPipeline().set_parties(guest="9999", host="10000")
+
+# create reader task_desc
+reader_0 = Reader("reader_0")
+reader_0.guest.task_parameters(namespace="experiment", name="breast_hetero_guest")
+reader_0.hosts[0].task_parameters(namespace="experiment", name="breast_hetero_host")
+
+# create psi component_desc
+psi_0 = PSI("psi_0", input_data=reader_0.outputs["output_data"])
+
+# create hetero secure_boost component_desc
+hetero_secureboost_0 = HeteroSecureBoost(
+    "hetero_secureboost_0", num_trees=1, max_depth=5,
+    train_data=psi_0.outputs["output_data"],
+    validate_data=psi_0.outputs["output_data"]
+)
+
+# create evaluation component_desc
+evaluation_0 = Evaluation(
+    'evaluation_0', runtime_parties=dict(guest="9999"), metrics=["auc"], input_datas=[hetero_secureboost_0.outputs["train_output_data"]]
+)
+
+# add training task
+pipeline.add_tasks([reader_0, psi_0, hetero_secureboost_0, evaluation_0])
+
+# compile and train
+pipeline.compile()
+pipeline.fit()
+
+# print metric and model info
+print(pipeline.get_task_info("hetero_secureboost_0").get_output_model())
+print(pipeline.get_task_info("evaluation_0").get_output_metric())
+
+# deploy task for inference
+pipeline.deploy([psi_0, hetero_secureboost_0])
+
+# create pipeline for predicting
+predict_pipeline = FateFlowPipeline()
+
+# add input to deployed_pipeline
+deployed_pipeline = pipeline.get_deployed_pipeline()
+reader_1 = Reader("reader_1")
+reader_1.guest.task_parameters(namespace="experiment", name="breast_hetero_guest")
+reader_1.hosts[0].task_parameters(namespace="experiment", name="breast_hetero_host")
+deployed_pipeline.psi_0.input_data = reader_1.outputs["output_data"]
+
+# add task to predict pipeline
+predict_pipeline.add_tasks([reader_1, deployed_pipeline])
+
+# compile and predict
+predict_pipeline.compile()
+predict_pipeline.predict()
+```
+
+任务成功后,屏幕将显示类似下方的结果:
+
+```bash
+Job is success!!!
Job id is 202404031636558952240, response_data={'apply_resource_time': 1712133417129, 'cores': 4, 'create_time': 1712133415928, 'dag': {'dag': {'conf': {'auto_retries': 0, 'computing_partitions': 8, 'cores': None, 'extra': None, 'inheritance': None, 'initiator_party_id': '9999', 'model_id': '202404031636558952240', 'model_version': '0', 'model_warehouse': {'model_id': '202404031635272687860', 'model_version': '0'}, 'priority': None, 'scheduler_party_id': '9999', 'sync_type': 'callback', 'task': None}, 'parties': [{'party_id': ['9999'], 'role': 'guest'}, {'party_id': ['10000'], 'role': 'host'}], 'party_tasks': {'guest_9999': {'conf': {}, 'parties': [{'party_id': ['9999'], 'role': 'guest'}], 'tasks': {'reader_1': {'conf': None, 'parameters': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host_10000': {'conf': {}, 'parties': [{'party_id': ['10000'], 'role': 'host'}], 'tasks': {'reader_1': {'conf': None, 'parameters': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}, 'stage': 'predict', 'tasks': {'hetero_secureboost_0': {'component_ref': 'hetero_secureboost', 'conf': None, 'dependent_tasks': ['psi_0'], 'inputs': {'data': {'test_data': {'task_output_artifact': [{'output_artifact_key': 'output_data', 'output_artifact_type_alias': None, 'parties': [{'party_id': ['9999'], 'role': 'guest'}, {'party_id': ['10000'], 'role': 'host'}], 'producer_task': 'psi_0'}]}}, 'model': {'input_model': {'model_warehouse': {'output_artifact_key': 'output_model', 'output_artifact_type_alias': None, 'parties': [{'party_id': ['9999'], 'role': 'guest'}, {'party_id': ['10000'], 'role': 'host'}], 'producer_task': 'hetero_secureboost_0'}}}}, 'outputs': None, 'parameters': {'gh_pack': True, 'goss': False, 'goss_start_iter': 0, 'hist_sub': True, 'l1': 0, 'l2': 0.1, 'learning_rate': 0.3, 'max_bin': 32, 'max_depth': 5, 'min_child_weight': 1, 'min_impurity_split': 0.01, 'min_leaf_node': 1, 'min_sample_split': 2, 'num_class': 2, 'num_trees': 1, 'objective': 'binary:bce', 'other_rate': 0.1, 'split_info_pack': True, 'top_rate': 0.2}, 'parties': None, 'stage': None}, 'psi_0': {'component_ref': 'psi', 'conf': None, 'dependent_tasks': ['reader_1'], 'inputs': {'data': {'input_data': {'task_output_artifact': {'output_artifact_key': 'output_data', 'output_artifact_type_alias': None, 'parties': [{'party_id': ['9999'], 'role': 'guest'}, {'party_id': ['10000'], 'role': 'host'}], 'producer_task': 'reader_1'}}}, 'model': None}, 'outputs': None, 'parameters': {}, 'parties': None, 'stage': 'default'}, 'reader_1': {'component_ref': 'reader', 'conf': None, 'dependent_tasks': None, 'inputs': None, 'outputs': None, 'parameters': {}, 'parties': None, 'stage': 'default'}}}, 'kind': 'fate', 'schema_version': '2.1.0'}, 'description': '', 'elapsed': 62958, 'end_time': 1712133480145, 'engine_name': 'eggroll', 'flow_id': '', 'inheritance': {}, 'initiator_party_id': '9999', 'job_id': '202404031636558952240', 'memory': 0, 'model_id': '202404031636558952240', 'model_version': '0', 'parties': [{'party_id': ['9999'], 'role': 'guest'}, {'party_id': ['10000'], 'role': 'host'}], 'party_id': '9999', 'progress': 100, 'protocol': 'fate', 'remaining_cores': 4, 'remaining_memory': 0, 'resource_in_use': False, 'return_resource_time': 1712133480016, 'role': 'guest', 'scheduler_party_id': '9999', 'start_time': 1712133417187, 'status': 'success', 'status_code': None, 'tag': 'job_end', 'update_time': 1712133480145, 'user_name': ''} +Total time: 0:01:04 +``` + +### 删除部署 + +在部署机器上运行以下命令可以停止所有FATE集群: + +```bash +bash ./docker_deploy.sh --delete 
all
+```
+
+如果想要彻底删除在运行机器上部署的FATE,可以分别登录各节点,然后运行命令:
+
+```bash
+cd /data/projects/fate/confs-<party_id>/ # <party_id>为组织的id,本例中代表10000或者9999
+docker-compose down
+rm -rf ../confs-<party_id>/ # 删除docker-compose部署文件
+```
+
+### 可能遇到的问题
+
+#### 采用docker hub下载镜像速度可能较慢
+
+解决办法:可以自行构建镜像,构建方法参考[这里](https://github.com/FederatedAI/FATE/tree/master/docker-build)。
+
+#### 运行脚本`./docker_deploy.sh all`的时候提示需要输入密码
+
+解决办法:检查免密登录是否正常。注:直接输入对应主机的用户密码也可以继续运行。
+
+#### CPU指令集问题
+
+解决办法:查看[wiki](https://github.com/FederatedAI/KubeFATE/wiki/KubeFATE)页面的storage-service部分。
diff --git a/deploy/docker-compose/docker-deploy/docker_deploy.sh b/deploy/docker-compose/docker-deploy/docker_deploy.sh
new file mode 100644
index 0000000000..dbc8731182
--- /dev/null
+++ b/deploy/docker-compose/docker-deploy/docker_deploy.sh
@@ -0,0 +1,272 @@
+#!/bin/bash
+
+# Copyright 2019-2022 VMware, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -e
+BASEDIR=$(dirname "$0")
+cd $BASEDIR
+WORKINGDIR=$(pwd)
+
+# load environment variables and party configuration
+source ${WORKINGDIR}/.env
+source ${WORKINGDIR}/parties.conf
+cd ${WORKINGDIR}
+
+Deploy() {
+    if [ "$1" = "" ]; then
+        echo "No party id was provided, please check your arguments"
+        exit 1
+    fi
+
+    while [ "$1" != "" ]; do
+        case $1 in
+        splitting_proxy)
+            shift
+            DeployPartyInternal $@
+            break
+            ;;
+        all)
+            for party in ${party_list[*]}; do
+                if [ "$2" != "" ]; then
+                    case $2 in
+                    --training)
+                        DeployPartyInternal $party
+                        if [ "${exchangeip}" != "" ]; then
+                            DeployPartyInternal exchange
+                        fi
+                        ;;
+                    esac
+                else
+                    DeployPartyInternal $party
+                    if [ "${exchangeip}" != "" ]; then
+                        DeployPartyInternal exchange
+                    fi
+                fi
+            done
+            break
+            ;;
+        *)
+            if [ "$2" != "" ]; then
+                case $2 in
+                --training)
+                    DeployPartyInternal $1
+                    break
+                    ;;
+                esac
+            else
+                DeployPartyInternal $1
+            fi
+            ;;
+        esac
+        shift
+
+    done
+}
+
+Delete() {
+    if [ "$1" = "" ]; then
+        echo "No party id was provided, please check your arguments"
+        exit 1
+    fi
+
+    while [ "$1" != "" ]; do
+        case $1 in
+        all)
+            for party in ${party_list[*]}; do
+                if [ "$2" != "" ]; then
+                    DeleteCluster $party $2
+                else
+                    DeleteCluster $party
+                fi
+            done
+            if [ "${exchangeip}" != "" ]; then
+                DeleteCluster exchange
+            fi
+            break
+            ;;
+        *)
+            DeleteCluster $@
+            break
+            ;;
+        esac
+    done
+}
+
+DeployPartyInternal() {
+    target_party_id=$1
+    # should not use localhost in any case
+    target_party_ip="127.0.0.1"
+
+    # check configuration files
+    if [ ! -d ${WORKINGDIR}/outputs ]; then
+        echo "Unable to find outputs dir, please generate config files first."
+        return 1
+    fi
+    if [ ! -f ${WORKINGDIR}/outputs/confs-${target_party_id}.tar ]; then
+        echo "Unable to find deployment file of training for party $target_party_id, please generate it first."
+        return 0
+    fi
+    # extract the ip address of the target party
+    if [ "$target_party_id" = "exchange" ]; then
+        target_party_ip=${exchangeip}
+    elif [ "$2" != "" ]; then
+        target_party_ip="$2"
+    else
+        for ((i = 0; i < ${#party_list[*]}; i++)); do
+            if [ "${party_list[$i]}" = "$target_party_id" ]; then
+                target_party_ip=${party_ip_list[$i]}
+            fi
+        done
+    fi
+    # verify the target_party_ip
+    if [ "$target_party_ip" = "127.0.0.1" ]; then
+        echo "Unable to find Party: $target_party_id, please check your input."
+        return 1
+    fi
+    if [ "$3" != "" ]; then
+        user=$3
+    fi
+
+    handleLocally confs
+    if [ "$local_flag" == "true" ]; then
+        return 0
+    fi
+
+    scp -P ${SSH_PORT} ${WORKINGDIR}/outputs/confs-$target_party_id.tar $user@$target_party_ip:~/
+    #rm -f ${WORKINGDIR}/outputs/confs-$target_party_id.tar
+    echo "$target_party_ip training cluster copy is ok!"
+    ssh -p ${SSH_PORT} -tt $user@$target_party_ip << eeooff
+mkdir -p $dir
+rm -f $dir/confs-$target_party_id.tar
+mv ~/confs-$target_party_id.tar $dir
+cd $dir
+tar -xzf confs-$target_party_id.tar
+cd confs-$target_party_id
+docker-compose down
+docker volume rm -f confs-${target_party_id}_shared_dir_examples
+docker volume rm -f confs-${target_party_id}_shared_dir_fate
+docker volume rm -f confs-${target_party_id}_download_dir
+docker volume rm -f confs-${target_party_id}_fate_flow_logs
+
+docker-compose up -d
+cd ../
+rm -f confs-${target_party_id}.tar
+exit
+eeooff
+    echo "party ${target_party_id} deploy is ok!"
+}
+
+DeleteCluster() {
+    target_party_id=$1
+    cluster_type=$2
+    target_party_serving_ip="127.0.0.1"
+    target_party_ip="127.0.0.1"
+
+    # extract the ip address of the target party
+    if [ "$target_party_id" == "exchange" ]; then
+        target_party_ip=${exchangeip}
+    else
+        for ((i = 0; i < ${#party_list[*]}; i++)); do
+            if [ "${party_list[$i]}" = "$target_party_id" ]; then
+                target_party_ip=${party_ip_list[$i]}
+            fi
+        done
+    fi
+
+    # echo "target_party_ip: $target_party_ip"
+
+    for ((i = 0; i < ${#party_list[*]}; i++)); do
+        if [ "${party_list[$i]}" = "$target_party_id" ]; then
+            target_party_serving_ip=${serving_ip_list[$i]}
+        fi
+    done
+
+    # echo "target_party_ip: $target_party_ip"
+    # echo "cluster_type: $cluster_type"
+
+    # delete training cluster
+    if [ "$cluster_type" == "--training" ]; then
+        ssh -p ${SSH_PORT} -tt $user@$target_party_ip <#jdbc:mysql://${db_ip}:3306/${db_name}?useSSL=false\&serverTimezone=${db_serverTimezone}\&characterEncoding=utf8\&allowPublicKeyRetrieval=true#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${db_user}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${db_password}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+
+        #clustermanager & nodemanager
+        sed -i "s##${clustermanager_ip}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${clustermanager_port}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${nodemanager_ip}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${nodemanager_port}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${party_id}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+
+        #pythonpath, very important, do not modify.
+        sed -i "s##/data/projects/fate/python:/data/projects/fate/eggroll/python#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+
+        #javahome
+        sed -i "s##/usr/lib/jvm/java-1.8.0-openjdk#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##conf/:lib/*#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+
+        sed -i "s##${proxy_ip}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+        sed -i "s##${proxy_port}#g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties
+    fi
+
+    cp ${WORKINGDIR}/.env ./confs-$party_id
+    echo "NOTEBOOK_HASHED_PASSWORD=${notebook_hashed_password}" >> ./confs-$party_id/.env
+
+    # Modify the configuration file
+
+    # federatedai/fateflow-${computing}-${algorithm}-${device}:${version}
+
+    # eggroll or spark-worker
+    if [ "$computing" == "Eggroll" ]; then
+        sed -i "s#image: \"\${FATEFlow_IMAGE}:\${FATEFlow_IMAGE_TAG}\"#image: \"\${FATEFlow_IMAGE}${Suffix}:\${FATEFlow_IMAGE_TAG}\"#g" ./confs-"$party_id"/docker-compose.yml
+        sed -i "s#image: \"\${EGGRoll_IMAGE}:\${EGGRoll_IMAGE_TAG}\"#image: \"\${EGGRoll_IMAGE}${Suffix}:\${EGGRoll_IMAGE_TAG}\"#g" ./confs-"$party_id"/docker-compose.yml
+    fi
+
+    # GPU
+    if [ "$device" == "GPU" ]; then
+        line=0 # line refers to the line number of the fateflow `command` line in docker-compose.yaml
+        if [ "$computing" == "Eggroll" ]; then
+            line=141
+        fi
+        sed -i "${line}i\\
+    deploy:\\
+      resources:\\
+        reservations:\\
+          devices:\\
+            - driver: nvidia\\
+              count: $gpu_count\\
+              capabilities: [gpu]" ./confs-"$party_id"/docker-compose.yml
+    fi
+    # RegistryURI
+    if [ "$RegistryURI" != "" ]; then
+
+        if [ "${RegistryURI: -1}" != "/" ]; then
+            RegistryURI="${RegistryURI}/"
+        fi
+
+        # RegistryURI is already normalized to end with "/", so do not append another one
+        sed -i "s#RegistryURI=.*#RegistryURI=${RegistryURI}#g" ./confs-"$party_id"/.env
+    fi
+
+    # replace namenode in training_template/public/fate_flow/conf/service_conf.yaml
+    if [ "$name_node" != "" ]; then
+        sed -i "s#name_node: hdfs://namenode:9000#name_node: ${name_node}#g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml
+    fi
+
+    # update serving ip
+    # sed -i "s/fate-serving/${serving_ip}/g" ./confs-"$party_id"/docker-compose.yml
+
+    # update the path of shared_dir
+    shared_dir="confs-${party_id}/shared_dir"
+
+    # create directories
+    for value in "examples" "fate" "data"; do
+        mkdir -p "${shared_dir}"/${value}
+    done
+
+    sed -i "s||${dir}/${shared_dir}|g" ./confs-"$party_id"/docker-compose.yml
+
+    # Start the general config rendering
+    # fateboard
+    sed -i "s#^server.port=.*#server.port=${fateboard_port}#g" ./confs-"$party_id"/confs/fate_board/conf/application.properties
+    sed -i "s#^fateflow.url=.*#fateflow.url=http://${fate_flow_ip}:${fate_flow_http_port}#g" ./confs-"$party_id"/confs/fate_board/conf/application.properties
+    sed -i "s#^server.board.login.username=.*#server.board.login.username=${fateboard_username}#g" ./confs-"$party_id"/confs/fate_board/conf/application.properties
+    sed -i "s#^server.board.login.password=.*#server.board.login.password=${fateboard_password}#g" ./confs-"$party_id"/confs/fate_board/conf/application.properties
+
+    echo fateboard module of "$party_id" done!
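+
+    # Note: the SQL generated below lands in ./confs-<party_id>/confs/mysql/init/, which
+    # docker-compose mounts at /docker-entrypoint-initdb.d/, so the mysql container
+    # executes it automatically on its first start (only when the data dir is empty).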
+ + # mysql + + { + echo "CREATE DATABASE IF NOT EXISTS ${db_name};" + echo "CREATE DATABASE IF NOT EXISTS fate_flow;" + echo "CREATE USER '${db_user}'@'%' IDENTIFIED BY '${db_password}';" + echo "GRANT ALL ON *.* TO '${db_user}'@'%';" + } >> ./confs-"$party_id"/confs/mysql/init/insert-node.sql + + if [[ "$computing" == "Eggroll" ]]; then + echo 'USE `'${db_name}'`;' >>./confs-$party_id/confs/mysql/init/insert-node.sql + echo "show tables;" >>./confs-$party_id/confs/mysql/init/insert-node.sql + sed -i "s/eggroll_meta/${db_name}/g" ./confs-$party_id/confs/mysql/init/create-eggroll-meta-tables.sql + else + rm -f ./confs-$party_id/confs/mysql/init/create-eggroll-meta-tables.sql + fi + echo mysql module of $party_id done! + + # fate_flow + sed -i "s/party_id: .*/party_id: \"${party_id}\"/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/name: /name: '${db_name}'/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/user: /user: '${db_user}'/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/passwd: /passwd: '${db_password}'/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/host: /host: '${db_ip}'/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/127.0.0.1:8000/${serving_ip}:8000/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + + # if [[ "$computing" == "STANDALONE" ]] ; then + # sed -i "s#spark.master .*#spark.master local[*]#g" ./confs-$party_id/confs/spark/spark-defaults.conf + # fi + + # compute_core + sed -i "s/nodes: .*/nodes: 1/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + sed -i "s/cores_per_node: .*/cores_per_node: $compute_core/g" ./confs-$party_id/confs/fate_flow/conf/service_conf.yaml + + if [[ "$computing" == "Eggroll" ]]; then + sed -i "s/eggroll.session.processors.per.node=.*/eggroll.session.processors.per.node=$compute_core/g" ./confs-$party_id/confs/eggroll/conf/eggroll.properties + fi + echo fate_flow module of $party_id done! 
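+
+    # Next: federation config. The route_table.json rendered below maps each party id
+    # to its peer OSX endpoint (default port 9370), following the template in
+    # training_template/backends/eggroll/conf/route_table.json.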
+ + # federation config + # OSX + sed -i "s/self.party=9999/self.party=${party_id}/g" ./confs-$party_id/confs/osx/conf/broker.properties + if [[ "$federation" == "OSX" ]]; then + cat >./confs-$party_id/confs/osx/conf/route_table.json <#${proxy_ip}#g" ./confs-exchange/conf/eggroll/eggroll.properties + sed -i "s##${proxy_port}#g" ./confs-exchange/conf/eggroll/eggroll.properties + sed -i "s##exchange#g" ./confs-exchange/conf/eggroll/eggroll.properties + sed -i "s/coordinator=.*/coordinator=exchange/g" ./confs-exchange/conf/eggroll/eggroll.properties + sed -i "s/ip=.*/ip=0.0.0.0/g" ./confs-exchange/conf/eggroll/eggroll.properties + cat >./confs-exchange/conf/osx/broker/route_table.json < /dev/null + +#update and install the latest version +sudo apt-get update +sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin diff --git a/deploy/docker-compose/docker-deploy/parties.conf b/deploy/docker-compose/docker-deploy/parties.conf new file mode 100644 index 0000000000..fba3485b3b --- /dev/null +++ b/deploy/docker-compose/docker-deploy/parties.conf @@ -0,0 +1,47 @@ +#!/bin/bash + +user=fate +dir=/data/projects/fate +party_list=(10000 9999) +party_ip_list=(192.168.1.1 192.168.1.2) + +# Engines: +# Computing : Eggroll, Spark, Spark_local +computing=Eggroll +# Federation: OSX(computing: Eggroll/Spark/Spark_local), Pulsar/RabbitMQ(computing: Spark/Spark_local) +federation=OSX +# Storage: Eggroll(computing: Eggroll), HDFS(computing: Spark), LocalFS(computing: Spark_local) +storage=Eggroll +# Algorithm: Basic, NN, ALL +algorithm=Basic +# Device: CPU, IPCL, GPU +device=CPU + +# spark and eggroll +compute_core=8 + +# You only need to configure this parameter when you want to use the GPU, the default value is 1 +gpu_count=1 + +# default +#exchangeip= + +# modify if you are going to use an external db +mysql_ip=mysql +mysql_user=fate +mysql_password=fate_dev +mysql_db=fate_flow +serverTimezone=UTC + +name_node=hdfs://namenode:9000 + +# Define fateboard login information +fateboard_username=admin +fateboard_password=admin + +# Define serving admin login information +serving_admin_username=admin +serving_admin_password=admin + +# Define notebook login information +notebook_hashed_password= diff --git a/deploy/docker-compose/docker-deploy/test.sh b/deploy/docker-compose/docker-deploy/test.sh new file mode 100644 index 0000000000..c46a260690 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/test.sh @@ -0,0 +1,182 @@ +#!/bin/bash + +# Copyright 2019-2020 VMware, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# you may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
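+
+# Usage (subcommands implemented below):
+#   bash test.sh toy_example <guest_party_id> <host_party_id>
+#   bash test.sh min_test_task ...
+#   bash test.sh serving ...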
+ +BASEDIR=$(dirname "$0") +cd $BASEDIR +WORKINGDIR=$(pwd) + +# fetch fate-python image +source ${WORKINGDIR}/.env +source ${WORKINGDIR}/parties.conf + +cd ${WORKINGDIR} + +get_party_ip(){ + target_party_id=$1 + for ((i = 0; i < ${#partylist[*]}; i++)); do + if [ "${partylist[$i]}" = "$target_party_id" ]; then + target_party_ip=${partyiplist[$i]} + fi + done + return $target_party_ip +} + +Test() { + + while [ "$1" != "" ]; do + case $1 in + toy_example) + shift + if [ "$1" = "" ] || [ "$2" = "" ]; then + echo "No party id was provided, please check your arguments " + echo "Example: " + echo " 'bash test.sh toy_example 9999 10000'" + exit 1 + fi + toy_example $@ + break + ;; + min_test_task) + shift + min_test_task $@ + break + ;; + serving) + shift + serving $@ + break + ;; + esac + shift + done + +} + +toy_example() { + echo "start test toy_example" + guest=$1 + host=$2 + echo "guest_id: "$guest + echo "host_id: "$host + + target_party_id=$1 + echo "target_party_id: "$target_party_id + for ((i = 0; i < ${#party_ip_list[*]}; i++)); do + if [ "${party_list[$i]}" = "$target_party_id" ]; then + target_party_ip=${party_ip_list[$i]} + fi + done + echo "*********************start docker log***************************" + echo $user@$target_party_ip + ssh -tt $user@$target_party_ip < + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/eggroll.properties b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/eggroll.properties new file mode 100644 index 0000000000..c762c009df --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/eggroll.properties @@ -0,0 +1,70 @@ +# +# Copyright (c) 2019 - now, Eggroll Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#h2 + +[eggroll] +# core +eggroll.resourcemanager.nodemanager.net.device=eth0 +eggroll.resourcemanager.nodemanager.gpu.num.shell=nvidia.sh +#eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=org.h2.Driver +eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver +#eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:h2:./data/meta_h2/eggroll_meta.h2;AUTO_SERVER=TRUE;MODE=MySQL;DATABASE_TO_LOWER=TRUE;SCHEMA=eggroll_meta; +eggroll.resourcemanager.clustermanager.jdbc.url= +eggroll.resourcemanager.clustermanager.jdbc.username= +eggroll.resourcemanager.clustermanager.jdbc.password= + +eggroll.resourcemanager.clustermanager.host= +eggroll.resourcemanager.clustermanager.port= +eggroll.resourcemanager.nodemanager.host= +eggroll.resourcemanager.nodemanager.port= +eggroll.resourcemanager.process.tag= + +# dashboard +eggroll.dashboard.server.port=8083 +eggroll.security.session.expired.time=30 +eggroll.security.login.username=admin +eggroll.security.login.password=admin +eggroll.security.encrypt.private_key= +eggroll.security.encrypt.enable=false + +eggroll.data.dir=/data/projects/fate/eggroll/data/ +eggroll.logs.dir=/data/projects/fate/eggroll/logs/ + +eggroll.bootstrap.root.script=bin/eggroll_boot.sh + +eggroll.resourcemanager.bootstrap.egg_pair.exepath=bin/roll_pair/egg_pair_bootstrap.sh +eggroll.resourcemanager.bootstrap.egg_pair.venv= +eggroll.resourcemanager.bootstrap.egg_pair.pythonpath=python +eggroll.resourcemanager.bootstrap.egg_pair.filepath=python/eggroll/computing/egg_pair/egg_pair.py +eggroll.resourcemanager.bootstrap.egg_pair.ld_library_path= + +# session +eggroll.session.processors.per.node=4 + +# deepspeed +## where deepspeed containers locate, required for deepspeed +#eggroll.resourcemanager.nodemanager.containers.data.dir= +## which python exec that deepspeed container used, fallback to eggpair venv/bin/python +#eggroll.container.python.exec= +## provide by submit option for now +#eggroll.container.deepspeed.script.path= +eggroll.container.deepspeed.distributed.backend=nccl +## defaults to cluster manager endpoint +#eggroll.container.deepspeed.distributed.store.host= +#eggroll.container.deepspeed.distributed.store.port= + + + + diff --git a/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/log4j2.properties b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/log4j2.properties new file mode 100644 index 0000000000..03b1bdfd1b --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/log4j2.properties @@ -0,0 +1,108 @@ +# +# Copyright (c) 2019 - now, Eggroll Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +name=PropertiesConfig +property.logDir=${env:EGGROLL_LOG_DIR:-logs/}/${env:EGGROLL_SESSION_ID:-eggroll} +property.logFile=${env:EGGROLL_LOG_FILE:-eggroll} +property.logPattern=[%-5level][%r][%d{yyyy-MM-dd} %d{HH:mm:ss,SSS}][%t,pid:%pid,tid:%T][%c{1.}:%L] - %msg%n +# console +appender.console.type=Console +appender.console.name=STDOUT +appender.console.layout.type=PatternLayout +appender.console.layout.pattern=${logPattern} +# default file +appender.file.type=RollingFile +appender.file.name=LOGFILE +appender.file.fileName=${logDir}/${logFile}.jvm.log +appender.file.filePattern=${logDir}/%d{yyyy}/%d{MM}/%d{dd}/${logFile}.jvm.log.%d{yyyy-MM-dd-HH} +appender.file.layout.type=PatternLayout +appender.file.layout.pattern=${logPattern} +appender.file.policies.type=Policies +appender.file.policies.time.type=TimeBasedTriggeringPolicy +appender.file.policies.time.interval=1 +appender.file.policies.time.modulate=true +appender.file.strategy.type=DefaultRolloverStrategy +# error file +appender.errorlog.type=RollingFile +appender.errorlog.name=ERRORLOG +appender.errorlog.fileName=${logDir}/${logFile}.jvm.err.log +appender.errorlog.filePattern=${logDir}/%d{yyyy}/%d{MM}/%d{dd}/${logFile}.jvm.err.log.%d{yyyy-MM-dd-HH} +appender.errorlog.layout.type=PatternLayout +appender.errorlog.layout.pattern=${logPattern} +appender.errorlog.policies.type=Policies +appender.errorlog.policies.time.type=TimeBasedTriggeringPolicy +appender.errorlog.policies.time.interval=1 +appender.errorlog.policies.time.modulate=true +appender.errorlog.strategy.type=DefaultRolloverStrategy +# audit +appender.audit.type=RollingFile +appender.audit.name=LOGAUDIT +appender.audit.fileName=${logDir}/${logFile}-audit.log +appender.audit.filePattern=${logDir}/%d{yyyy}/%d{MM}/%d{dd}/${logFile}-audit.log.%d{yyyy-MM-dd-HH} +appender.audit.layout.type=PatternLayout +appender.audit.layout.pattern=${logPattern} +appender.audit.policies.type=Policies +appender.audit.policies.time.type=TimeBasedTriggeringPolicy +appender.audit.policies.time.interval=1 +appender.audit.policies.time.modulate=true +appender.audit.strategy.type=DefaultRolloverStrategy + + +# loggers +loggers=file, netty, audit, httpclient, httpclientwire + +# logger - file +logger.file.name=file +logger.file.level=${env:EGGROLL_LOG_LEVEL:-INFO} +logger.file.appenderRefs=file +logger.file.appenderRef.file.ref=LOGFILE +logger.file.additivity=false + +# logger - error +logger.errorlog.name=errorlog +logger.errorlog.level=ERROR +logger.errorlog.appenderRefs=errorlog +logger.errorlog.appenderRef.file.ref=LOGERROR +logger.errorlog.additivity=false + +# logger - root +rootLogger.level=${env:EGGROLL_LOG_LEVEL:-INFO} +rootLogger.appenderRefs=file, stdout, errorlog +rootLogger.appenderRef.file.ref=LOGFILE +rootLogger.appenderRef.errorlog.ref=ERRORLOG +rootLogger.appenderRef.errorlog.level=ERROR + +# Uncomment the following line if you always want logs on console. 
+# Otherwise you can enable it by setting EGGROLL_LOG_LEVEL<=DEBUG or EGGROLL_LOG_CONSOLE=1 in system env +#rootLogger.appenderRef.stdout.ref=STDOUT + +# logger - netty +logger.netty.name=io.grpc.netty +logger.netty.level=INFO + +# logger - audit +logger.audit.name=audit +logger.audit.level=info +logger.audit.appenderRefs=audit +logger.audit.appenderRef.file.ref=LOGAUDIT +logger.audit.additivity=false + +# logger - HttpClient +logger.httpclient.name=org.apache.commons.httpclient +logger.httpclient.level=INFO + +logger.httpclientwire.name=httpclient.wire +logger.httpclientwire.level=INFO + diff --git a/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/node-extend-env.properties b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/node-extend-env.properties new file mode 100644 index 0000000000..e69de29bb2 diff --git a/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/route_table.json b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/route_table.json new file mode 100644 index 0000000000..38f08b1432 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/route_table.json @@ -0,0 +1,28 @@ + +{ + "route_table": + { + "10001": + { + "default":[ + { + "port": 9370, + "ip": "127.0.0.1" + } + ] + }, + "10002": + { + "default":[ + { + "port": 9470, + "ip": "127.0.0.1" + } + ] + } + }, + "permission": + { + "default_allow": true + } +} \ No newline at end of file diff --git a/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/whitelist.json b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/whitelist.json new file mode 100644 index 0000000000..9a8230fd1c --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/backends/eggroll/conf/whitelist.json @@ -0,0 +1,245 @@ +{ + "builtins": [ + "int", + "list", + "set", + "slice" + ], + "collections": [ + "defaultdict", + "OrderedDict" + ], + "eggroll.core.transfer_model": [ + "ErRollSiteHeader" + ], + "eggroll.roll_pair.task.storage": [ + "BSS" + ], + "federatedml.cipher_compressor.compressor": [ + "PackingCipherTensor", + "NormalCipherPackage", + "PackingCipherTensorPackage" + ], + "federatedml.ensemble.basic_algorithms.decision_tree.tree_core.feature_histogram": [ + "HistogramBag", + "FeatureHistogramWeights" + ], + "federatedml.ensemble.basic_algorithms.decision_tree.tree_core.feature_importance": [ + "FeatureImportance" + ], + "federatedml.ensemble.basic_algorithms.decision_tree.tree_core.g_h_optim": [ + "SplitInfoPackage" + ], + "federatedml.ensemble.basic_algorithms.decision_tree.tree_core.node": [ + "Node" + ], + "federatedml.ensemble.basic_algorithms.decision_tree.tree_core.splitter": [ + "SplitInfo" + ], + "federatedml.evaluation.performance_recorder": [ + "PerformanceRecorder" + ], + "federatedml.feature.binning.bin_result": [ + "BinColResults" + ], + "federatedml.feature.binning.optimal_binning.bucket_info": [ + "Bucket" + ], + "federatedml.feature.binning.optimal_binning.heap": [ + "MinHeap", + "IvHeapNode", + "GiniHeapNode", + "ChiSquareHeapNode" + ], + "federatedml.feature.binning.quantile_summaries": [ + "SparseQuantileSummaries", + "Stats", + "QuantileSummaries" + ], + "federatedml.feature.fate_element_type": [ + "NoneType" + ], + "federatedml.feature.homo_feature_binning.homo_binning_base": [ + "SplitPointNode" + ], + "federatedml.feature.instance": [ + "Instance" + ], + "federatedml.feature.one_hot_encoder": [ + "TransferPair" + 
], + "federatedml.feature.sparse_vector": [ + "SparseVector" + ], + "federatedml.framework.weights": [ + "NumpyWeights", + "TransferableWeights", + "NumericWeights", + "ListWeights", + "DictWeights", + "OrderDictWeights" + ], + "federatedml.linear_model.linear_model_weight": [ + "LinearModelWeights" + ], + "federatedml.secureprotol.fixedpoint": [ + "FixedPointNumber" + ], + "federatedml.secureprotol.number_theory.field.integers_modulo_prime_field": [ + "IntegersModuloPrimeElement" + ], + "federatedml.secureprotol.number_theory.group.twisted_edwards_curve_group": [ + "TwistedEdwardsCurveElement" + ], + "federatedml.secureprotol.symmetric_encryption.cryptor_executor": [ + "CryptoExecutor" + ], + "federatedml.secureprotol.symmetric_encryption.pohlig_hellman_encryption": [ + "PohligHellmanCiphertext", + "PohligHellmanCipherKey" + ], + "federatedml.statistic.intersect.intersect_preprocess": [ + "BitArray" + ], + "federatedml.statistic.statics": [ + "SummaryStatistics" + ], + "gmpy2": [ + "from_binary" + ], + "numpy": [ + "ndarray", + "dtype" + ], + "numpy.core.multiarray": [ + "scalar", + "_reconstruct" + ], + "numpy.core.numeric": [ + "_frombuffer" + ], + "tensorflow.python.framework.ops": [ + "convert_to_tensor" + ], + "torch._utils": [ + "_rebuild_tensor_v2" + ], + "torch.storage": [ + "_load_from_bytes" + ], + "ipcl_python.bindings.ipcl_bindings": [ + "ipclPublicKey" + ], + "ipcl_python.ipcl_python": [ + "PaillierPublicKey", + "PaillierEncryptedNumber" + ], + "torch": [ + "Size" + ], + "fate.arch.tensor.storage.local.device.cpu.plain": [ + "_TorchStorage" + ], + "fate.arch.tensor.types._dtype": [ + "dtype" + ], + "fate.arch.tensor.types._shape": [ + "DAxis", + "Shape" + ], + "pandas.core.frame": [ + "DataFrame" + ], + "pandas.core.indexes.base": [ + "Index", + "_new_Index" + ], + "pandas.core.indexes.range": [ + "RangeIndex" + ], + "pandas.core.series": [ + "Series" + ], + "pandas.core.internals.managers": [ + "BlockManager", + "SingleBlockManager" + ], + "fate.arch.dataframe.manager.data_manager": [ + "DataManager" + ], + "fate.arch.dataframe.manager.schema_manager": [ + "SchemaManager", + "Schema" + ], + "fate.arch.dataframe.manager.block_manager":[ + "BlockManager", + "IndexBlock", + "BlockType", + "Int64Block", + "Float32Block", + "Float64Block", + "Int32Block", + "BoolBlock", + "NPObjectBlock", + "PHETensorBlock" + ], + "fate.arch.tensor.inside._op_quantile":[ + "GKSummary" + ], + "fate.arch.protocol.phe.paillier":[ + "Coder", + "SK", + "PK", + "evaluator" + ], + "fate.arch.protocol.phe.ou":[ + "Coder", + "SK", + "PK", + "evaluator" + ], + "fate.arch.tensor.phe._tensor":[ + "PHETensorEncoded", "PHETensor" + ], + "fate.arch.tensor.phe._keypair":[ + "PHETensorCoder" + ], + "fate_utils.quantile":[ + "QuantileSummaryStream" + ], + "fate_utils.paillier":[ + "Coder","Coders", "FixedpointVector", "PK", "FixedpointPaillierVector", "CiphertextVector","PlaintextVector" + ], + "fate_utils.ou":[ + "Coder", "Coders", "FixedpointVector", "PK", "FixedpointPaillierVector", "CiphertextVector","PlaintextVector" + ], + "fate.arch.unify._infra_def":[ + "device" + ], + "fate.arch.histogram._histogram_splits": [ + "HistogramSplits" + ], + "fate.arch.histogram.values._values": [ + "HistogramValuesContainer" + ], + "fate.arch.histogram.values._plain": [ + "HistogramPlainValues" + ], + "fate.arch.histogram.values._cipher":[ + "HistogramEncryptedValues" + ], + "fate.arch.protocol.phe.mock": [ + "PK", "SK", "FV", "EV", "Coder", "evaluator" + ], + "fate.arch.histogram.histogram":[ + "HistogramSplits", 
"HistogramPlainValues", "HistogramEncryptedValues" + ], + "torch":[ + "float32", + "int64", + "int32", + "device", + "float64", + "Size" + ] +} diff --git a/deploy/docker-compose/docker-deploy/training_template/docker-compose-eggroll.yml b/deploy/docker-compose/docker-deploy/training_template/docker-compose-eggroll.yml new file mode 100644 index 0000000000..d563a5b8e0 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/docker-compose-eggroll.yml @@ -0,0 +1,168 @@ +# Copyright 2019-2022 VMware, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# you may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +version: '3.7' +networks: + fate-network: + ipam: + config: + - subnet: 192.167.0.0/16 + +volumes: + fate_flow_logs: + download_dir: + shared_dir_examples: + driver: local + driver_opts: + type: none + o: bind + device: /examples + shared_dir_fate: + driver: local + driver_opts: + type: none + o: bind + device: /fate + shared_dir_data: + driver: local + driver_opts: + type: none + o: bind + device: /data + +services: + osx: + image: "${RegistryURI}${OSX_IMAGE}:${OSX_IMAGE_TAG}" + restart: always + expose: + - 9370 + ports: + - "9370:9370" + environment: + PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: python + JAVA_HOME: java + volumes: + - ./confs/osx/conf/:/data/projects/fate/osx/conf/broker/ + - /etc/localtime:/etc/localtime:ro + networks: + - fate-network + command: ["sh", "-c", "java -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -XX:SoftRefLRUPolicyMSPerMB=0 -verbose:gc -Xloggc:/dev/shm/rmq_srv_gc_%p_%t.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=30m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/oom/heapdump.hprof -server -Xms4g -Xmx4g -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -XX:MaxDirectMemorySize=15g -XX:-UseLargePages -XX:-UseBiasedLocking -cp conf/broker/:lib/*:extension/*:/data/projects/fate/osx/lib/osx-broker-1.1.0.jar:pb_lib/v3/* org.fedai.osx.broker.Bootstrap -c /data/projects/fate/osx/conf"] + + fateboard: + image: "${FATEBoard_IMAGE}:${FATEBoard_IMAGE_TAG}" + restart: always + ports: + - "8080:8080" + volumes: + - ./confs/fate_board/conf:/data/projects/fate/fate_board/conf + - fate_flow_logs:/data/projects/fate/fate_flow/logs + - /etc/localtime:/etc/localtime:ro + networks: + - fate-network + depends_on: + - fateflow + command: ["sh", "-c", "java -Dspring.config.location=/data/projects/fate/fate_board/conf/application.properties -Dssh_config_file=/data/projects/fate/fate_board/ssh/ -Xmx2048m -Xms2048m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -cp /data/projects/fate/fate_board/lib/*:/data/projects/fate/fate_board/fateboard-2.1.0.jar org.fedai.fate.board.bootstrap.Bootstrap"] + + clustermanager: + image: "${EGGRoll_IMAGE}:${EGGRoll_IMAGE_TAG}" + restart: always + expose: + - 4670 + volumes: + - ./confs/eggroll/conf/:/data/projects/fate/eggroll/conf/ + - 
/etc/localtime:/etc/localtime:ro + - shared_dir_fate:/data/projects/fate/fate + environment: + PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: python + networks: + - fate-network + command: ["bash", "-c", "java -server -Dlog4j.configurationFile=$${EGGROLL_HOME}/conf/log4j2.xml -Dmodule=clustermanager -cp $${EGGROLL_HOME}/lib/*: org.fedai.eggroll.clustermanager.Bootstrap -p 4670 -s EGGROLL_DAEMON"] + + nodemanager: + image: "${EGGRoll_IMAGE}:${EGGRoll_IMAGE_TAG}" + restart: always + expose: + - 4671 + volumes: + - ./confs/eggroll/conf:/data/projects/fate/eggroll/conf/ + - ./confs/fate_flow/conf/service_conf.yaml:/data/projects/fate/conf/service_conf.yaml + - ./shared_dir/data/nodemanager:/data/projects/fate/eggroll/data + - /etc/localtime:/etc/localtime:ro + - shared_dir_fate:/data/projects/fate/fate + depends_on: + - clustermanager + networks: + - fate-network + environment: + PYTHONPATH: /data/projects/fate/fate/python:/data/projects/fate/fate_flow/python:/data/projects/fate/fate_client/python:/data/projects/fate/eggroll/python + cap_add: + - SYS_PTRACE + command: ["bash", "-c", "java -server -Dlog4j.configurationFile=$${EGGROLL_HOME}/conf/log4j2.xml -Dmodule=nodemanager -cp $${EGGROLL_HOME}/lib/*: org.fedai.eggroll.nodemanager.Bootstrap -p 4671 -s EGGROLL_DAEMON"] + + fateflow: + image: "${FATEFlow_IMAGE}:${FATEFlow_IMAGE_TAG}" + environment: + FATE_PROJECT_BASE: "/data/projects/fate" + FATE_LOG_LEVEL: "DEBUG" + ports: + - "9360:9360" + - "9380:9380" + restart: always + volumes: + - shared_dir_fate:/data/projects/fate/fate + - shared_dir_examples:/data/projects/fate/examples + - download_dir:/data/projects/fate/fate/python/download_dir + - fate_flow_logs:/data/projects/fate/fate_flow/logs + - ./confs/fate_flow/conf/service_conf.yaml:/data/projects/fate/fate_flow/conf/service_conf.yaml + - ./confs/fate_flow/conf/pulsar_route_table.yaml:/data/projects/fate/fate_flow/conf/pulsar_route_table.yaml + - ./confs/fate_flow/conf/rabbitmq_route_table.yaml:/data/projects/fate/fate_flow/conf/rabbitmq_route_table.yaml + - ./confs/eggroll/conf:/data/projects/fate/eggroll/conf + - /etc/localtime:/etc/localtime:ro + depends_on: + - mysql + - osx + - clustermanager + - nodemanager + networks: + fate-network: + ipv4_address: 192.167.0.100 + healthcheck: + test: ["CMD", "curl", "-f", "-X GET", "http://192.167.0.100:9380/v2/server/fateflow"] + interval: 1m30s + timeout: 10s + retries: 3 + start_period: 40s + command: + - "/bin/bash" + - "-c" + - | + set -x + pip install cryptography && sleep 5 && python fate_flow/python/fate_flow/fate_flow_server.py --debug + python /data/projects/fate/eggroll/python/setup.py install + sleep 10 && pipeline init --ip fateflow --port 9380 + flow init --ip fateflow --port 9380 + + mysql: + image: "${MySQL_IMAGE}:${MySQL_IMAGE_TAG}" + expose: + - 3306 + volumes: + - ./confs/mysql/init:/docker-entrypoint-initdb.d/ + - ./shared_dir/data/mysql:/var/lib/mysql + - /etc/localtime:/etc/localtime:ro + restart: always + cap_add: + - SYS_NICE + environment: + MYSQL_ALLOW_EMPTY_PASSWORD: "yes" + networks: + - fate-network diff --git a/deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/application.properties b/deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/application.properties new file mode 100644 index 0000000000..c2fbc0f256 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/application.properties @@ -0,0 +1,29 @@ +server.port=8080 +fateflow.url=http://localhost:9380 +#priority is higher than 
{fateflow.url}, split by ; +#below config can support configuring more than one fate flow for this fate board +fateflow.url-list= +fateflow.http_app_key= +fateflow.http_secret_key= +server.servlet.encoding.charset=UTF-8 +server.servlet.encoding.enabled=true +server.tomcat.uri-encoding=UTF-8 +fateboard.front_end.cors=false +fateboard.front_end.url=http://localhost:8028 +server.tomcat.max-threads=1000 +server.tomcat.max-connections=20000 +spring.servlet.multipart.max-file-size=10MB +spring.servlet.multipart.max-request-size=100MB +spring.servlet.session.timeout=1800s +server.compression.enabled=true +server.compression.mime-types=application/json,application/xml,text/html,text/xml,text/plain +server.board.login.username= +server.board.login.password= +server.board.encrypt.private_key= +server.board.encrypt.enable=false +#only [h,m,s] is available +server.servlet.session.timeout=4h +server.servlet.session.cookie.max-age=4h +management.endpoints.web.exposure.exclude=* +feign.client.config.default.connectTimeout=10000 +feign.client.config.default.readTimeout=10000 diff --git a/deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/ssh.properties b/deploy/docker-compose/docker-deploy/training_template/public/fate_board/conf/ssh.properties new file mode 100644 index 0000000000..e69de29bb2 diff --git a/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/pulsar_route_table.yaml b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/pulsar_route_table.yaml new file mode 100644 index 0000000000..2e9d58030b --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/pulsar_route_table.yaml @@ -0,0 +1,17 @@ +9999: + host: 192.168.0.4 + port: 6650 + sslPort: 6651 + proxy: "" + +10000: + host: 192.168.0.3 + port: 6650 + sslPort: 6651 + proxy: "" + +default: + proxy: "proxy.fate.org:443" + domain: "fate.org" + brokerPort: 6650 + brokerSslPort: 6651 \ No newline at end of file diff --git a/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/rabbitmq_route_table.yaml b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/rabbitmq_route_table.yaml new file mode 100644 index 0000000000..8ff50453df --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/rabbitmq_route_table.yaml @@ -0,0 +1,6 @@ +9999: + host: guest + port: 5672 +10000: + host: rabbitmq + port: 5672 diff --git a/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/service_conf.yaml b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/service_conf.yaml new file mode 100644 index 0000000000..fbe4295826 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/public/fate_flow/conf/service_conf.yaml @@ -0,0 +1,127 @@ +party_id: "9999" +use_registry: false +# DEBUG 10/INFO 20 +log_level: 20 +encrypt: + key_0: + module: fate_flow.hub.encrypt.password_encrypt#pwdecrypt + # base on: fate_flow/conf/ + private_path: private_key.pem +fateflow: + host: 192.167.0.100 + http_port: 9380 + grpc_port: 9360 + proxy_name: osx +# nginx: +# host: +# http_port: +# grpc_port: +database: + engine: mysql + # encrypt passwd key + decrypt_key: + mysql: + name: + user: + passwd: + host: + port: 3306 + max_connections: 100 + stale_timeout: 30 + sqlite: + # default fate_flow/runtime/system_settings: SQLITE_PATH + # /xxx/xxx.sqlite + path: +default_engines: + computing: eggroll + federation: osx + storage: eggroll 
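+# Note: the default engines above must stay consistent with the computing/federation/
+# storage selection in parties.conf (here the Eggroll/OSX/Eggroll combination).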
+default_provider:
+  name: fate
+  # version default: fateflow.env
+  version:
+  device: local
+computing:
+  standalone:
+    cores: 32
+  eggroll:
+    cores: 32
+    nodes: 1
+    # cluster manager host and port
+    host: clustermanager
+    port: 4670
+  spark:
+    # default use SPARK_HOME environment variable
+    home: /data/projects/spark-3.1.3-bin-hadoop3.2/
+    cores: 32
+federation:
+  osx:
+    host: osx
+    port: 9370
+    # stream or queue
+    mode: stream
+  pulsar:
+    host: pulsar
+    port: 6650
+    mng_port: 8080
+    cluster: standalone
+    tenant: fl-tenant
+    topic_ttl: 30
+    # default conf/pulsar_route_table.yaml
+    route_table: conf/pulsar_route_table.yaml
+    # mode: replication / client, default: replication
+    mode: replication
+    max_message_size: 1048576
+  nginx:
+    host: nginx
+    http_port: 9300
+    grpc_port: 9310
+    # http or grpc
+    protocol: http
+  rabbitmq:
+    host: rabbitmq
+    mng_port: 15672
+    port: 5672
+    user: fate
+    password: fate
+    # default conf/rabbitmq_route_table.yaml
+    route_table: conf/rabbitmq_route_table.yaml
+    # mode: replication / client, default: replication
+    mode: replication
+storage:
+  hdfs:
+    name_node: hdfs://namenode:9000
+hook_module:
+  client_authentication: fate_flow.hook.flow.client_authentication
+  site_authentication: fate_flow.hook.flow.site_authentication
+  permission: fate_flow.hook.flow.permission
+authentication:
+  client: false
+  site: false
+  permission: false
+model_store:
+  engine: file
+  # encrypt passwd key
+  decrypt_key:
+  file:
+    # default fate_flow/runtime/system_settings: MODEL_STORE_PATH
+    path:
+  mysql:
+    name: fate_flow
+    user: fate
+    passwd: fate
+    host: 127.0.0.1
+    port: 3306
+    max_connections: 100
+    stale_timeout: 30
+  tencent_cos:
+    Region:
+    SecretId:
+    SecretKey:
+    Bucket:
+zookeeper:
+  hosts:
+    - 127.0.0.1:2181
+  use_acl: true
+  user: fate
+  password: fate
diff --git a/deploy/docker-compose/docker-deploy/training_template/public/mysql/init/create-eggroll-meta-tables.sql b/deploy/docker-compose/docker-deploy/training_template/public/mysql/init/create-eggroll-meta-tables.sql
new file mode 100644
index 0000000000..9e674c77f0
--- /dev/null
+++ b/deploy/docker-compose/docker-deploy/training_template/public/mysql/init/create-eggroll-meta-tables.sql
@@ -0,0 +1,205 @@
+-- create database if not exists, default database is eggroll_meta
+CREATE DATABASE IF NOT EXISTS `eggroll_meta`;
+
+-- all operations run under this database
+USE `eggroll_meta`;
+
+-- store_locator
+CREATE TABLE IF NOT EXISTS `store_locator`
+(
+    `store_locator_id` SERIAL PRIMARY KEY,
+    `store_type` VARCHAR(255) NOT NULL,
+    `namespace` VARCHAR(2000) NOT NULL DEFAULT 'DEFAULT',
+    `name` VARCHAR(2000) NOT NULL,
+    `path` VARCHAR(2000) NOT NULL DEFAULT '',
+    `total_partitions` INT UNSIGNED NOT NULL,
+    `key_serdes_type` INT NOT NULL DEFAULT 0,
+    `value_serdes_type` INT NOT NULL DEFAULT 0,
+    `partitioner_type` INT NOT NULL DEFAULT 0,
+    `version` INT UNSIGNED NOT NULL DEFAULT 0,
+    `status` VARCHAR(255) NOT NULL,
+    `created_at` DATETIME DEFAULT CURRENT_TIMESTAMP,
+    `updated_at` DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
+) DEFAULT CHARACTER SET latin1
+  COLLATE latin1_swedish_ci;
+
+CREATE UNIQUE INDEX `idx_u_store_locator_ns_n` ON `store_locator` (`namespace`(120), `name`(640));
+CREATE INDEX `idx_store_locator_st` ON `store_locator` (`store_type`(255));
+CREATE INDEX `idx_store_locator_ns` ON `store_locator` (`namespace`(767));
+CREATE INDEX `idx_store_locator_n` ON `store_locator` (`name`(767));
+CREATE INDEX `idx_store_locator_s` ON `store_locator` (`status`(255));
+CREATE INDEX `idx_store_locator_v`
ON `store_locator` (`version`); + + +-- store (option) +CREATE TABLE IF NOT EXISTS `store_option` +( + `store_option_id` SERIAL PRIMARY KEY, + `store_locator_id` BIGINT UNSIGNED NOT NULL, + `name` VARCHAR(255) NOT NULL, + `data` VARCHAR(2000) NOT NULL DEFAULT '', + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE INDEX `idx_store_option_si` ON `store_option` (`store_locator_id`); + + +-- store_partition +CREATE TABLE IF NOT EXISTS `store_partition` +( + `store_partition_id` SERIAL PRIMARY KEY, -- self-increment sequence + `store_locator_id` BIGINT UNSIGNED NOT NULL, + `node_id` BIGINT UNSIGNED NOT NULL, + `partition_id` INT UNSIGNED NOT NULL, -- partition id of a store + `status` VARCHAR(255) NOT NULL, + `created_at` DATETIME DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE UNIQUE INDEX `idx_u_store_partition_si_spi_ni` ON `store_partition` (`store_locator_id`, `store_partition_id`, `node_id`); +CREATE INDEX `idx_store_partition_sli` ON `store_partition` (`store_locator_id`); +CREATE INDEX `idx_store_partition_ni` ON `store_partition` (`node_id`); +CREATE INDEX `idx_store_partition_s` ON `store_partition` (`status`(255)); + + +-- node +CREATE TABLE IF NOT EXISTS `server_node` +( + `server_node_id` SERIAL PRIMARY KEY, + `name` VARCHAR(2000) NOT NULL DEFAULT '', + `server_cluster_id` BIGINT UNSIGNED NOT NULL DEFAULT 0, + `host` VARCHAR(1000) NOT NULL, + `port` INT NOT NULL, + `node_type` VARCHAR(255) NOT NULL, + `status` VARCHAR(255) NOT NULL, + `last_heartbeat_at` DATETIME DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP, + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE INDEX `idx_server_node_h_p_nt` ON `server_node` (`host`(600), `port`, `node_type`(100)); +CREATE INDEX `idx_server_node_h` ON `server_node` (`host`(767)); +CREATE INDEX `idx_server_node_sci` ON `server_node` (`server_cluster_id`); +CREATE INDEX `idx_server_node_nt` ON `server_node` (`node_type`(255)); +CREATE INDEX `idx_server_node_s` ON `server_node` (`status`(255)); + + +-- session (main) +CREATE TABLE IF NOT EXISTS `session_main` +( + `session_id` VARCHAR(767) PRIMARY KEY, + `name` VARCHAR(2000) NOT NULL DEFAULT '', + `status` VARCHAR(255) NOT NULL, + `status_reason` VARCHAR(255), + `before_status` VARCHAR(255), + `tag` VARCHAR(255), + `total_proc_count` INT, + `active_proc_count` INT, + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE INDEX `idx_session_main_s` ON `session_main` (`status`); + + +-- session (option) +CREATE TABLE IF NOT EXISTS `session_option` +( + `session_option_id` SERIAL PRIMARY KEY, + `session_id` VARCHAR(2000), + `name` VARCHAR(255) NOT NULL, + `data` VARCHAR(2000) NOT NULL DEFAULT '', + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE INDEX `idx_session_option_si` ON `session_option` (`session_id`(767)); + + +-- session 
(processor) +CREATE TABLE IF NOT EXISTS `session_processor` +( + `processor_id` SERIAL PRIMARY KEY, + `session_id` VARCHAR(767), + `server_node_id` INT NOT NULL, + `processor_type` VARCHAR(255) NOT NULL, + `status` VARCHAR(255), + `before_status` VARCHAR(255), + `tag` VARCHAR(255), + `command_endpoint` VARCHAR(255), + `transfer_endpoint` VARCHAR(255), + `processor_option` VARCHAR(512), + `pid` INT NOT NULL DEFAULT -1, + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + +CREATE INDEX `idx_session_processor_si` ON `session_processor` (`session_id`(767)); + + +CREATE TABLE IF NOT EXISTS `processor_resource` +( + `id` SERIAL PRIMARY KEY, + `processor_id` BIGINT NOT NULL, + `session_id` VARCHAR(767), + `server_node_id` INT NOT NULL, + `resource_type` VARCHAR(255), + `allocated` BIGINT NOT NULL default 0, + `extention` VARCHAR(512), + `status` VARCHAR(255), + `pid` INT NOT NULL DEFAULT -1, + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; +CREATE INDEX `idx_processor_id_processor_resource` ON `processor_resource` (`processor_id`); +CREATE INDEX `idx_node_id_processor_resource` ON `processor_resource` (`server_node_id`); +CREATE INDEX `idx_session_id_processor_resource` ON `processor_resource` (`session_id`); +CREATE INDEX `idx_node_status_processor_resource` ON `processor_resource` (`server_node_id`,`resource_type`,`status`); + + + +CREATE TABLE IF NOT EXISTS `node_resource` +( + `resource_id` SERIAL PRIMARY KEY, + `server_node_id` BIGINT NOT NULL, + `resource_type` VARCHAR(255), + `total` BIGINT NOT NULL default 0, + `used` BIGINT NOT NULL default 0, + `pre_allocated` BIGINT NOT NULL default 0, + `allocated` BIGINT NOT NULL DEFAULT 0, + `extention` VARCHAR(512), + `status` VARCHAR(255), + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, + `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; +CREATE INDEX `idx_node_id_node_resource` ON `node_resource` (`server_node_id`); +CREATE INDEX `idx_node_status_node_resource` ON `node_resource` (`server_node_id`,`status`); +CREATE UNIQUE INDEX `idx_u_node_resource` ON `node_resource` (`server_node_id`, `resource_type`); + + +CREATE TABLE IF NOT EXISTS `session_ranks` +( + `container_id` SERIAL PRIMARY KEY, + `session_id` VARCHAR(767), + `server_node_id` INT NOT NULL, + `global_rank` INT UNSIGNED NOT NULL, + `local_rank` INT UNSIGNED NOT NULL +) DEFAULT CHARACTER SET latin1 + COLLATE latin1_swedish_ci; + + CREATE INDEX `idx_session_id_session_ranks` ON `session_ranks` (`session_id`); + + + + + + diff --git a/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/broker.properties b/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/broker.properties new file mode 100644 index 0000000000..9d537b9976 --- /dev/null +++ b/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/broker.properties @@ -0,0 +1,61 @@ +grpc.port= 9370 +# Http switch for the server. +# If set to True, the server will open the http port. 
+# http port configuration can be set through http.port
+open.http.server=false
+# port of http
+http.port=8087
+https.port=8088
+# whether the http server uses TLS
+#http.use.tls=false
+# whether the grpc server uses TLS
+# If true, a grpc port will be specially opened to listen for TLS requests
+# grpc tls port configuration can be set through grpc.tls.port
+open.grpc.tls.server=false
+grpc.tls.port=9883
+# the partyId of self; multiple partyIds can be set,
+# e.g. 9999,10000,10001
+self.party=9999
+# deployment mode, cluster or standalone; standalone is used by default
+deploy.mode=standalone
+# the zookeeper address needs to be configured when the deployment mode is cluster
+zk.url=127.0.0.1:2181
+stream.limit.mode=LOCAL
+
+# the IP of the cluster manager component of eggroll
+eggroll.cluster.manager.ip=clustermanager
+# the port of the cluster manager component of eggroll
+eggroll.cluster.manager.port=4670
+# maximum number of message retries
+produce.msg.max.try.time=3
+
+http.client.method.config={"UNARY_CALL":{"reqTimeout":0,"connectionTimeout":0,"socketTimeout":0}}
+
+http.use.tls=false
+http.ssl.trust.store.type=PKCS12
+http.ssl.key.store.alias=22
+http.ssl.key.store.password=123456
+
+mapped.file.size=134217728
+
+#http.ssl.trust.store.path=D:\\44\\127.0.0.1.pfx
+
+server.ca.file=
+server.cert.chain.file=
+server.private.key.file=
diff --git a/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/route_table.json b/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/route_table.json
new file mode 100644
index 0000000000..abe60b8c56
--- /dev/null
+++ b/deploy/docker-compose/docker-deploy/training_template/public/osx/conf/route_table.json
@@ -0,0 +1,26 @@
+{
+  "route_table":
+  {
+    "9999":
+    {
+      "fateflow":[
+        {
+          "port": 9360,
+          "ip": "127.0.0.1"
+        }
+      ]
+    },
+    "10000":{
+      "default":[{
+        "protocol":"http",
+        "url": "http://127.0.0.1:8087/osx/inbound",
+        "ip": "127.0.0.1",
+        "port": 9370
+      }]
+    }
+  },
+  "permission":
+  {
+    "default_allow": true
+  }
+}
\ No newline at end of file
From e81a69222230d2e1b8f092d1abdf0519f1fa5b7b Mon Sep 17 00:00:00 2001
From: robbie228
Date: Fri, 15 Nov 2024 10:16:52 +0800
Subject: [PATCH 2/8] update README

Signed-off-by: robbie228
---
 deploy/docker-compose/README_zh.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md
index 14e6a1b7ca..ee4ecad031 100644
--- a/deploy/docker-compose/README_zh.md
+++ b/deploy/docker-compose/README_zh.md
@@ -20,7 +20,7 @@ Compose是用于定义和运行多容器Docker应用程序的工具。通过Comp
 2. 所有主机安装Docker 版本 : 19.03.0+;
 3. 所有主机安装Docker Compose 版本: 1.27.0+;
 4. 部署机可以联网，所以主机相互之间可以网络互通;
-5. 运行机已经下载FATE的各组件镜像，如果无法连接dockerhub，请考虑使用harbor([Harbor 作为本地镜像源](../registry/README.md))或者使用离线部署(离线构建镜像参考文档[构建镜像]( https://github.com/FederatedAI/FATE-Builder/tree/main/docker-build))。
+5. 运行机已经下载FATE的各组件镜像，如果无法连接dockerhub，请考虑使用harbor或者使用离线部署(离线构建镜像参考文档[构建镜像]( https://github.com/FederatedAI/FATE-Builder/tree/main/docker-build))。
 6.
运行FATE的主机推荐配置8CPUs和16G RAM。 ### 下载部署脚本 @@ -42,7 +42,7 @@ RegistryURI=hub.c.163.com ### 手动下载镜像(可选) -如果运行机没有FATE组件的镜像,可以通过以下命令从Docker Hub获取镜像。FATE镜像的版本``可在[release页面](https://github.com/FederatedAI/FATE/releases)上查看,其中serving镜像的版本信息在[这个页面](https://github.com/FederatedAI/FATE-Serving/releases): +如果运行机没有FATE组件的镜像,可以通过以下命令从Docker Hub获取镜像。FATE镜像的版本``可在[release页面](https://github.com/FederatedAI/FATE/releases)上查看: ```bash docker pull federatedai/eggroll:3.2.0-release @@ -66,7 +66,7 @@ mysql 8.0.28 ### 离线部署(可选) -当我们的运行机器处于无法连接外部网络的时候,就无法从Docker Hub下载镜像,建议使用[Harbor](https://goharbor.io/)作为本地镜像仓库。安装Harbor请参考[文档](https://github.com/FederatedAI/KubeFATE/blob/master/registry/install_harbor.md)。在`.env`文件中,将`RegistryURI`变量更改为Harbor的IP。如下面 192.168.10.1是Harbor IP的示例。 +当我们的运行机器处于无法连接外部网络的时候,就无法从Docker Hub下载镜像,建议使用[Harbor](https://goharbor.io/)作为本地镜像仓库。安装Harbor请参考[文档](https://github.com/FederatedAI/KubeFATE/blob/master/registry/README.md)。在`.env`文件中,将`RegistryURI`变量更改为Harbor的IP。如下面 192.168.10.1是Harbor IP的示例。 ```bash $ cd KubeFATE/ @@ -87,7 +87,7 @@ RegistryURI=192.168.10.1/federatedai 根据需求修改配置文件`kubeFATE\docker-deploy\parties.conf`。 -`parties.conf`配置文件配置项的含义查看这个文档[parties.conf文件介绍](../docs/configurations/Docker_compose_Partys_configuration.md) +`parties.conf`配置文件配置项的含义查看这个文档[parties.conf文件介绍](https://github.com/FederatedAI/KubeFATE/blob/master/docs/configurations/Docker_compose_Partys_configuration.md) 下面是修改好的文件,`party 10000`的集群将部署在*192.168.7.1*上,而`party 9999`的集群将部署在*192.168.7.2*上。 @@ -130,9 +130,9 @@ fateboard_password=admin ``` -使用Docker-compose部署FATE可以支持多种种不同的类型引擎的组合(对computing federation storage的选择),关于不同类型的FATE的更多细节查看: [不同类型FATE的架构介绍](../docs/Introduction_to_Engine_Architecture_zh.md)。 +使用Docker-compose部署FATE可以支持多种种不同的类型引擎的组合(对computing federation storage的选择),关于不同类型的FATE的更多细节查看: [不同类型FATE的架构介绍](https://github.com/FederatedAI/KubeFATE/blob/master/docs/Introduction_to_Engine_Architecture_zh.md)。 -`algorithm`和`device`的配置可以查看这里[FATE_Algorithm_and_Computational_Acceleration_Selection.md](../docs/FATE_Algorithm_and_Computational_Acceleration_Selection.md) +`algorithm`和`device`的配置可以查看这里[FATE_Algorithm_and_Computational_Acceleration_Selection.md](https://github.com/FederatedAI/KubeFATE/blob/master/docs/FATE_Algorithm_and_Computational_Acceleration_Selection.md) **注意**: 默认情况下不会部署exchange组件。如需部署,用户可以把服务器IP填入上述配置文件的`exchangeip`中,该组件的默认监听端口为9371。 @@ -400,7 +400,7 @@ rm -rf ../confs-/ # 删除docker-compose部署文件 #### 采用docker hub下载镜像速度可能较慢 -解决办法:可以自己构建镜像,自己构建镜像参考[这里](https://github.com/FederatedAI/FATE/tree/master/docker-build)。 +解决办法:可以自己构建镜像,自己构建镜像参考[这里](https://github.com/FederatedAI/FATE-Builder/tree/master/docker-build)。 #### 运行脚本`./docker_deploy.sh all`的时候提示需要输入密码 From f7c26211d41fe7675b70d0294423829e15bbfd14 Mon Sep 17 00:00:00 2001 From: robbie228 Date: Fri, 15 Nov 2024 10:27:12 +0800 Subject: [PATCH 3/8] update README Signed-off-by: robbie228 --- deploy/docker-compose/README_zh.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md index ee4ecad031..6c7f1a9fcf 100644 --- a/deploy/docker-compose/README_zh.md +++ b/deploy/docker-compose/README_zh.md @@ -25,7 +25,7 @@ Compose是用于定义和运行多容器Docker应用程序的工具。通过Comp ### 下载部署脚本 -在任意机器上下载合适的KubeFATE版本,可参考 [releases pages](https://github.com/FederatedAI/KubeFATE/releases),然后解压。 +在任意机器上下载合适的KubeFATE版本,可参考 [releases pages](https://github.com/FederatedAI/KubeFATE/releases),然后解压;2.2.0版本docker-compose部署可以参考[Docker-Compose](../docker-compose]) ### 修改镜像配置文件(可选) From 
d30ca1e24b7100905adab0322a7d191423b421ff Mon Sep 17 00:00:00 2001 From: robbie228 Date: Fri, 15 Nov 2024 10:29:28 +0800 Subject: [PATCH 4/8] update README Signed-off-by: robbie228 --- deploy/docker-compose/README_zh.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md index 6c7f1a9fcf..a5c3481bd8 100644 --- a/deploy/docker-compose/README_zh.md +++ b/deploy/docker-compose/README_zh.md @@ -25,7 +25,7 @@ Compose是用于定义和运行多容器Docker应用程序的工具。通过Comp ### 下载部署脚本 -在任意机器上下载合适的KubeFATE版本,可参考 [releases pages](https://github.com/FederatedAI/KubeFATE/releases),然后解压;2.2.0版本docker-compose部署可以参考[Docker-Compose](../docker-compose]) +在任意机器上下载合适的KubeFATE版本,可参考 [releases pages](https://github.com/FederatedAI/KubeFATE/releases),然后解压;2.2.0版本docker-compose部署可以参考[Docker-Compose](../docker-compose) ### 修改镜像配置文件(可选) From 2432dc8a3bc811323498e48a2156e97a8ac0eb32 Mon Sep 17 00:00:00 2001 From: robbie228 Date: Fri, 15 Nov 2024 14:35:17 +0800 Subject: [PATCH 5/8] update README Signed-off-by: robbie228 --- deploy/docker-compose/README_zh.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md index a5c3481bd8..11030ffd3a 100644 --- a/deploy/docker-compose/README_zh.md +++ b/deploy/docker-compose/README_zh.md @@ -92,10 +92,10 @@ RegistryURI=192.168.10.1/federatedai 下面是修改好的文件,`party 10000`的集群将部署在*192.168.7.1*上,而`party 9999`的集群将部署在*192.168.7.2*上。 ```bash -user=fate +user=fate #使用实际使用的用户 dir=/data/projects/fate -party_list=(10000 9999) -party_ip_list=(192.168.7.1 192.168.7.2) +party_list=(10000 9999) #所需要部署的实际partyid,部署单边只填写一个 +party_ip_list=(192.168.7.1 192.168.7.2) #所需要部署的实际IP,与partyid对应,部署单边只填写一个 # Engines: # Computing : Eggroll, Spark, Spark_local From 90c9e00dde29466bc13ddb3e9efb84c1cc63aa04 Mon Sep 17 00:00:00 2001 From: robbie228 Date: Mon, 18 Nov 2024 14:59:43 +0800 Subject: [PATCH 6/8] 1.update README 2.update README_zh 3.add copyright 4.delete test.sh Signed-off-by: robbie228 --- README.md | 2 - deploy/docker-compose/README_zh.md | 8 +- .../docker-deploy/docker_deploy.sh | 1 + .../docker-deploy/generate_config.sh | 1 + .../docker-deploy/hosts-setup.sh | 3 + deploy/docker-compose/docker-deploy/test.sh | 182 ------------------ 6 files changed, 9 insertions(+), 188 deletions(-) delete mode 100644 deploy/docker-compose/docker-deploy/test.sh diff --git a/README.md b/README.md index ee586ca8d6..a19c6da166 100644 --- a/README.md +++ b/README.md @@ -33,8 +33,6 @@ FATE can be deployed on a single node or on multiple nodes. Choose the deploymen ### Cluster deployment Deploying FATE to multiple nodes to achieve scalability, reliability and manageability. - [Cluster deployment by CLI](./deploy/cluster-deploy): Using CLI to deploy a FATE cluster. - -### Docker-Compose deployment - [Docker-Compose deployment](./deploy/docker-compose): Using docker-compose to deploy FATE. 
### Quick Start diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md index 11030ffd3a..b77795d11a 100644 --- a/deploy/docker-compose/README_zh.md +++ b/deploy/docker-compose/README_zh.md @@ -130,11 +130,11 @@ fateboard_password=admin ``` -使用Docker-compose部署FATE可以支持多种种不同的类型引擎的组合(对computing federation storage的选择),关于不同类型的FATE的更多细节查看: [不同类型FATE的架构介绍](https://github.com/FederatedAI/KubeFATE/blob/master/docs/Introduction_to_Engine_Architecture_zh.md)。 +使用Docker-compose部署FATE可以支持多种不同的类型引擎的组合(对computing federation storage的选择),关于不同类型的FATE的更多细节查看: [不同类型FATE的架构介绍](https://github.com/FederatedAI/KubeFATE/blob/master/docs/Introduction_to_Engine_Architecture_zh.md)。 `algorithm`和`device`的配置可以查看这里[FATE_Algorithm_and_Computational_Acceleration_Selection.md](https://github.com/FederatedAI/KubeFATE/blob/master/docs/FATE_Algorithm_and_Computational_Acceleration_Selection.md) -**注意**: 默认情况下不会部署exchange组件。如需部署,用户可以把服务器IP填入上述配置文件的`exchangeip`中,该组件的默认监听端口为9371。 +**注意:** 默认情况下不会部署exchange组件。如需部署,用户可以把服务器IP填入上述配置文件的`exchangeip`中,该组件的默认监听端口为9371。 在运行部署脚本之前,需要确保部署机器可以ssh免密登录到两个运行节点主机上。user代表免密的用户。 @@ -162,7 +162,7 @@ drwxr-xr-x. 2 fate docker 6 May 27 00:51 fate ### 执行部署脚本 -**注意:**在运行以下命令之前,所有目标主机必须 +**注意**: 在运行以下命令之前,所有目标主机必须 * 允许使用 SSH 密钥进行无密码 SSH 访问(否则我们将需要为每个主机多次输入密码)。 * 满足 [准备工作](#准备工作) 中指定的要求。 @@ -408,4 +408,4 @@ rm -rf ../confs-/ # 删除docker-compose部署文件 #### CPU指令集问题 -解决办法:查看[wiki](https://github.com/FederatedAI/KubeFATE/wiki/KubeFATE)页面的storage-service部分。 +解决办法:查看[wiki](https://github.com/FederatedAI/KubeFATE/wiki)页面的storage-service部分。 diff --git a/deploy/docker-compose/docker-deploy/docker_deploy.sh b/deploy/docker-compose/docker-deploy/docker_deploy.sh index dbc8731182..ef47cbfa83 100644 --- a/deploy/docker-compose/docker-deploy/docker_deploy.sh +++ b/deploy/docker-compose/docker-deploy/docker_deploy.sh @@ -1,6 +1,7 @@ #!/bin/bash # Copyright 2019-2022 VMware, Inc. +# Copyright 2019 The FATE Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/deploy/docker-compose/docker-deploy/generate_config.sh b/deploy/docker-compose/docker-deploy/generate_config.sh index ebd402381e..ef40dda6e5 100644 --- a/deploy/docker-compose/docker-deploy/generate_config.sh +++ b/deploy/docker-compose/docker-deploy/generate_config.sh @@ -1,6 +1,7 @@ #!/bin/bash # Copyright 2019-2022 VMware, Inc. +# Copyright 2019 The FATE Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/deploy/docker-compose/docker-deploy/hosts-setup.sh b/deploy/docker-compose/docker-deploy/hosts-setup.sh index f22f047118..4722a9339a 100644 --- a/deploy/docker-compose/docker-deploy/hosts-setup.sh +++ b/deploy/docker-compose/docker-deploy/hosts-setup.sh @@ -1,4 +1,7 @@ #! /bin/bash + +# Copyright 2019 The FATE Authors. All Rights Reserved. +# ### Group, user and folder creation ## Add docker group if not exists echo "Adding group 'docker'" diff --git a/deploy/docker-compose/docker-deploy/test.sh b/deploy/docker-compose/docker-deploy/test.sh deleted file mode 100644 index c46a260690..0000000000 --- a/deploy/docker-compose/docker-deploy/test.sh +++ /dev/null @@ -1,182 +0,0 @@ -#!/bin/bash - -# Copyright 2019-2020 VMware, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# you may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -BASEDIR=$(dirname "$0") -cd $BASEDIR -WORKINGDIR=$(pwd) - -# fetch fate-python image -source ${WORKINGDIR}/.env -source ${WORKINGDIR}/parties.conf - -cd ${WORKINGDIR} - -get_party_ip(){ - target_party_id=$1 - for ((i = 0; i < ${#partylist[*]}; i++)); do - if [ "${partylist[$i]}" = "$target_party_id" ]; then - target_party_ip=${partyiplist[$i]} - fi - done - return $target_party_ip -} - -Test() { - - while [ "$1" != "" ]; do - case $1 in - toy_example) - shift - if [ "$1" = "" ] || [ "$2" = "" ]; then - echo "No party id was provided, please check your arguments " - echo "Example: " - echo " 'bash test.sh toy_example 9999 10000'" - exit 1 - fi - toy_example $@ - break - ;; - min_test_task) - shift - min_test_task $@ - break - ;; - serving) - shift - serving $@ - break - ;; - esac - shift - done - -} - -toy_example() { - echo "start test toy_example" - guest=$1 - host=$2 - echo "guest_id: "$guest - echo "host_id: "$host - - target_party_id=$1 - echo "target_party_id: "$target_party_id - for ((i = 0; i < ${#party_ip_list[*]}; i++)); do - if [ "${party_list[$i]}" = "$target_party_id" ]; then - target_party_ip=${party_ip_list[$i]} - fi - done - echo "*********************start docker log***************************" - echo $user@$target_party_ip - ssh -tt $user@$target_party_ip < Date: Mon, 18 Nov 2024 15:20:17 +0800 Subject: [PATCH 7/8] 1.update README_zh Signed-off-by: robbie228 --- deploy/docker-compose/README_zh.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deploy/docker-compose/README_zh.md b/deploy/docker-compose/README_zh.md index b77795d11a..2d48349e30 100644 --- a/deploy/docker-compose/README_zh.md +++ b/deploy/docker-compose/README_zh.md @@ -162,7 +162,7 @@ drwxr-xr-x. 2 fate docker 6 May 27 00:51 fate ### 执行部署脚本 -**注意**: 在运行以下命令之前,所有目标主机必须 +**注意:** 在运行以下命令之前,所有目标主机必须 * 允许使用 SSH 密钥进行无密码 SSH 访问(否则我们将需要为每个主机多次输入密码)。 * 满足 [准备工作](#准备工作) 中指定的要求。 From 60f939af0795563e58ac0edccb4e6e0054d12231 Mon Sep 17 00:00:00 2001 From: robbie228 Date: Tue, 19 Nov 2024 16:11:40 +0800 Subject: [PATCH 8/8] update copyright Signed-off-by: robbie228 --- .../docker-compose/docker-deploy/hosts-setup.sh | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/deploy/docker-compose/docker-deploy/hosts-setup.sh b/deploy/docker-compose/docker-deploy/hosts-setup.sh index 4722a9339a..42d9759950 100644 --- a/deploy/docker-compose/docker-deploy/hosts-setup.sh +++ b/deploy/docker-compose/docker-deploy/hosts-setup.sh @@ -1,6 +1,18 @@ #! /bin/bash - -# Copyright 2019 The FATE Authors. All Rights Reserved. +# +# Copyright 2019 The FATE Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. # ### Group, user and folder creation ## Add docker group if not exists