Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #491 from FederatedAI/build-doc-2.0.0-beta
Build doc 2.0.0 beta
Showing 48 changed files with 4,590 additions and 2,671 deletions.
@@ -0,0 +1,38 @@
import json
import os.path
import subprocess
import sys
import time

import requests


def run_script(script_path, *args):
    # Run a helper script with the current interpreter and return its captured stderr.
    result = subprocess.run([sys.executable, script_path, *args], stderr=subprocess.PIPE, text=True)
    return result.stderr


if __name__ == '__main__':
    base_dir = os.path.dirname(os.path.abspath(__file__))

    # Generate the swagger server module first; this call blocks until the build script exits.
    run_script(os.path.join(base_dir, 'build_swagger_server.py'))

    # Start the generated server as a child process so it can be stopped afterwards.
    port = "50000"
    server_path = os.path.join(base_dir, 'swagger_server.py')
    server = subprocess.Popen([sys.executable, server_path, port])
    try:
        # Give the server a few seconds to come up before requesting the spec.
        time.sleep(3)
        data = requests.get(url=f"http://127.0.0.1:{port}/swagger.json").json()
    finally:
        # Stop the server so that the script can exit instead of waiting on it forever.
        server.terminate()
        server.wait()

    swagger_file = os.path.join(os.path.dirname(base_dir), "swagger", "swagger.json")
    os.makedirs(os.path.dirname(swagger_file), exist_ok=True)
    with open(swagger_file, "w") as fw:
        json.dump(data, fw, indent=4)
    print("build success!")
    sys.exit()
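
The script waits a fixed three seconds before requesting `swagger.json`, which fails if the server takes longer to start. A minimal sketch of polling instead is shown below. The helper name `wait_until_ready` and its parameters are hypothetical and not part of this commit; the probe is any zero-argument callable, so the helper itself needs no network access.

```python
import time


def wait_until_ready(probe, timeout=10.0, interval=0.2):
    """Call probe() until it returns a truthy value or the timeout expires.

    Exceptions raised by probe() are treated as "not ready yet".
    Returns True when the probe succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # e.g. connection refused while the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible use in the script (assumes `requests` and `port` as defined there):
#   ready = wait_until_ready(
#       lambda: requests.get(f"http://127.0.0.1:{port}/swagger.json", timeout=1).ok
#   )
```

Polling keeps the build fast when the server starts quickly and still tolerates a slow start up to the chosen timeout.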
@@ -0,0 +1,110 @@
# Overall Design

## 1. Logical Architecture

- Jobs defined by DSL
- Top-down vertical subtask flow scheduling, with joint subtask coordination across participants
- Independent, isolated worker processes for task execution
- Support for multiple types and versions of components
- Computation abstraction API
- Storage abstraction API
- Cross-party transfer abstraction API

![](./images/fate_flow_logical_arch.png)

## 2. Service Architecture

### 2.1 FATE

![](./images/fate_arch.png)

### 2.2 FATE Flow

![](./images/fate_flow_arch.png)

## 3. [Scheduling Architecture](./fate_flow_job_scheduling.md)

### 3.1 A new scheduling architecture based on shared state

- State (resources, jobs) is decoupled from managers (schedulers, resource managers)
- Resource state and job state are persisted in MySQL and shared globally, providing reliable transactional operations
- Higher availability and scalability of the management services
- Jobs are open to intervention, supporting restart, rerun, parallelism control, resource isolation, etc.

![](./images/fate_flow_scheduling_arch.png)

### 3.2 State-Driven Scheduling

- Resource coordination
- Launch the Executor child process to run the component
- The Executor reports state to the local Server and also to the scheduler
- The federated task state is computed from the task states of all parties
- The job state is computed from the states of upstream and downstream tasks

![](./images/fate_flow_resource_process.png)
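
The last two bullets of section 3.2 describe state aggregation: party-level task states are combined into one federated task state, and task states are combined into a job state. The sketch below illustrates one way such a rule could look. The status names and their precedence order are assumptions made for illustration; the rules FATE Flow actually applies may differ.

```python
# Hypothetical status names, ordered from highest to lowest precedence.
PRECEDENCE = ["failed", "timeout", "canceled", "running", "waiting", "success"]


def federated_task_state(party_states):
    """Collapse per-party task states into one federated task state.

    The first entry of PRECEDENCE that any party reports wins, so the task
    counts as "success" only when every party reports "success".
    """
    if not party_states:
        raise ValueError("at least one party state is required")
    reported = set(party_states.values())
    unknown = reported - set(PRECEDENCE)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    for state in PRECEDENCE:
        if state in reported:
            return state


def job_state(task_states):
    """Apply the same precedence rule to the federated states of a job's tasks."""
    return federated_task_state(task_states)


# One task seen by two parties, then a job made of two tasks.
task = federated_task_state({("guest", "9999"): "success", ("host", "10000"): "running"})
job = job_state({"reader_0": "success", "lr_0": task})
```

A precedence list keeps the rule declarative: changing how conflicts are resolved only requires reordering the list.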

## 4. [Multiparty Resource Coordination](./fate_flow_resource_management.md)

- The total resource size of each engine is set in the configuration file; integration with external systems is planned for later
- In the total resource size, cores_per_node is the number of CPU cores per compute node and nodes is the number of compute nodes
- When the FATE Flow server starts, it reads the resource size from the configuration file and registers the update in the database
- Resources are requested per job and take effect when the Job Conf is submitted; formula: task_parallelism*task_cores
- See the dedicated section of the documentation for details
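
The arithmetic in this section can be sketched as follows. The names `cores_per_node`, `nodes`, `task_parallelism` and `task_cores` come from the text; treating the engine total as `cores_per_node * nodes`, and the `can_admit` check, are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class EngineResources:
    cores_per_node: int  # CPU cores per compute node
    nodes: int           # number of compute nodes

    @property
    def total_cores(self) -> int:
        # Assumption: the engine total is the product of the two settings.
        return self.cores_per_node * self.nodes


def job_cores(task_parallelism: int, task_cores: int) -> int:
    """Cores requested by one job: task_parallelism * task_cores."""
    return task_parallelism * task_cores


def can_admit(engine: EngineResources, used_cores: int,
              task_parallelism: int, task_cores: int) -> bool:
    """Hypothetical admission check: does the request fit in the remaining cores?"""
    return used_cores + job_cores(task_parallelism, task_cores) <= engine.total_cores


engine = EngineResources(cores_per_node=20, nodes=2)
requested = job_cores(task_parallelism=2, task_cores=4)
fits = can_admit(engine, used_cores=36, task_parallelism=2, task_cores=4)
```

With these example numbers the engine offers 40 cores, the job requests 8, and the request does not fit when 36 cores are already in use.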

## 5. [Data Flow Tracking](./fate_flow_tracking.md)

- Definition
    - metric type: the kind of metric, such as auc, loss, ks, etc.
    - metric namespace: custom metric namespace, e.g. train, predict
    - metric name: custom metric name, e.g. auc0, hetero_lr_auc0
    - metric data: metric data in key-value form
    - metric meta: metric meta information in key-value form, which supports flexible plotting
- API
    - log_metric_data(metric_namespace, metric_name, metrics)
    - set_metric_meta(metric_namespace, metric_name, metric_meta)
    - get_metric_data(metric_namespace, metric_name)
    - get_metric_meta(metric_namespace, metric_name)
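
To make the four calls concrete, here is a toy in-memory stand-in that uses the method names and parameters listed above. The class name `InMemoryTracker`, the storage layout, and the choice of `(key, value)` pairs for `metrics` are assumptions for this sketch, not the FATE Flow implementation.

```python
from collections import defaultdict


class InMemoryTracker:
    """Toy metric tracker keyed by (metric_namespace, metric_name)."""

    def __init__(self):
        self._data = defaultdict(list)  # (namespace, name) -> [(key, value), ...]
        self._meta = {}                 # (namespace, name) -> dict of meta information

    def log_metric_data(self, metric_namespace, metric_name, metrics):
        # Append key-value pairs to the series identified by namespace and name.
        self._data[(metric_namespace, metric_name)].extend(metrics)

    def set_metric_meta(self, metric_namespace, metric_name, metric_meta):
        # Store a copy so later changes by the caller do not leak in.
        self._meta[(metric_namespace, metric_name)] = dict(metric_meta)

    def get_metric_data(self, metric_namespace, metric_name):
        return list(self._data.get((metric_namespace, metric_name), []))

    def get_metric_meta(self, metric_namespace, metric_name):
        return self._meta.get((metric_namespace, metric_name))


tracker = InMemoryTracker()
tracker.log_metric_data("train", "auc0", [("iter_0", 0.71), ("iter_1", 0.78)])
tracker.set_metric_meta("train", "auc0", {"metric_type": "auc"})
```

Keying both stores by namespace and name mirrors the API: data and meta for the same metric are always looked up with the same pair.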

## 6. [Realtime Monitoring](./fate_flow_monitoring.md)

- Job process liveness detection
- Job timeout detection
- Resource recovery detection
- Base engine session timeout detection

![](./images/fate_flow_detector.png)
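
As an illustration of one of these checks, job timeout detection can be sketched as a periodic scan over running jobs. The record fields `status`, `start_time` and `timeout` are hypothetical; the text does not describe how FATE Flow stores this information.

```python
import time


def find_timed_out_jobs(jobs, now=None):
    """Return the sorted IDs of running jobs whose elapsed time exceeds their timeout.

    jobs maps job_id -> dict with the hypothetical keys "status",
    "start_time" (seconds since the epoch) and "timeout" (seconds).
    """
    now = time.time() if now is None else now
    return sorted(
        job_id
        for job_id, info in jobs.items()
        if info["status"] == "running" and now - info["start_time"] > info["timeout"]
    )


jobs = {
    "job_a": {"status": "running", "start_time": 0, "timeout": 100},
    "job_b": {"status": "running", "start_time": 50, "timeout": 100},
    "job_c": {"status": "success", "start_time": 0, "timeout": 10},
}
expired = find_timed_out_jobs(jobs, now=120)
```

Passing `now` explicitly makes the check deterministic, which is convenient when a detector like this is exercised in tests.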

## 7. [Task Component Registry](./fate_flow_component_registry.md)

![](./images/fate_flow_component_registry.png)

## 8. [Multi-Party Federated Model Registry](./fate_flow_model_registry.md)

- Google Protocol Buffers is used as the model storage protocol, which allows cross-language sharing; each algorithmic model consists of two parts: ModelParam and ModelMeta
- A Pipeline generates a series of algorithmic models
- The model named Pipeline stores the Pipeline modeling DSL and the online inference DSL
- Under federated learning, model consistency must be guaranteed for all participants, i.e., model binding
- model_key is the model identifier defined by the user when submitting the task
- The model ID of each federated party is its party identification (role, party_id) plus model_key
- The model version of the federated parties must be unique and consistent, and FATE-Flow sets it directly to job_id

![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"}

![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"}
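
The identifier rules in section 8 can be sketched as two small functions. The text only says that a party's model ID combines role, party_id and model_key; the `#` separator and the function names here are assumptions for illustration.

```python
def party_model_id(role, party_id, model_key, sep="#"):
    """Build a party-level model ID from role, party_id and model_key.

    The "#" separator is an assumption made for this sketch.
    """
    return sep.join([role, str(party_id), model_key])


def model_version(job_id):
    """The model version is the job_id, so every party of one job shares it."""
    return job_id


guest_id = party_model_id("guest", 9999, "my_model")
host_id = party_model_id("host", 10000, "my_model")
version = model_version("202301011200000000")
```

Each party gets a distinct model ID, while the shared `job_id` gives all of them the same version, which is what binds their models together.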

## 9. [Data Access](./fate_flow_data_access.md)

- Upload:
    - External storage is imported directly into FATE Storage, creating a new DTable
    - When the job runs, Reader reads directly from Storage

- Table Bind:
    - The external storage address is bound to a new DTable in FATE
    - When the job runs, Reader reads data from external storage via Meta and transfers it to FATE Storage
    - Connects to the big data ecosystem: HDFS, Hive/MySQL

![](./images/fate_flow_inputoutput.png)

## 10. [Multi-Party Collaboration Authority Management](./fate_flow_authority_management.md)

![](./images/fate_flow_authorization.png)
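@@ -0,0 +1,110 @@
# Overall Design

## 1. Logical Architecture

- Jobs defined by DSL
- Top-down vertical subtask flow scheduling, with joint subtask coordination across participants
- Independent, isolated worker processes for task execution
- Support for multiple types and versions of components
- Computation abstraction API
- Storage abstraction API
- Cross-party transfer abstraction API

![](./images/fate_flow_logical_arch.png)

## 2. Overall Architecture

### 2.1 FATE Overall Architecture

![](./images/fate_arch.png)

### 2.2 FATE Flow Overall Architecture

![](./images/fate_flow_arch.png)

## 3. [Scheduling Architecture](./fate_flow_job_scheduling.zh.md)

### 3.1 A new scheduling architecture based on shared state

- State (resources, jobs) is decoupled from managers (schedulers, resource managers)
- Resource state and job state are persisted in MySQL and shared globally, providing reliable transactional operations
- Higher availability and scalability of the management services
- Jobs are open to intervention, supporting restart, rerun, parallelism control, resource isolation, etc.

![](./images/fate_flow_scheduling_arch.png)

### 3.2 State-Driven Scheduling

- Resource coordination
- Launch the Executor child process to run the component
- The Executor reports state to the local Server and, at the same time, to the scheduler
- The federated task state is computed from the task states of all parties
- The job state is computed from the states of upstream and downstream tasks

![](./images/fate_flow_resource_process.png)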
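
## 4. [Multiparty Resource Coordination](./fate_flow_resource_management.zh.md)

- The total resource size of each engine is set in the configuration file; integration with external systems is planned for later
- In the total resource size, cores_per_node is the number of CPU cores per compute node and nodes is the number of compute nodes
- When the FATE Flow server starts, it reads the resource size from the configuration file and registers the update in the database
- Resources are requested per job and take effect when the Job Conf is submitted; formula: task_parallelism*task_cores
- See the dedicated section of the documentation for details

## 5. [Data Flow Tracking](./fate_flow_tracking.zh.md)

- Definition
    - metric type: the kind of metric, such as auc, loss, ks, etc.
    - metric namespace: custom metric namespace, e.g. train, predict
    - metric name: custom metric name, e.g. auc0, hetero_lr_auc0
    - metric data: metric data in key-value form
    - metric meta: metric meta information in key-value form, which supports flexible plotting
- API
    - log_metric_data(metric_namespace, metric_name, metrics)
    - set_metric_meta(metric_namespace, metric_name, metric_meta)
    - get_metric_data(metric_namespace, metric_name)
    - get_metric_meta(metric_namespace, metric_name)

## 6. [Realtime Job Monitoring](./fate_flow_monitoring.zh.md)

- Worker process liveness detection
- Job timeout detection
- Resource recovery detection
- Base engine session timeout detection

![](./images/fate_flow_detector.png)

## 7. [Task Component Registry](./fate_flow_component_registry.zh.md)

![](./images/fate_flow_component_registry.png)

## 8. [Multi-Party Federated Model Registry](./fate_flow_model_registry.zh.md)

- Google Protocol Buffers is used as the model storage protocol, which allows cross-language sharing; each algorithmic model consists of two parts: ModelParam and ModelMeta
- A Pipeline generates a series of algorithmic models
- The model named Pipeline stores the Pipeline modeling DSL and the online inference DSL
- Under federated learning, model consistency must be guaranteed for all participants, i.e., model binding
- model_key is the model identifier defined by the user when submitting the task
- The model ID of each federated party is its party identification (role, party_id) plus model_key
- The model version of the federated parties must be unique and consistent, and FATE-Flow sets it directly to job_id

![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"}

![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"}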

## 9. [Data Access](./fate_flow_data_access.zh.md)

- Upload:
    - External storage is imported directly into FATE Storage, creating a new DTable
    - When the job runs, Reader reads directly from Storage

- Table Bind:
    - The external storage address is bound to a new DTable in FATE
    - When the job runs, Reader reads data from external storage via Meta and transfers it to FATE Storage
    - Connects to the big data ecosystem: HDFS, Hive/MySQL

![](./images/fate_flow_inputoutput.png)

## 10. [Multi-Party Collaboration Authority Management](./fate_flow_authority_management.zh.md)

![](./images/fate_flow_authorization.png)
@@ -0,0 +1,4 @@
---
template: overrides/home.html
title: Secure, Privacy-preserving Machine Learning Multi-Party Scheduling System
---
@@ -0,0 +1,4 @@
---
template: overrides/home.zh.html
title: Secure, Privacy-preserving Machine Learning Multi-Party Scheduling System
---
@@ -0,0 +1,79 @@
# Build

## Use Docker

At the repo root, run

```sh
docker run --rm -it -p 8000:8000 -v ${PWD}:/docs sagewei0/mkdocs
```

to serve the docs at http://localhost:8000,

or

```sh
docker run --rm -it -p 8000:8000 -v ${PWD}:/docs sagewei0/mkdocs build
```

to build the docs into the `site` folder.

## Manually

[`mkdocs-material`](https://pypi.org/project/mkdocs-material/) and several plugins are needed to build these docs.

First, create a Python virtual environment:

```sh
python3 -m venv "fatedocs"
source fatedocs/bin/activate
pip install -U pip
```

Then install the requirements:

```sh
pip install -r doc/mkdocs/requirements.txt
```

Now run

```sh
mkdocs serve
```

at the repo root to serve the docs, or run

```sh
mkdocs build
```

at the repo root to build the docs into the `site` folder.

# Development guide

We use [mkdocs-material](https://squidfunk.github.io/mkdocs-material/) to build our docs.
Several markdown extensions are really useful for writing good-looking documents, such as
[admonitions](https://squidfunk.github.io/mkdocs-material/reference/admonitions/) and
[content-tabs](https://squidfunk.github.io/mkdocs-material/reference/content-tabs/).

Several plugins are used to make mkdocs-material more powerful:

- [mkdocstrings](https://mkdocstrings.github.io/usage/)
  generates documentation automatically from source code. We mostly use it to generate
  the `params api` for `federatedml`.

- [awesome-pages](https://github.com/lukasgeiter/mkdocs-awesome-pages-plugin)
  for powerful nav rules

- [i18n](https://ultrabug.github.io/mkdocs-static-i18n/)
  for multi-language support

- [mkdocs-jupyter](https://github.com/danielfrg/mkdocs-jupyter)
  for Jupyter notebook support

- [mkdocs-simple-hooks](https://github.com/aklajnert/mkdocs-simple-hooks)
  for simple plugin hooks