Merge pull request #491 from FederatedAI/build-doc-2.0.0-beta
Build doc 2.0.0 beta
zhihuiwan authored Sep 11, 2023
2 parents ab57b71 + af4af7f commit 007e43a
Showing 48 changed files with 4,590 additions and 2,671 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/doc.yml
@@ -4,7 +4,8 @@ on:
push:
branches:
- 'main'
- 'develop-[0-9].[0-9].[0-9]'
- 'develop-[0-9]+.[0-9]+.[0-9]+'
- 'build-doc-[0-9]+.[0-9]+.[0-9]+-[a-zA-Z]+'

schedule:
- cron: '0 8 * * *'
@@ -41,6 +42,7 @@ jobs:
VERSION='${{ github.ref_name }}'
[ "$VERSION" == main ] && { VERSION=latest; ALIAS='main master'; }
VERSION="${VERSION#develop-}"
VERSION="${VERSION#build-doc-}"
mike deploy --push --update-aliases "$VERSION" $ALIAS
mike set-default --push latest
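The version-derivation step in the workflow above relies on shell prefix stripping (`${var#pattern}`). A standalone sketch of the same logic — the branch name here is just an example value:

```shell
# Derive the mike docs version from a branch name, as in doc.yml:
# 'main' -> 'latest', otherwise strip known 'develop-'/'build-doc-' prefixes.
VERSION='build-doc-2.0.0-beta'
if [ "$VERSION" = main ]; then VERSION=latest; ALIAS='main master'; fi
VERSION="${VERSION#develop-}"    # 'develop-2.0.0'         -> '2.0.0'
VERSION="${VERSION#build-doc-}"  # 'build-doc-2.0.0-beta'  -> '2.0.0-beta'
echo "$VERSION"                  # -> 2.0.0-beta
```

Since `${VERSION#develop-}` only strips a matching prefix, each expansion is a no-op for branch names it does not match, so the two lines can be applied unconditionally.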
38 changes: 38 additions & 0 deletions doc/build/build.py
@@ -0,0 +1,38 @@
import json
import os.path
import subprocess
import sys
import threading
import time

import requests


def run_script(script_path, *args):
    # Run a helper script in a child interpreter and return its stderr
    result = subprocess.run(
        [sys.executable, script_path, *args],
        capture_output=True, text=True,
    )
    return result.stderr


if __name__ == '__main__':
    base_dir = os.path.dirname(__file__)
    build_path = os.path.join(base_dir, 'build_swagger_server.py')

    # Generate the swagger server first and wait for it to finish
    thread = threading.Thread(target=run_script, args=(build_path,))
    thread.start()
    thread.join()

    # Start the generated server in the background; daemon=True lets the
    # process exit without waiting for the server thread
    build_path = os.path.join(base_dir, 'swagger_server.py')
    port = "50000"
    server = threading.Thread(target=run_script, args=(build_path, port), daemon=True)
    server.start()

    time.sleep(3)  # give the server a moment to start listening
    data = requests.get(url=f"http://127.0.0.1:{port}/swagger.json").text
    data = json.loads(data)
    swagger_file = os.path.join(os.path.dirname(base_dir), "swagger", "swagger.json")
    os.makedirs(os.path.dirname(swagger_file), exist_ok=True)
    with open(swagger_file, "w") as fw:
        json.dump(data, fw, indent=4)
    print("build success!")
    sys.exit()
110 changes: 110 additions & 0 deletions doc/fate_flow.md
@@ -0,0 +1,110 @@
# Overall Design

## 1. Logical Architecture

- DSL-defined jobs
- Top-down vertical subtask-flow scheduling with multi-participant joint subtask coordination
- Independent, isolated worker processes for task execution
- Support for multiple types and versions of components
- Computational abstraction API
- Storage abstraction API
- Cross-party transfer abstraction API

![](./images/fate_flow_logical_arch.png)

## 2. Service Architecture

### 2.1 FATE

![](./images/fate_arch.png)

### 2.2 FATE Flow

![](./images/fate_flow_arch.png)

## 3. [Scheduling Architecture](./fate_flow_job_scheduling.md)

### 3.1 A new scheduling architecture based on shared state

- Decouples state (resources, jobs) from managers (scheduler, resource manager)
- Resource state and job state are persisted in MySQL and shared globally, providing reliable transactional operations
- Improves the availability and scalability of the management services
- Jobs can be intervened on, supporting restart, rerun, parallelism control, resource isolation, etc.

![](./images/fate_flow_scheduling_arch.png)

### 3.2 State-Driven Scheduling

- Resource coordination
- Spawn a child Executor process to run the component
- The Executor reports its state to the local server and to the scheduler
- The task states of all parties determine the federated task state
- Upstream and downstream task states determine the job state

![](./images/fate_flow_resource_process.png)
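The state-aggregation rules above can be sketched as follows. This is a hypothetical illustration only — FATE Flow's actual state machine has more states and rules; the aggregation policy here (any failure fails the whole, success requires unanimity) is a simplified assumption:

```python
# Simplified sketch of state-driven scheduling: not FATE Flow's real code.
FAILED, RUNNING, SUCCESS, WAITING = "failed", "running", "success", "waiting"

def federated_task_state(party_states):
    """Aggregate the states all parties reported for one task."""
    if any(s == FAILED for s in party_states):
        return FAILED          # any party failing fails the federated task
    if all(s == SUCCESS for s in party_states):
        return SUCCESS         # success requires every party to succeed
    if any(s == RUNNING for s in party_states):
        return RUNNING
    return WAITING

def job_state(task_states):
    """Derive the job state from its upstream/downstream task states."""
    if any(s == FAILED for s in task_states):
        return FAILED
    if all(s == SUCCESS for s in task_states):
        return SUCCESS
    return RUNNING

print(federated_task_state(["success", "running"]))  # -> running
```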

## 4. [Multiparty Resource Coordination](./fate_flow_resource_management.md)

- The total resource size of each engine is configured through the configuration file; direct system integration is planned for later
- In the total resource size, cores_per_node indicates the number of CPU cores per compute node, and nodes indicates the number of compute nodes
- The FATE Flow server reads the resource size configuration from the configuration file at startup and registers the update to the database
- Resources are requested at the job level and take effect when the Job Conf is submitted; formula: task_parallelism * task_cores
- See the dedicated section of the documentation for details
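The resource-request arithmetic described above is simple enough to sketch directly. The parameter names `task_parallelism`, `task_cores`, `cores_per_node`, and `nodes` come from the text; the numeric values below are made up for illustration:

```python
# Sketch of the job-level resource request vs. registered engine capacity.

def job_request_cores(task_parallelism: int, task_cores: int) -> int:
    # Cores a job requests at submission: task_parallelism * task_cores
    return task_parallelism * task_cores

def cluster_total_cores(cores_per_node: int, nodes: int) -> int:
    # Total engine capacity as registered from the configuration file
    return cores_per_node * nodes

request = job_request_cores(task_parallelism=4, task_cores=2)   # 8 cores
capacity = cluster_total_cores(cores_per_node=16, nodes=2)      # 32 cores
print(request <= capacity)  # -> True: the job fits within capacity
```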

## 5. [Data Flow Tracking](./fate_flow_tracking.md)

- Definition
- metric type: the type of the metric, such as auc, loss, ks, etc.
- metric namespace: custom metric namespace, e.g. train, predict
- metric name: custom metric name, e.g. auc0, hetero_lr_auc0
- metric data: metric data in key-value form
- metric meta: metric meta information in key-value form, supporting flexible plotting
- API
- log_metric_data(metric_namespace, metric_name, metrics)
- set_metric_meta(metric_namespace, metric_name, metric_meta)
- get_metric_data(metric_namespace, metric_name)
- get_metric_meta(metric_namespace, metric_name)
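The four calls above can be illustrated with a toy in-memory tracker. Only the method names come from the doc; the `Tracker` class itself is a hypothetical stand-in for intuition, not FATE Flow's actual client:

```python
# Toy in-memory stand-in for the tracking API (illustration only).
class Tracker:
    def __init__(self):
        self._data, self._meta = {}, {}

    def log_metric_data(self, metric_namespace, metric_name, metrics):
        # Append key-value metric points under (namespace, name)
        self._data.setdefault((metric_namespace, metric_name), []).extend(metrics)

    def set_metric_meta(self, metric_namespace, metric_name, metric_meta):
        # Store key-value meta info used when drawing the metric
        self._meta[(metric_namespace, metric_name)] = metric_meta

    def get_metric_data(self, metric_namespace, metric_name):
        return self._data.get((metric_namespace, metric_name), [])

    def get_metric_meta(self, metric_namespace, metric_name):
        return self._meta.get((metric_namespace, metric_name), {})

tracker = Tracker()
tracker.log_metric_data("train", "auc0", [("epoch_1", 0.82), ("epoch_2", 0.87)])
tracker.set_metric_meta("train", "auc0", {"metric_type": "auc", "curve": "line"})
print(tracker.get_metric_data("train", "auc0"))
```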

## 6. [Realtime Monitoring](./fate_flow_monitoring.md)

- Job process liveness detection
- Job timeout detection
- Resource recovery detection
- Base engine session timeout detection

![](./images/fate_flow_detector.png)

## 7. [Task Component Registry](./fate_flow_component_registry.md)

![](./images/fate_flow_component_registry.png)

## 8. [Multi-Party Federated Model Registry](./fate_flow_model_registry.md)

- Google Protocol Buffers is used as the model storage protocol, enabling cross-language sharing; each algorithm model consists of two parts: ModelParam & ModelMeta
- A Pipeline generates a series of algorithm models
- The model named Pipeline stores the Pipeline modeling DSL and the online inference DSL
- In federated learning, model consistency must be guaranteed across all participants, i.e., model binding
- model_key is the model identifier defined by the user when submitting the task
- Each federated party's model ID is composed of its identification information (role, party_id) plus model_key
- The model version of the federated parties must be unique and consistent; FATE-Flow sets it directly to the job_id

![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"}

![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"}
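The identifier composition described above can be sketched as follows. The composition (role + party_id + model_key, with model version = job_id) follows the text, but the separator `'#'` and the example values are assumptions for illustration:

```python
# Sketch of per-party model identifiers (separator '#' is an assumption).
def model_id(role: str, party_id: str, model_key: str, sep: str = "#") -> str:
    # Each party's model ID combines its own identity with the shared model_key
    return sep.join([role, party_id, model_key])

job_id = "202309111530123456789"   # made-up job_id; the model version equals it
guest_id = model_id("guest", "9999", "my_model")
host_id = model_id("host", "10000", "my_model")
# The model IDs differ per party, while model_key and version are shared
print(guest_id, host_id, job_id)
```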

## 9. [Data Access](./fate_flow_data_access.md)

- Upload:
- External storage is imported directly into FATE Storage, creating a new DTable
- When the job runs, the Reader reads directly from Storage

- Table Bind:
- Binds an external storage address to a new DTable in FATE
- When the job runs, the Reader reads data from external storage via Meta and transfers it into FATE Storage
- Connects to the big-data ecosystem: HDFS, Hive/MySQL

![](./images/fate_flow_inputoutput.png)

## 10. [Multi-Party Collaboration Authority Management](./fate_flow_authority_management.md)

![](./images/fate_flow_authorization.png)
110 changes: 110 additions & 0 deletions doc/fate_flow.zh.md
@@ -0,0 +1,110 @@
# Overall Design

## 1. Logical Architecture

- DSL-defined jobs
- Top-down vertical subtask-flow scheduling with multi-participant joint subtask coordination
- Independent, isolated worker processes for task execution
- Support for multiple types and versions of components
- Computation abstraction API
- Storage abstraction API
- Cross-party transfer abstraction API

![](./images/fate_flow_logical_arch.png)

## 2. Overall Architecture

### 2.1 FATE Overall Architecture

![](./images/fate_arch.png)

### 2.2 FATE Flow Overall Architecture

![](./images/fate_flow_arch.png)

## 3. [Scheduling Architecture](./fate_flow_job_scheduling.zh.md)

### 3.1 A new scheduling architecture based on shared state

- Decouples state (resources, jobs) from managers (scheduler, resource manager)
- Resource state and job state are persisted in MySQL and shared globally, providing reliable transactional operations
- Improves the availability and scalability of the management services
- Jobs can be intervened on, supporting restart, rerun, parallelism control, resource isolation, etc.

![](./images/fate_flow_scheduling_arch.png)

### 3.2 State-Driven Scheduling

- Resource coordination
- Spawn a child Executor process to run the component
- The Executor reports its state to its own party's server and to the scheduling party
- The task states of all parties determine the federated task state
- Upstream and downstream task states determine the job state

![](./images/fate_flow_resource_process.png)

## 4. [Multi-Party Resource Coordination](./fate_flow_resource_management.zh.md)

- The total resource size of each engine is configured through the configuration file; direct system integration is planned for later
- In the total resource size, cores_per_node indicates the number of CPU cores per compute node, and nodes indicates the number of compute nodes
- The FATE Flow server reads the resource size configuration from the configuration file at startup and registers the update to the database
- Resources are requested at the job level and take effect when the Job Conf is submitted; formula: task_parallelism * task_cores
- See the dedicated section of the documentation for details

## 5. [Data Flow Tracking](./fate_flow_tracking.zh.md)

- Definitions
- metric type: the type of the metric, such as auc, loss, ks, etc.
- metric namespace: custom metric namespace, e.g. train, predict
- metric name: custom metric name, e.g. auc0, hetero_lr_auc0
- metric data: metric data in key-value form
- metric meta: metric meta information in key-value form, supporting flexible plotting
- API
- log_metric_data(metric_namespace, metric_name, metrics)
- set_metric_meta(metric_namespace, metric_name, metric_meta)
- get_metric_data(metric_namespace, metric_name)
- get_metric_meta(metric_namespace, metric_name)

## 6. [Real-Time Job Monitoring](./fate_flow_monitoring.zh.md)

- Worker process liveness detection
- Job timeout detection
- Resource recovery detection
- Base engine session timeout detection

![](./images/fate_flow_detector.png)

## 7. [Task Component Registry](./fate_flow_component_registry.zh.md)

![](./images/fate_flow_component_registry.png)

## 8. [Multi-Party Federated Model Registry](./fate_flow_model_registry.zh.md)

- Google Protocol Buffers is used as the model storage protocol, enabling cross-language sharing; each algorithm model consists of two parts: ModelParam & ModelMeta
- A Pipeline generates a series of algorithm models
- The model named Pipeline stores the Pipeline modeling DSL and the online inference DSL
- In federated learning, model consistency must be guaranteed across all participants, i.e., model binding
- model_key is the model identifier defined by the user when submitting the task
- Each federated party's model ID is composed of its own identification information (role, party_id) plus model_key
- The model version of the federated parties must be unique and consistent; FATE-Flow sets it directly to the job_id

![](./images/fate_flow_pipelined_model.png){: style="height:400px;width:450px"}

![](./images/fate_flow_model_storage.png){: style="height:400px;width:800px"}

## 9. [Data Access](./fate_flow_data_access.zh.md)

- Upload:
- External storage is imported directly into FATE Storage, creating a new DTable
- When the job runs, the Reader reads directly from Storage

- Table Bind:
- Binds an external storage address to a new DTable in FATE
- When the job runs, the Reader reads data from external storage via Meta and transfers it into FATE Storage
- Connects to the big-data ecosystem: HDFS, Hive/MySQL

![](./images/fate_flow_inputoutput.png)

## 10. [Multi-Party Collaboration Authority Management](./fate_flow_authority_management.zh.md)

![](./images/fate_flow_authorization.png)
Binary file added doc/images/fate_arch.png
Binary file added doc/images/fate_deploy_directory.png
Binary file added doc/images/fate_flow_arch.png
Binary file added doc/images/fate_flow_authorization.png
Binary file added doc/images/fate_flow_component_dsl.png
Binary file added doc/images/fate_flow_component_registry.png
Binary file added doc/images/fate_flow_dag.png
Binary file added doc/images/fate_flow_detector.png
Binary file added doc/images/fate_flow_dsl.png
Binary file added doc/images/fate_flow_inputoutput.png
Binary file added doc/images/fate_flow_logical_arch.png
Binary file added doc/images/fate_flow_major_feature.png
Binary file added doc/images/fate_flow_model_storage.png
Binary file added doc/images/fate_flow_pipelined_model.png
Binary file added doc/images/fate_flow_resource_process.png
Binary file added doc/images/fate_flow_scheduling_arch.png
Binary file added doc/images/federated_learning_pipeline.png
4 changes: 4 additions & 0 deletions doc/index.md
@@ -0,0 +1,4 @@
---
template: overrides/home.html
title: Secure, Privacy-preserving Machine Learning Multi-Party Scheduling System
---
4 changes: 4 additions & 0 deletions doc/index.zh.md
@@ -0,0 +1,4 @@
---
template: overrides/home.zh.html
title: Secure, Privacy-preserving Machine Learning Multi-Party Scheduling System
---
79 changes: 79 additions & 0 deletions doc/mkdocs/README.md
@@ -0,0 +1,79 @@
# Build

## use docker

At repo root, execute

```sh
docker run --rm -it -p 8000:8000 -v ${PWD}:/docs sagewei0/mkdocs
```

to serve the docs at http://localhost:8000

or

```sh
docker run --rm -it -p 8000:8000 -v ${PWD}:/docs sagewei0/mkdocs build
```

to build docs to `site` folder.

## manually

[`mkdocs-material`](https://pypi.org/project/mkdocs-material/) and several plugins are needed to build these docs.

First, create a Python virtual environment

```sh
python3 -m venv "fatedocs"
source fatedocs/bin/activate
pip install -U pip
```
Then install the requirements

```sh
pip install -r doc/mkdocs/requirements.txt
```

Now, use

```sh
mkdocs serve
```

at repo root to serve docs or

use

```sh
mkdocs build
```

at repo root to build the docs into the `site` folder.


# Develop guide

We use [mkdocs-material](https://squidfunk.github.io/mkdocs-material/) to build our docs.
Several Markdown extensions are really useful for writing pretty documents, such as
[admonitions](https://squidfunk.github.io/mkdocs-material/reference/admonitions/) and
[content-tabs](https://squidfunk.github.io/mkdocs-material/reference/content-tabs/).

Several plugins are introduced to make mkdocs-material more powerful:


- [mkdocstrings](https://mkdocstrings.github.io/usage/)
automatic documentation from source code. We mostly use this to automatically generate
the `params api` for `federatedml`.

- [awesome-pages](https://github.com/lukasgeiter/mkdocs-awesome-pages-plugin)
for powerful nav rules

- [i18n](https://ultrabug.github.io/mkdocs-static-i18n/)
for multi-language support

- [mkdocs-jupyter](https://github.com/danielfrg/mkdocs-jupyter)
for jupyter format support

- [mkdocs-simple-hooks](https://github.com/aklajnert/mkdocs-simple-hooks)
for simple hook plug-ins
1 change: 1 addition & 0 deletions doc/mkdocs/assets/animations/ml.json

Large diffs are not rendered by default.
