feat: implement apiToken failover mechanism #1256

Merged
merged 36 commits into from
Nov 16, 2024

cr7258
Collaborator

@cr7258 cr7258 commented Aug 27, 2024

Ⅰ. Describe what this PR did

Configuration example:

provider:
  type: qwen
  apiTokens:
    - "api-token-1"
    - "api-token-2"
    - "api-token-3"
  modelMapping:
    'gpt-3': "qwen-turbo"
    'gpt-4-turbo': "qwen-max"
    '*': "qwen-turbo"
  failover:
    enabled: true
    failureThreshold: 3
    successThreshold: 1
    healthCheckInterval: 5000
    healthCheckTimeout: 5000
    healthCheckModel: gpt-3

For now, an apiToken is considered available only if the HTTP response status code is 200. More complex criteria should not be needed for the time being.
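The threshold behaviour described above can be sketched in Go roughly as follows. This is a minimal illustration, not the plugin's actual code: the `tracker`, `tokenHealth`, and `failoverConfig` names are hypothetical, and it assumes the thresholds count consecutive results, which the description does not state explicitly.

```go
package main

// failoverConfig mirrors the failureThreshold and successThreshold settings.
// The struct itself is hypothetical and only serves this sketch.
type failoverConfig struct {
	FailureThreshold int
	SuccessThreshold int
}

// tokenHealth tracks consecutive failures and successes for one apiToken.
type tokenHealth struct {
	failures  int
	successes int
	available bool
}

// tracker holds the health state of every configured apiToken.
type tracker struct {
	cfg    failoverConfig
	tokens map[string]*tokenHealth
}

func newTracker(cfg failoverConfig, tokens []string) *tracker {
	t := &tracker{cfg: cfg, tokens: make(map[string]*tokenHealth)}
	for _, tok := range tokens {
		t.tokens[tok] = &tokenHealth{available: true}
	}
	return t
}

// observe records one response status for a token.
// Only status 200 counts as a success, as in the PR description.
func (t *tracker) observe(token string, status int) {
	h, ok := t.tokens[token]
	if !ok {
		return
	}
	if status == 200 {
		h.failures = 0
		if !h.available {
			h.successes++
			if h.successes >= t.cfg.SuccessThreshold {
				h.available = true
				h.successes = 0
			}
		}
		return
	}
	h.successes = 0
	if h.available {
		h.failures++
		if h.failures >= t.cfg.FailureThreshold {
			h.available = false
			h.failures = 0
		}
	}
}

// isAvailable reports whether the token may be used for requests.
func (t *tracker) isAvailable(token string) bool {
	h, ok := t.tokens[token]
	return ok && h.available
}
```

With `failureThreshold: 3` and `successThreshold: 1`, a token is taken out of rotation after three consecutive non-200 responses and restored after a single 200 response to a health check.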

Ⅱ. Does this pull request fix one issue?

fixes #1227

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Question

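Two questions remain:

    1. Because Envoy starts multiple Wasm VMs, failover and health checks currently run separately in each Wasm VM. In other words, VM1 may already have removed an apiToken while VM2 keeps sending requests with it. Should the state be synchronized across the Wasm VMs through proxywasm.SetSharedData? Synchronizing raises another problem: when an apiToken becomes unavailable, multiple Wasm VMs would send health check requests at the same time.
    2. To health-check an apiToken, I need to send requests to the service and port that Envoy listens on locally. My current approach is to manually create a cluster that points to Envoy's local listen address and port. This does not seem very flexible, and it requires the user to configure an extra cluster. Is there a better way?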
healthCheckClient = wrapper.NewClusterClient(wrapper.StaticIpCluster{
	ServiceName: "local_cluster",
	Port:        10000,
})

    - name: outbound|10000||local_cluster.static
      connect_timeout: 0.25s
      type: STATIC
      load_assignment:
        cluster_name: outbound|10000||local_cluster.static
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 10000

@codecov-commenter

codecov-commenter commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.52%. Comparing base (ef31e09) to head (f0f24cc).
Report is 201 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1256      +/-   ##
==========================================
+ Coverage   35.91%   43.52%   +7.61%     
==========================================
  Files          69       76       +7     
  Lines       11576    12320     +744     
==========================================
+ Hits         4157     5362    +1205     
+ Misses       7104     6622     -482     
- Partials      315      336      +21     

see 69 files with indirect coverage changes

@johnlanni
Collaborator

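@cr7258 You can synchronize the state with SetSharedData. Be sure to use the CAS mechanism to avoid conflicts. You can also build leader election on top of SetSharedData so that a single worker performs the health checks and recovery. Note, however, that data in SharedData is VM-level and is not cleared even when the plugin configuration is updated.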
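A CAS retry loop of the kind suggested here might look roughly like the sketch below. To stay self-contained it uses an in-memory stand-in (`sharedStore`, hypothetical) instead of the real host calls; it only imitates the general shape of reading a value together with a CAS number and writing it back with that number. The comma-separated token encoding and the `removeToken` helper are also assumptions made for illustration.

```go
package main

import (
	"errors"
	"strings"
	"sync"
)

var errCasMismatch = errors.New("cas mismatch")

// sharedStore is an in-memory stand-in for VM-shared data.
// Every successful write bumps the per-key CAS counter.
type sharedStore struct {
	mu   sync.Mutex
	data map[string][]byte
	cas  map[string]uint32
}

func newSharedStore() *sharedStore {
	return &sharedStore{data: map[string][]byte{}, cas: map[string]uint32{}}
}

// get returns the value and the CAS number observed at read time.
func (s *sharedStore) get(key string) ([]byte, uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key], s.cas[key]
}

// set writes only if the caller's CAS number is still current.
func (s *sharedStore) set(key string, value []byte, cas uint32) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cas[key] != cas {
		return errCasMismatch
	}
	s.data[key] = value
	s.cas[key] = cas + 1
	return nil
}

// removeToken drops one token from a comma-separated list stored under key.
// On a CAS conflict it re-reads the latest value and tries again.
func removeToken(s *sharedStore, key, token string, maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		raw, cas := s.get(key)
		kept := []string{}
		if len(raw) > 0 {
			for _, t := range strings.Split(string(raw), ",") {
				if t != token {
					kept = append(kept, t)
				}
			}
		}
		err := s.set(key, []byte(strings.Join(kept, ",")), cas)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errCasMismatch) {
			return err
		}
	}
	return errCasMismatch
}
```

The point of the loop is that a VM never overwrites a list it has not seen: if another VM changed the list in between, the write fails and the removal is recomputed from the fresh value.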

@cr7258
Collaborator Author

cr7258 commented Aug 31, 2024

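@johnlanni I have updated the code to use SetSharedData to synchronize apiToken state across the VMs, and I also use SetSharedData for leader election.

Note, however, that data in SharedData is VM-level and is not cleared even when the plugin configuration is updated.

What do I need to do to handle the point raised here?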
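Leader election over a CAS-protected key could be sketched as a simple lease, as below. This is an illustrative assumption, not the code merged in this PR: `casCell` is a hypothetical in-memory stand-in for a single shared-data key, and the `owner|expiry` encoding, the VM IDs, and the millisecond timestamps are invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// casCell is an in-memory stand-in for one shared-data key with a CAS counter.
type casCell struct {
	mu    sync.Mutex
	value string
	cas   uint32
}

// get returns the current value and its CAS number.
func (c *casCell) get() (string, uint32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value, c.cas
}

// set succeeds only if cas matches the current CAS number.
func (c *casCell) set(value string, cas uint32) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cas != cas {
		return false
	}
	c.value = value
	c.cas++
	return true
}

// tryAcquireLease tries to make vmID the leader until now+ttl (milliseconds).
// It returns true if vmID holds the lease after the call.
func tryAcquireLease(c *casCell, vmID string, now, ttl int64) bool {
	raw, cas := c.get()
	if raw != "" {
		parts := strings.SplitN(raw, "|", 2)
		if len(parts) == 2 {
			expiry, err := strconv.ParseInt(parts[1], 10, 64)
			// A lease that is still live and held by another VM blocks acquisition.
			if err == nil && now < expiry && parts[0] != vmID {
				return false
			}
		}
	}
	// The CAS write guarantees that only one contender wins a free or expired lease.
	return c.set(fmt.Sprintf("%s|%d", vmID, now+ttl), cas)
}
```

Only the VM that holds the lease would run health checks; the others skip them until the lease expires, which avoids several VMs probing the same apiToken at once.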

@johnlanni
Collaborator

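@johnlanni I have updated the code to use SetSharedData to synchronize apiToken state across the VMs, and I also use SetSharedData for leader election.

Note, however, that data in SharedData is VM-level and is not cleared even when the plugin configuration is updated.

What do I need to do to handle the point raised here?

There are no major problems. I noted some mechanism-related details above; please adjust those.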

@CH3CHO
Collaborator

CH3CHO commented Sep 4, 2024

README.md should be updated as well.

@cr7258
Collaborator Author

cr7258 commented Nov 3, 2024

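@CH3CHO I wrapped the call logic in the handleRequestHeaders and handleRequestBody functions; each provider only needs to call them from OnRequestHeaders and OnRequestBody respectively. I did not move them into the main function because the logic before and after header or body processing may differ between providers. Examples: qwen, claude.

handleRequestBody also takes care of the common behaviour of fetching context from a file, so each provider no longer has to repeat the m.contextCache.GetContent(func(content string, err error) code. insertContext lets a provider implement its own insertHttpContextMessage method (qwen and claude, for example, insert the system message differently); if the method is not implemented, the default defaultInsertHttpContextMessage method is used.

TransformRequestHeaders and TransformRequestBody are now optional. If TransformRequestHeaders is not implemented, the headers are left unchanged; if TransformRequestBody is not implemented, only defaultTransformRequestBody is called to perform the model mapping.

The changes above were tested with the following configuration: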
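The optional-implementation pattern described above is commonly expressed in Go with a type assertion against a small interface. The sketch below is an assumption about the general shape only: the signatures are simplified and hypothetical (the real handlers take more parameters), `plainProvider` and `customProvider` are made-up providers, and the mapping logic assumes an exact match followed by the `*` fallback.

```go
package main

import "encoding/json"

// provider is the minimal interface every provider implements in this sketch.
type provider interface {
	GetProviderType() string
}

// requestBodyTransformer is optional; providers implement it only when needed.
type requestBodyTransformer interface {
	TransformRequestBody(body []byte) ([]byte, error)
}

// mapModel resolves a model name: exact match first, then the "*" fallback.
func mapModel(model string, mapping map[string]string) string {
	if target, ok := mapping[model]; ok {
		return target
	}
	if target, ok := mapping["*"]; ok {
		return target
	}
	return model
}

// defaultTransformRequestBody only rewrites the "model" field of the JSON body.
func defaultTransformRequestBody(body []byte, mapping map[string]string) ([]byte, error) {
	var req map[string]interface{}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if model, ok := req["model"].(string); ok {
		req["model"] = mapModel(model, mapping)
	}
	return json.Marshal(req)
}

// handleRequestBody uses the provider's own transformer when it exists
// and falls back to the default model mapping otherwise.
func handleRequestBody(p provider, body []byte, mapping map[string]string) ([]byte, error) {
	if t, ok := p.(requestBodyTransformer); ok {
		return t.TransformRequestBody(body)
	}
	return defaultTransformRequestBody(body, mapping)
}

// plainProvider relies entirely on the default behaviour.
type plainProvider struct{}

func (plainProvider) GetProviderType() string { return "plain" }

// customProvider supplies its own body transformation.
type customProvider struct{}

func (customProvider) GetProviderType() string { return "custom" }

func (customProvider) TransformRequestBody(body []byte) ([]byte, error) {
	return []byte(`{"input":"custom"}`), nil
}
```

Keeping the optional methods in separate interfaces means a new provider compiles with only the required methods and opts into extra behaviour by adding a method, without changes to the shared handler.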

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy-groq
  namespace: higress-system
spec:
  matchRules:
  - config:
      provider:
        type: groq
        apiTokens: 
          - "<groq-token>"
          - "sk-bad-groq"
        modelMapping:
          "*": llama3-8b-8192
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 3
          successThreshold: 5
          healthCheckModel: gpt-3
    service:
    - groq.dns
  - config:
      provider:
        type: claude
        apiTokens: 
          - "<claude-token>"
          - "sk-bad-claude"
        modelMapping:
          gpt-3: claude-3-opus-20240229
          "*": claude-3-sonnet-20240229
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 2
          successThreshold: 9
          healthCheckModel: gpt-3
    service:
    - claude.dns
  - config:
      provider:
        type: qwen
        apiTokens: 
          - "<qwen-token>"
          - "sk-bad-qwen"
        modelMapping:
          gpt-3: qwen-turbo
          "*": qwen-turbo
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 4
          successThreshold: 7
          healthCheckModel: gpt-3
    service:
    - qwen.dns
  url: oci://cr7258/ai-proxy:failover-v86
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
     higress.io/destination: |
      30% claude.dns
      30% groq.dns
      40% qwen.dns
  labels:
    higress.io/resource-definer: higress
  name: test-ai
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: test-ai.com
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: api.groq.com
    name: groq
    port: 443
    type: dns
    protocol: https
    sni: api.groq.com
  - domain: api.anthropic.com
    name: claude
    port: 443
    type: dns
    protocol: https
    sni: api.anthropic.com
  - domain: dashscope.aliyuncs.com
    name: qwen
    port: 443
    type: dns
    protocol: https
    sni: dashscope.aliyuncs.com
  - domain: raw.githubusercontent.com
    name: github
    port: 443
    type: dns
    protocol: https
    sni: raw.githubusercontent.com

So far only the qwen, groq, and claude providers have been adapted. If there are no other issues, I will update the remaining providers in the same way.

@cr7258 cr7258 requested a review from CH3CHO November 4, 2024 07:45
@cr7258
Collaborator Author

cr7258 commented Nov 14, 2024

@johnlanni @CH3CHO All providers have now been adapted. There are also two new changes:

  • Added the TransformRequestBodyHeadersHandler interface: a provider that also modifies headers in OnRequestBody can optionally implement it.
  • Added a GetApiName method to the Provider interface. I have implemented it for all providers; it is used to determine the apiName when protocol: original is set.
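As a rough illustration of the second item, a GetApiName implementation might derive the API name from the request path. Everything in this sketch is assumed for the example: the `ApiName` type, its constants, the `exampleProvider` type, the method signature, and the OpenAI-style path suffixes are hypothetical and may differ from the merged code.

```go
package main

import "strings"

// ApiName identifies which logical API a request targets (hypothetical type).
type ApiName string

const (
	ApiNameChatCompletion ApiName = "chatCompletion"
	ApiNameEmbeddings     ApiName = "embeddings"
)

// exampleProvider is a made-up provider used only for this sketch.
type exampleProvider struct{}

// GetApiName maps a request path to an ApiName.
// It returns an empty ApiName when the path is not recognized.
func (exampleProvider) GetApiName(path string) ApiName {
	// Ignore any query string before matching.
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	switch {
	case strings.HasSuffix(path, "/chat/completions"):
		return ApiNameChatCompletion
	case strings.HasSuffix(path, "/embeddings"):
		return ApiNameEmbeddings
	}
	return ""
}
```

With the original protocol the request path is the provider's native one, so each provider would need its own path matching; the suffixes above merely stand in for whatever paths a real provider exposes.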

Collaborator

@CH3CHO CH3CHO left a comment

LGTM

Collaborator

@johnlanni johnlanni left a comment

Awesome

@johnlanni johnlanni merged commit d24123a into alibaba:main Nov 16, 2024
13 checks passed
Development

Successfully merging this pull request may close these issues.

AI apitoken failover mechanism design