[Bug]: [benchmark][benchmark] Load collection raises OOM when enabled all mmap params #38545

Closed
1 task done
wangting0128 opened this issue Dec 18, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug test/benchmark benchmark test triage/accepted Indicates an issue or PR is ready to be actively worked on.

wangting0128 commented Dec 18, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20241217-1ec85843-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-hjlmb

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-hjlmb-86-6342-etcd-0                                      1/1     Running     0               9h      10.104.20.23    4am-node22   <none>           <none>
fouramf-hjlmb-86-6342-etcd-1                                      1/1     Running     0               9h      10.104.17.237   4am-node23   <none>           <none>
fouramf-hjlmb-86-6342-etcd-2                                      1/1     Running     0               9h      10.104.21.234   4am-node24   <none>           <none>
fouramf-hjlmb-86-6342-milvus-datanode-7f845787d6-57w6q            1/1     Running     2 (9h ago)      9h      10.104.13.126   4am-node16   <none>           <none>
fouramf-hjlmb-86-6342-milvus-indexnode-dbf56c885-28f58            1/1     Running     2 (9h ago)      9h      10.104.25.81    4am-node30   <none>           <none>
fouramf-hjlmb-86-6342-milvus-indexnode-dbf56c885-m2bvz            1/1     Running     2 (9h ago)      9h      10.104.23.53    4am-node27   <none>           <none>
fouramf-hjlmb-86-6342-milvus-indexnode-dbf56c885-ptctp            1/1     Running     0               9h      10.104.6.179    4am-node13   <none>           <none>
fouramf-hjlmb-86-6342-milvus-indexnode-dbf56c885-vbmbn            1/1     Running     0               9h      10.104.9.198    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-milvus-mixcoord-6b554dc5d7-z7l22            1/1     Running     0               9h      10.104.9.200    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-milvus-proxy-cfdfcb7b5-8cn52                1/1     Running     0               9h      10.104.9.196    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-milvus-querynode-855c98bd99-glb2n           1/1     Running     2 (9h ago)      9h      10.104.16.73    4am-node21   <none>           <none>
fouramf-hjlmb-86-6342-minio-0                                     1/1     Running     0               9h      10.104.20.19    4am-node22   <none>           <none>
fouramf-hjlmb-86-6342-minio-1                                     1/1     Running     0               9h      10.104.21.226   4am-node24   <none>           <none>
fouramf-hjlmb-86-6342-minio-2                                     1/1     Running     0               9h      10.104.33.111   4am-node36   <none>           <none>
fouramf-hjlmb-86-6342-minio-3                                     1/1     Running     0               9h      10.104.17.236   4am-node23   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-bookie-0                           1/1     Running     0               9h      10.104.23.56    4am-node27   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-bookie-1                           1/1     Running     0               9h      10.104.20.20    4am-node22   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-bookie-2                           1/1     Running     0               9h      10.104.21.233   4am-node24   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-bookie-init-xqnds                  0/1     Completed   0               9h      10.104.6.178    4am-node13   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-broker-0                           1/1     Running     0               9h      10.104.9.197    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-broker-1                           1/1     Running     0               9h      10.104.14.119   4am-node18   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-proxy-0                            1/1     Running     0               9h      10.104.9.199    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-proxy-1                            1/1     Running     0               9h      10.104.33.106   4am-node36   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-pulsar-init-q5qw2                  0/1     Completed   0               9h      10.104.9.195    4am-node14   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-recovery-0                         1/1     Running     0               9h      10.104.20.13    4am-node22   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-zookeeper-0                        1/1     Running     0               9h      10.104.20.18    4am-node22   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-zookeeper-1                        1/1     Running     0               9h      10.104.33.110   4am-node36   <none>           <none>
fouramf-hjlmb-86-6342-pulsarv3-zookeeper-2                        1/1     Running     0               9h      10.104.21.228   4am-node24   <none>           <none>
[image: Screenshot 2024-12-18 11 43 41]

client log:
[image: Screenshot 2024-12-18 11 19 50]

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields: "id", "float_vector" (128 dim), "float_vector_1" (768 dim), "sparse_float_vector", "bfloat16_vector" (256 dim), "int64_1", "varchar_1"
2. build index
   - HNSW: float_vector
   - DISKANN: float_vector_1
   - SPARSE_INVERTED_INDEX: sparse_float_vector
   - IVF_SQ8: bfloat16_vector
   - INVERTED: int64_1, varchar_1
3. insert 20m rows of data
4. flush
5. rebuild index
6. load collection <- raises an OOM error
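
The steps above can be sketched with the pymilvus `MilvusClient` API. This is a minimal sketch, not the benchmark client used in the issue (which is not shown): the collection name, the endpoint URI and the `max_length` of `varchar_1` are assumptions, and steps 3 to 5 are left as a comment. The index types, metrics and parameters are the ones listed in the steps and in the client config of this issue.

```python
# Hypothetical reproduction sketch; values marked "assumed" are not from the issue.
COLLECTION = "mmap_oom_repro"  # assumed collection name

# (field, index_type, metric_type, params) as listed in the issue.
INDEX_PLAN = [
    ("float_vector", "HNSW", "L2", {"M": 8, "efConstruction": 200}),
    ("float_vector_1", "DISKANN", "IP", {}),
    ("sparse_float_vector", "SPARSE_INVERTED_INDEX", "IP", {"drop_ratio_build": 0.2}),
    ("bfloat16_vector", "IVF_SQ8", "L2", {"nlist": 2048}),
    ("int64_1", "INVERTED", None, {}),   # scalar index, no metric
    ("varchar_1", "INVERTED", None, {}),  # scalar index, no metric
]


def run_repro(uri="http://localhost:19530"):  # assumed endpoint
    # Imported lazily so INDEX_PLAN can be inspected without pymilvus installed.
    from pymilvus import DataType, MilvusClient

    client = MilvusClient(uri=uri)

    # Step 1: create the collection with the fields from the issue.
    schema = client.create_schema(auto_id=False)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="float_vector", datatype=DataType.FLOAT_VECTOR, dim=128)
    schema.add_field(field_name="float_vector_1", datatype=DataType.FLOAT_VECTOR, dim=768)
    schema.add_field(field_name="sparse_float_vector", datatype=DataType.SPARSE_FLOAT_VECTOR)
    schema.add_field(field_name="bfloat16_vector", datatype=DataType.BFLOAT16_VECTOR, dim=256)
    schema.add_field(field_name="int64_1", datatype=DataType.INT64)
    schema.add_field(field_name="varchar_1", datatype=DataType.VARCHAR, max_length=256)  # assumed length
    client.create_collection(collection_name=COLLECTION, schema=schema)

    # Step 2: build one index per field.
    index_params = client.prepare_index_params()
    for field, index_type, metric_type, params in INDEX_PLAN:
        kwargs = {"field_name": field, "index_type": index_type, "params": params}
        if metric_type is not None:
            kwargs["metric_type"] = metric_type
        index_params.add_index(**kwargs)
    client.create_index(collection_name=COLLECTION, index_params=index_params)

    # Steps 3-5 (insert 20m rows, flush, rebuild index) are omitted from this sketch.

    # Step 6: the load that failed in the issue.
    client.load_collection(collection_name=COLLECTION)
```

The mmap settings are server-side (`queryNode.mmap.*` in the server config), so nothing in this client sketch enables them.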

Milvus Log

No response

Anything else?

server config: fouramf-server-all-fields-mmap-cluster

    extraConfigFiles:
      user.yaml: |+
        queryNode:
          mmap:
            vectorField: true
            vectorIndex: true
            scalarField: true
            scalarIndex: true
    queryNode:
      resources:
        limits:
          cpu: '32'
          memory: 32Gi
        requests:
          cpu: '16'
          memory: 32Gi
      replicas: 1
      nodeSelector:
        node-role/nvme: 'true'
    indexNode:
      resources:
        limits:
          cpu: '4.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 4Gi
      replicas: 4
    dataNode:
      resources:
        limits:
          cpu: '2.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 5Gi
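
For context on why mmap matters in this setup, here is a rough back-of-the-envelope estimate of the raw dense vector data compared with the query node memory limit above. It is only a sketch: it assumes 4 bytes per float32 element and 2 bytes per bfloat16 element, counts 20 million rows, and ignores index structures, the sparse vector field, scalar fields and any runtime overhead. It does not claim to identify the cause of the OOM.

```python
ROWS = 20_000_000  # "insert 20m" in the reproduction steps

# field name -> (dimension, bytes per element)
DENSE_FIELDS = {
    "float_vector": (128, 4),     # float32
    "float_vector_1": (768, 4),   # float32
    "bfloat16_vector": (256, 2),  # bfloat16
}

GIB = 1024 ** 3
QUERYNODE_LIMIT_BYTES = 32 * GIB  # queryNode memory limit: 32Gi


def raw_bytes(dim, elem_size, rows=ROWS):
    """Raw size of one dense vector column, without any index."""
    return rows * dim * elem_size


sizes = {name: raw_bytes(dim, size) for name, (dim, size) in DENSE_FIELDS.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size / GIB:.2f} GiB")
print(f"total raw dense vectors: {total / GIB:.2f} GiB")
print(f"query node limit:        {QUERYNODE_LIMIT_BYTES / GIB:.2f} GiB")
```

Under these assumptions the raw dense vectors alone come to roughly 76 GiB, more than twice the 32Gi limit of the single query node, so the load can only fit if most of the data is served from mmap-backed files rather than resident memory.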

client config: fouramf-client-all-vector-types-dql-ddl

    dataset_params:
      metric_type: L2
      dim: 128
      scalars_index:
        int64_1:
          index_type: INVERTED
        varchar_1:
          index_type: INVERTED
      vectors_index:
        float_vector_1:
          index_type: DISKANN
          index_param: {}
          metric_type: IP
        sparse_float_vector:
          index_type: SPARSE_INVERTED_INDEX
          index_param:
            drop_ratio_build: 0.2
          metric_type: IP
        bfloat16_vector:
          index_type: IVF_SQ8
          index_param:
            nlist: 2048
          metric_type: L2
      scalars_params:
        float_vector_1:
          params:
            dim: 768
          other_params:
            dataset: laion2b_multi
            column_name: float32_vector
        sparse_float_vector:
          other_params:
            dim: 10000
            sparse_range:
            - 1
            - 20
        bfloat16_vector:
          params:
            dim: 256
      dataset_name: sift
      dataset_size: 20m
      ni_per: 10000
    collection_params:
      other_fields:
      - float_vector_1
      - sparse_float_vector
      - bfloat16_vector
      - int64_1
      - varchar_1
      shards_num: 2
    index_params:
      index_type: HNSW
      index_param:
        M: 8
        efConstruction: 200
    concurrent_params:
      concurrent_number: 20
      during_time: 24h
      interval: 20
    concurrent_tasks:
    - type: scene_hybrid_search_test
      weight: 1
      params:
        nq: 2
        top_k: 5
        reqs:
        - search_param:
            nprobe: 128
          anns_field: float_vector
          expr: bool_1 == True
          top_k: 100
        - search_param:
            nprobe: 32
          anns_field: binary_vector_scene_hybrid_search_test_1
          expr: bool_1 != True
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float16_vector_scene_hybrid_search_test_2
          expr: int64_1 >= 1500
          top_k: 5
        - search_param:
            drop_ratio_search: 0.1
          anns_field: sparse_float_vector_scene_hybrid_search_test_3
          expr: varchar_1 like "1%"
          top_k: 10
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 600
        random_data: true
        dataset: local
        dim: 128
        shards_num: 2
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
        other_fields:
        - binary_vector_scene_hybrid_search_test_1
        - float16_vector_scene_hybrid_search_test_2
        - sparse_float_vector_scene_hybrid_search_test_3
        - int64_1
        - bool_1
        - varchar_1
        replica_number: 1
        scalars_params:
          binary_vector_scene_hybrid_search_test_1:
            params:
              dim: 512
            other_params:
              dataset: binary
          float16_vector_scene_hybrid_search_test_2:
            params:
              dim: 64
        scalars_index:
          int64_1: {}
          bool_1:
            index_type: BITMAP
          varchar_1:
            index_type: INVERTED
        vectors_index:
          binary_vector_scene_hybrid_search_test_1:
            index_type: BIN_IVF_FLAT
            index_param:
              nlist: 2048
            metric_type: JACCARD
          float16_vector_scene_hybrid_search_test_2:
            index_type: DISKANN
            index_param: {}
            metric_type: IP
          sparse_float_vector_scene_hybrid_search_test_3:
            index_type: SPARSE_WAND
            index_param:
              drop_ratio_build: 0.2
            metric_type: IP
        hybrid_search_counts: 10
    - type: scene_test
      weight: 1
      params:
        dim: 128
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
    - type: scene_test_partition_hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 1
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 6000
        random_data: true
        hybrid_search_counts: 10
        data_size: 3000
        ni: 3000
    - type: search
      weight: 1
      params:
        nq: 1000
        top_k: 1
        search_param:
          nprobe: 1000
        expr: int64_1 >= 0
        timeout: 6000
        random_data: true
        partition_names:
        - _default
    - type: hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 100
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          expr: int64_1 > 100000
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          expr: id < 900000
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          expr: varchar_1 > "1"
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          WeightedRanker:
          - 0.85
          - 0.95
          - 0.51
          - 0.32
        output_fields:
        - "*"
        partition_names:
        - _default
        timeout: 6000
        random_data: true
    - type: query
      weight: 1
      params:
        expr: 'int64_1 > -1 && '
        output_fields:
        - "*"
        partition_names:
        - _default
        limit: 10
        timeout: 6000
        custom_expr: " {0} < id < {0} + 1000000"
        custom_range:
        - 0
        - 20000000
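
The `query` task above combines `expr`, `custom_expr` and `custom_range`. How the benchmark client does this is not shown in the issue; the sketch below is one plausible reading, assuming the client picks a start value inside `custom_range`, substitutes it for `{0}` in `custom_expr`, and appends the result to `expr`. The function name `build_query_expr` is hypothetical.

```python
import random

EXPR_PREFIX = "int64_1 > -1 && "            # expr
CUSTOM_EXPR = " {0} < id < {0} + 1000000"   # custom_expr
CUSTOM_RANGE = (0, 20_000_000)              # custom_range


def build_query_expr(start=None):
    """Build one filter expression (assumed behavior, not the real client)."""
    if start is None:
        # Assumption: the start value is drawn uniformly from custom_range.
        start = random.randint(*CUSTOM_RANGE)
    return EXPR_PREFIX + CUSTOM_EXPR.format(start)


print(build_query_expr(5000))
# int64_1 > -1 &&  5000 < id < 5000 + 1000000
```

Read this way, each query filters a window of about one million primary keys out of the 20 million inserted rows, with `limit: 10` capping the rows returned.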
@wangting0128 wangting0128 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. test/benchmark benchmark test labels Dec 18, 2024
@wangting0128 wangting0128 added this to the 2.5.0 milestone Dec 18, 2024
@yanliang567 yanliang567 removed their assignment Dec 18, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 18, 2024
@wangting0128

verifying

@wangting0128

verification passed

argo task: fouramf-hpxxf
image: master-20241219-3d360c06-amd64

[image: Screenshot 2024-12-20 17 59 42] [image: Screenshot 2024-12-20 18 01 22]
