Skip to content

向量索引配置

xuxijie edited this page Dec 9, 2022 · 3 revisions

不带类目的向量配置

{
    "attributes": [
        "id"
    ],
    "summarys": {
        "compress": true,
        "summary_fields": [
            "id"
        ]
    },
    "fields": [
        {
            "field_name": "id",
            "field_type": "UINT32"
        },
        {
            "field_name": "vector",
            "field_type": "STRING"
        },
        {
            "field_name": "DUP_id",
            "field_type": "RAW"
        },
        {
            "field_name": "DUP_vector",
            "field_type": "RAW"
        }
    ],
    "index":[
        {
            "index_fields":[
                {
                    "boost":1,
                    "field_name":"DUP_id"
                },
                {
                    "boost":1,
                    "field_name":"DUP_vector"
                }
            ],
            "index_name":"vector_index",
            "index_type":"CUSTOMIZED",
            "indexer":"aitheta_indexer",
            "parameters":{
                "use_linear_threshold":"10000",
                "build_metric_type":"l2",
                "search_metric_type":"ip",
                "use_dynamic_params":"1",
                "dimension":"128",
                "index_type":"hc"
            }
        }
    ]
}

带有类目的向量配置

引入分类的目的是为了支持按照分类进行向量检索,比如一个图片有不同的类别,如果不指定分类构建向量索引,只是对检索出来的向量进行过滤很可能会出现无结果的情况。查询时需要指定待检索的类目,如果不指定的话引擎会查询所有类目,性能会急剧下降。

{
    "attributes": [
        "id",
        "category"
    ],
    "summarys": {
        "compress": true,
        "summary_fields": [
            "id",
            "category"
        ]
    },
    "fields": [
        {
            "field_name": "id",
            "field_type": "UINT32"
        },
        {
            "field_name":"category",
            "field_type":"UINT32"
        },
        {
            "field_name": "vector",
            "field_type": "STRING"
        },
        {
            "field_name": "DUP_id",
            "field_type": "RAW"
        },
        {
            "field_name": "DUP_category",
            "field_type": "RAW"
        },
        {
            "field_name": "DUP_vector",
            "field_type": "RAW"
        }
    ],
    "index":[
        {
            "index_fields":[
                {
                    "boost":1,
                    "field_name":"DUP_id"
                },
                {
                    "boost": 1,
                    "field_name": "DUP_category"
                  
                },
                {
                    "boost":1,
                    "field_name":"DUP_vector"
                }
            ],
            "index_name":"vector_index",
            "index_type":"CUSTOMIZED",
            "indexer":"aitheta_indexer",
            "parameters":{
                "use_linear_threshold":"10000",
                "build_metric_type":"l2",
                "search_metric_type":"ip",
                "use_dynamic_params":"1",
                "dimension":"128",
                "index_type":"hc"
            }
        }
    ]
}
  • field_name:构建向量索引的字段,必须为RAW类型,最少包括2个字段,一个是主键(或者是主键的hash值),字段值必须是整数,另一个是包含向量的字段。如果需要对向量按照分类构建索引,可以新增一个分类字段,字段类型为RAW类型,字段值为整数。这些字段在索引中的顺序必须和在fields配置的顺序相同,而且必须是按照主键、类目(如果有)、向量这个顺序。
  • index_name:向量索引的名称。
  • index_type:索引类型,必须为CUSTOMIZED。
  • indexer:构建向量索引的插件,目前仅支持aitheta_indexer。
  • parameters:向量索引的构建参数和查询参数,不同索引类型参数不同,详见向量索引构建参数
Clone this wiki locally