Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

关于 keep_first_letter 和 keep_joined_full_pinyin 在 suggest completion 的一些疑惑 #272

Open
zhengpq opened this issue May 28, 2022 · 0 comments

Comments

@zhengpq
Copy link

zhengpq commented May 28, 2022

在实际使用中发现两个有趣的现象

  • 输入 first letter 的情况没有办法匹配到结果
  • 输入 full pinyin 的情况下,在输入过程中如果某个字的拼音没有输全的话也会匹配不到

具体见下方:

索引

{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_ik": {
          "tokenizer": "ik_smart",
          "filter": [
            "py"
          ]
        },
        "pinyin_keyword": {
          "tokenizer": "keyword",
          "filter": [
            "py"
          ]
        },
        "pinyin_keyword_full": {
          "tokenizer": "keyword",
          "filter": [
            "py_full"
          ]
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_original": true,
          "keep_first_letter": true,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "limit_first_letter_length": 16
        },
        "py_full": {
          "type": "pinyin",
          "keep_original": true,
          "keep_first_letter": false,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "limit_first_letter_length": 16
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "page_id": {
        "type": "integer"
      },
      "page_title": {
        "type": "text",
        "analyzer": "pinyin",
        "search_analyzer": "ik_smart",
        "fields": {
          "suggest": {
            "type": "completion",
            "analyzer": "pinyin_keyword",
            "search_analyzer": "pinyin_keyword_full"
          }
        }
      },
    }
  }
}

### 1、首字母没有办法匹配到

文章标题分析结果

POST wxad-page-new/_analyze
{
  "analyzer": "pinyin_keyword",
  "text": "公众号流量主基础介绍"
}

// 结果,部分无关的 token 删除

{
  "tokens" : [
    {
      "token" : "gzhllzjcjs",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 9
    }
  ]
}

搜索及结果

POST wxad-page-new/_search?pretty
{
  "_source": {
    "includes": [
      "page_title"
    ]
  },
  "suggest": {
    "title-suggest": {
      "prefix": "gzhl",
      "completion": {
        "field": "page_title.suggest",
        "size": 100
      }
    }
  }
}

// 结果

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title-suggest" : [
      {
        "text" : "gzhl",
        "offset" : 0,
        "length" : 4,
        "options" : [ ]
      }
    ]
  }
}

### 2、输入全部拼音的问题

文章标题分析结果

POST wxad-page-new/_analyze
{
  "analyzer": "pinyin_keyword",
  "text": "公众号流量主基础介绍"
}

// 结果,部分无关的 token 删除

{
  "tokens" : [
   {
      "token" : "gongzhonghaoliuliangzhujichujieshao",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    }
  ]
}

搜索及结果——能够匹配到的情况

POST wxad-page-new/_search?pretty
{
  "_source": {
    "includes": [
      "page_title"
    ]
  },
  "suggest": {
    "title-suggest": {
      "prefix": "gongzhong",
      "completion": {
        "field": "page_title.suggest",
        "size": 100
      }
    }
  }
}

// 结果

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title-suggest" : [
      {
        "text" : "gongzhong",
        "offset" : 0,
        "length" : 9,
        "options" : [
        {
            "text" : "公众号流量主基础介绍",
            "_index" : "wxad-page-new",
            "_type" : "_doc",
            "_id" : "259",
            "_score" : 1.0,
            "_source" : {
              "page_title" : "公众号流量主基础介绍"
            }
          }
        ]
      }
    ]
  }
}

搜索及结果——匹配不到的情况

POST wxad-page-new/_search?pretty
{
  "_source": {
    "includes": [
      "page_title"
    ]
  },
  "suggest": {
    "title-suggest": {
      "prefix": "gongzhon",
      "completion": {
        "field": "page_title.suggest",
        "size": 100
      }
    }
  }
}

// 结果

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title-suggest" : [
      {
        "text" : "gongzhon",
        "offset" : 0,
        "length" : 8,
        "options" : [ ]
      }
    ]
  }
}

差了一个 g 就匹配不到,让我跟感到很费解。

由于是新手,不知道这个是 pinyin 这个插件的特性还是 es 本身的一些特性,如果有哪位小伙伴知道这个问题的解决方案,麻烦告知一下,感谢!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant