Add and/or make content relevant to people and planet more discoverable #1717
Analyze words to tokens

First, convert the synonyms to tokens. The synonym filter runs after the stemmer in the search analyzer configured below, so the synonyms set must contain stemmed tokens. For example (change the analyzer to "spanish" if needed, and change the text to the phrase to analyze):

curl -n -H "Content-Type: application/json" -H "Accept: application/json" \
https://standard.open-contracting.org/search/_analyze \
--data '{"analyzer":"english","text":"sustainable"}'

Create a synonyms set

Run this as root on the server to create an English synonyms set:

curl -n -X PUT -H "Content-Type: application/json" -H "Accept: application/json" \
"https://standard.open-contracting.org/search/_synonyms/ocdssynonyms_en" \
--data '{"synonyms_set":[{"synonyms":"gender, women"},{"synonyms":"green, spp, sustain"},{"synonyms":"sme => small busi"}]}'

Check that the synonyms set exists:

curl -n "https://standard.open-contracting.org/search/_synonyms/ocdssynonyms_en"

Note: "sme, small busi" wasn't working, though the _validate API looked fine:
"sme => small busi" works, but the analyzer changes the query to not search for "sme" at all:
OCDS 1.1 doesn't mention "SME" and OCDS 1.2 always expands it on the same page, so this behavior is fine.

Choose an approach for configuring the search analyzer

As described at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-with-synonyms.html#synonyms-synonym-token-filters, we want a "synonym graph", not a "synonym" token filter, because we want multi-word synonyms (like "small business"). Per the note at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html, the "synonym graph" filter can be applied "as part of a search analyzer only," not during indexing.

The search analyzer is determined as described at https://www.elastic.co/guide/en/elasticsearch/guide/current/_controlling_analysis.html#_default_analyzers (NOT as described at https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html#specify-search-analyzer).

One option is to set the search query's

Instead, we'll set the search_analyzer mapping parameter for the field (ocds-index doesn't set this, only

While trying to figure this out, I set the

Configure a synonym graph search analyzer in the index settings

I checked the index settings, and that field isn't presently set.

curl -n "https://standard.open-contracting.org/search/ocdsindex_en/_settings"

I updated the index settings for the English index (based on the english language analyzer):

curl -n -X POST "https://standard.open-contracting.org/search/ocdsindex_en/_close"
curl -n -X PUT -H "Content-Type: application/json" -H "Accept: application/json" \
"https://standard.open-contracting.org/search/ocdsindex_en/_settings" \
--data '{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["example"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "english_synonyms": {
          "type": "synonym_graph",
          "synonyms_set": "ocdssynonyms_en",
          "updateable": true
        }
      },
      "analyzer": {
        "default": {
          "type": "standard"
        },
        "default_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer",
            "english_synonyms"
          ]
        }
      }
    }
  }
}'
curl -n -X POST "https://standard.open-contracting.org/search/ocdsindex_en/_open"
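The close, update, open sequence above can leave the index closed if the settings request fails partway. A minimal sketch of a wrapper that always attempts to reopen the index follows. The function name `update_index_settings` and the idea of keeping the settings JSON in a file are assumptions for illustration, not part of ocds-index or of the commands actually run here.

```shell
# Hypothetical helper (not part of ocds-index): close an index, apply analysis
# settings from a JSON file, then reopen the index even if the update failed.
update_index_settings() {
  base_url=$1       # e.g. https://standard.open-contracting.org/search
  index=$2          # e.g. ocdsindex_en
  settings_file=$3  # file containing the JSON body sent to _settings

  # --fail makes curl return a non-zero status on HTTP errors.
  curl -n -sS --fail -X POST "$base_url/$index/_close" || return 1

  status=0
  curl -n -sS --fail -X PUT \
    -H "Content-Type: application/json" -H "Accept: application/json" \
    "$base_url/$index/_settings" --data "@$settings_file" || status=$?

  # Reopen regardless of the update result, so the index is not left closed.
  curl -n -sS --fail -X POST "$base_url/$index/_open" || return 1
  return "$status"
}
```

It could be called as `update_index_settings "https://standard.open-contracting.org/search" ocdsindex_en settings_en.json`, where `settings_en.json` is an assumed file name holding the JSON body shown above.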
For Spanish, change:
Configure search_analyzer mapping parameters

curl -n -X PUT -H "Content-Type: application/json" -H "Accept: application/json" \
"https://standard.open-contracting.org/search/ocdsindex_en/_mapping" \
--data '{
  "properties": {
    "title": {"type": "text", "analyzer": "english", "search_analyzer": "default_search"},
    "text": {"type": "text", "analyzer": "english", "search_analyzer": "default_search"}
  }
}'

Test the synonyms

Check that the settings are applied:

curl -n "https://standard.open-contracting.org/search/ocdsindex_en/_settings"

Check that the mapping is updated:

curl -n "https://standard.open-contracting.org/search/ocdsindex_en/_mapping"

Check that the synonyms work when applying the filter only:
Check that the synonyms work when performing a search:
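One possible way to run these two checks is sketched below. It assumes the `default_search` analyzer and the `text` field configured above. The function names, the `BASE_URL` and `INDEX` variables, and the choice of a `match` query are illustrative assumptions, not necessarily the commands used originally. The input text is assumed to contain no double quotes or backslashes.

```shell
# Assumed values; adjust to the deployment.
BASE_URL="https://standard.open-contracting.org/search"
INDEX="ocdsindex_en"

# Run a phrase through the index's custom search analyzer (synonym filter included)
# and print the tokens that Elasticsearch returns as JSON.
analyze_with_search_analyzer() {
  curl -n -sS -H "Content-Type: application/json" -H "Accept: application/json" \
    "$BASE_URL/$INDEX/_analyze" \
    --data "{\"analyzer\":\"default_search\",\"text\":\"$1\"}"
}

# Run a match query against the "text" field, whose search_analyzer is default_search.
search_text_field() {
  curl -n -sS -H "Content-Type: application/json" -H "Accept: application/json" \
    "$BASE_URL/$INDEX/_search" \
    --data "{\"query\":{\"match\":{\"text\":\"$1\"}}}"
}
```

For example, `analyze_with_search_analyzer "sme"` shows which tokens the query is rewritten to, and `search_text_field "women"` shows whether documents that only contain a synonym are returned.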
Automation notes

Since ocds-index only creates a given index once, I think it's easier to just update the synonyms manually as written here.

Since the synonyms filter is
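For the manual updates mentioned above, the body of the synonyms set request can be assembled from a list of rules instead of being edited by hand. This is a minimal sketch: `build_synonyms_set` is a hypothetical helper, and it assumes the rules contain no double quotes or backslashes. Each rule uses either the equivalent form ("a, b") or the explicit mapping form ("a => b"), as in the synonyms set created earlier.

```shell
# Hypothetical helper: print a synonyms_set JSON body, one rule per argument.
# Assumes rules contain no double quotes or backslashes (no JSON escaping is done).
build_synonyms_set() {
  body='{"synonyms_set":['
  sep=''
  for rule in "$@"; do
    body="${body}${sep}{\"synonyms\":\"${rule}\"}"
    sep=','
  done
  printf '%s]}\n' "$body"
}

# Possible usage (same endpoint as above; sends the request, so left commented out):
# curl -n -X PUT -H "Content-Type: application/json" -H "Accept: application/json" \
#   "https://standard.open-contracting.org/search/_synonyms/ocdssynonyms_en" \
#   --data "$(build_synonyms_set "gender, women" "green, spp, sustain" "sme => small busi")"
```

Keeping the rules as separate arguments makes it easier to review a change to a single rule before sending the whole set.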