new version, with added support for token matching

kerighan · Oct 13, 2020 · e2ebcb8
1 parent 876021e

Showing 7 changed files with 225 additions and 9,881 deletions.
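This commit adds token matching (`match_word`) alongside the `ignore_case` and `ignore_accent` options documented in the README diff. The following is a minimal sketch of how the three options might combine for a single query term. It is not eldar's actual implementation. The helper names `strip_accents` and `term_matches` are hypothetical, only single-word terms are handled, and accent stripping uses the standard library's `unicodedata` in place of `unidecode` so the sketch runs without dependencies.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    # Hypothetical stand-in for unidecode: decompose accented characters
    # (NFKD) and drop the combining marks, so "mövie" becomes "movie".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(term: str, document: str,
                 ignore_case: bool = True,
                 ignore_accent: bool = True,
                 match_word: bool = True) -> bool:
    """Hypothetical helper: does one single-word query term match a document?"""
    if ignore_case:
        # Lowercase both sides so "Movie" and "movie" compare equal.
        term, document = term.lower(), document.lower()
    if ignore_accent:
        # Remove accents from both sides so "mövie" compares equal to "movie".
        term, document = strip_accents(term), strip_accents(document)
    if match_word:
        # Tokenize the document into words; the term must equal a whole token.
        tokens = re.findall(r"\w+", document)
        return term in tokens
    # Otherwise a plain substring test: "movie" is found inside "movies".
    return term in document


print(term_matches("movie", "A great Movie"))                     # True
print(term_matches("movie", "A great Movie", ignore_case=False))  # False
print(term_matches("movie", "mövie night"))                       # True
print(term_matches("movie", "new movies"))                        # False
print(term_matches("movie", "new movies", match_word=False))      # True
```

Comparing whole tokens is what lets `match_word=True` reject "movies". It also requires an extra pass to tokenize each document, which is consistent with the README's note that this option may slow queries down.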
diff --git a/README.md b/README.md
@@ -7,7 +7,6 @@ These instructions will get you a copy of the project up and running on your loc
 
 ### Prerequisites
 
-* cython
 * unidecode
 
 
@@ -16,7 +15,6 @@ These instructions will get you a copy of the project up and running on your loc
 You can install the method by typing:
 ```
 pip install unidecode -U
-pip install cython -U
 pip install eldar
 ```
 
@@ -73,6 +71,20 @@ df = df[df.content.apply(eldar)]
 print(df)
 ```
 
+### Parameters
+
+The query builder, `Query`, accepts three parameters that control how terms are matched.
+Their defaults are:
+```python
+Query(..., ignore_case=True, ignore_accent=True, match_word=True)
+```
+For example, let the query be `query = '"movie"'`:
+
+* If `ignore_case` is True, both "Movie" and "movie" are matched. If False, only "movie" is matched.
+* If `ignore_accent` is True, the document "mövie" is also matched. If False, it is not.
+* If `match_word` is True, the document is tokenized and each query term must match a whole word, so "movies" is not matched. If False, terms are matched as substrings, so both "movie" and "movies" are matched. Setting this option to True may slow down the query.
+
+
 
 ## Authors