Commit

new version, with added support for token matching
kerighan committed Oct 13, 2020
1 parent 876021e commit e2ebcb8
Showing 7 changed files with 225 additions and 9,881 deletions.
16 changes: 14 additions & 2 deletions README.md
@@ -7,7 +7,6 @@ These instructions will get you a copy of the project up and running on your loc

### Prerequisites

* cython
* unidecode


@@ -16,7 +15,6 @@ These instructions will get you a copy of the project up and running on your loc
You can install the package by typing:
```
pip install unidecode -U
pip install cython -U
pip install eldar
```

@@ -73,6 +71,20 @@ df = df[df.content.apply(eldar)]
print(df)
```

### Parameters

You can adjust three parameters in the query builder.
The defaults are:
```python
Query(..., ignore_case=True, ignore_accent=True, match_word=True)
```
Let the query be ```query = '"movie"'```:

* If `ignore_case` is True, the documents "Movie" and "movie" will both be matched. If False, only "movie" will be matched.
* If `ignore_accent` is True, the document "mövie" will also be matched.
* If `match_word` is True, the document will be tokenized and each query term will have to match a token exactly, so "movie" is matched but "movies" is not. If False, the documents "movies" and "movie" will both be matched. Setting this option to True may slow down the query.
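
The effect of the three flags can be sketched with the standard library alone. This is only an illustration of the behaviour described above, not eldar's actual implementation: `term_matches` and `strip_accents` are hypothetical helpers, and accent removal uses `unicodedata` here as a stand-in for `unidecode`.

```python
import re
import unicodedata


def strip_accents(text):
    # Decompose characters and drop combining marks ("mövie" -> "movie").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(term, document, ignore_case=True, ignore_accent=True,
                 match_word=True):
    # Hypothetical helper: checks a single query term against one document.
    if ignore_case:
        term, document = term.lower(), document.lower()
    if ignore_accent:
        term, document = strip_accents(term), strip_accents(document)
    if match_word:
        # Tokenize the document and require an exact token match.
        return term in re.findall(r"\w+", document)
    # Otherwise fall back to a plain substring search.
    return term in document


print(term_matches("movie", "Movie"))                        # True
print(term_matches("movie", "Movie", ignore_case=False))     # False
print(term_matches("movie", "mövie"))                        # True
print(term_matches("movie", "movies"))                       # False
print(term_matches("movie", "movies", match_word=False))     # True
```

The tokenizing branch does extra work per document compared with the substring branch, which is consistent with the note that `match_word=True` may slow down the query.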



## Authors

