Showing 26 changed files with 440 additions and 310 deletions.
@@ -17,7 +17,9 @@ The NLP allows users to parse dataset using the spacy model from python in which
The following python packages are required:
1. numpy
2. beautifulsoup4
-3. spacy
+3. spacy

* Tests were run using spacy version 2.2.1

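A quick way to confirm that the packages listed above are present before loading the library is sketched below. It is only an illustration and not part of this repository: the helper names `missing_packages` and `installed_version` are hypothetical, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib import metadata, util

# Import names for the packages listed above. beautifulsoup4 is imported as
# `bs4`; the other two share their install and import names.
REQUIRED_MODULES = {"numpy": "numpy", "beautifulsoup4": "bs4", "spacy": "spacy"}
TESTED_SPACY_VERSION = "2.2.1"  # version the README says the tests were run with


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items() if util.find_spec(mod) is None]


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    version = installed_version("spacy")
    if version is not None and version != TESTED_SPACY_VERSION:
        print(f"spacy {version} installed; tests were run with {TESTED_SPACY_VERSION}")
```

A version mismatch is reported rather than treated as an error, since the README states only which version the tests were run with, not that other versions fail.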
To install these packages with

@@ -27,30 +29,72 @@ pip install -r requirements.txt
```
or with conda
```bash
-conda install --file requirements.txt
+conda install -c conda-forge --file requirements.txt
```

* Download the English model using ```python -m spacy download en```

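After the download step, one way to confirm that the English model can be loaded is sketched below. The helper `load_spacy_model` is a hypothetical name, and the `en` shortcut is assumed to behave as it does in the spacy 2.x releases the tests were run with; later spacy releases may need the full model package name instead.

```python
def load_spacy_model(name="en"):
    """Return the loaded spacy model, or None if spacy or the model is missing."""
    try:
        import spacy  # imported lazily so the check also runs without spacy
    except ImportError:
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spacy raises OSError when the named model cannot be found
        return None


if __name__ == "__main__":
    nlp = load_spacy_model("en")
    print("English model available:", nlp is not None)
```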
Other languages that spacy supports can be found at https://spacy.io/usage/models#languages

To use the languages in the alpha stage of development in spacy, the following steps can be taken:

To download the Chinese model, jieba must be installed

pip
```bash
pip install jieba
```

To download the Japanese model, mecab must be installed

pip
```bash
pip install mecab-python3
```
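
Whether the extra tokenizer for one of these languages is installed can be checked from Python. The sketch below is an assumption-laden illustration: `tokenizer_available` and `TOKENIZER_MODULES` are hypothetical names, and the keys are the spacy language codes `zh` and `ja`.

```python
from importlib import util

# Tokenizer module each alpha-stage language depends on, keyed by spacy
# language code. mecab-python3 is imported as `MeCab`.
TOKENIZER_MODULES = {"zh": "jieba", "ja": "MeCab"}


def tokenizer_available(lang, modules=TOKENIZER_MODULES):
    """Return True if the language needs no extra tokenizer or it is installed."""
    module = modules.get(lang)
    return module is None or util.find_spec(module) is not None


if __name__ == "__main__":
    for lang in ("en", "zh", "ja"):
        print(lang, tokenizer_available(lang))
```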

* spacy_hunspell is not a requirement to run these scripts, but can be installed using the following methods

Linux
```bash
sudo apt-get install libhunspell-dev hunspell
pip install spacy_hunspell
```

macOS
```bash
wget https://iweb.dl.sourceforge.net/project/wordlist/speller/2019.10.06/hunspell-en_US-2019.10.06.zip;
unzip hunspell-en_US-2019.10.06; sudo mv en_US.dic en_US.aff /Library/Spelling/;
brew install hunspell;
export C_INCLUDE_PATH=/usr/local/include/hunspell;
sudo ln -sf /usr/local/lib/libhunspell-1.7.a /usr/local/lib/libhunspell.a;
sudo ln -sf /usr/local/Cellar/hunspell/1.7.0_2/lib/libhunspell-1.7.dylib /usr/local/Cellar/hunspell/1.7.0_2/lib/libhunspell.dylib;
CFLAGS=$(pkg-config --cflags hunspell) LDFLAGS=$(pkg-config --libs hunspell) pip install hunspell==0.5.0
```

At the moment spacy_hunspell does not support installation on Windows. More information can be found at https://github.com/tokestermw/spacy_hunspell

## Installation
Run tests with

```bash
q test.q
```

-Place the library file in `$QHOME` and load `nlp/init.q`
+Place the library file in `$QHOME` and load into a q instance using

```q
-q)\l nlp/init.q
-Loading utils.q
-Loading regex.q
-Loading sent.q
-Loading parser.q
-Loading time.q
-Loading date.q
-Loading email.q
-Loading cluster.q
-Loading nlp.q
+q)\l nlp/nlp.q
+q).nlp.loadfile`:init.q
+Loading init.q
+Loading code/utils.q
+Loading code/regex.q
+Loading code/sent.q
+Loading code/parser.q
+Loading code/time.q
+Loading code/date.q
+Loading code/email.q
+Loading code/cluster.q
+Loading code/nlp_code.q
q).nlp.findTimes"I went to work at 9:00am and had a coffee at 10:20"
09:00:00.000 "9:00am" 18 24
10:20:00.000 "10:20" 45 50
@@ -73,15 +117,15 @@ If you have [Docker installed](https://www.docker.com/community-edition) you can
KDB+ 3.5 2018.04.25 Copyright (C) 1993-2018 Kx Systems
l64/ 4()core 7905MB kx 0123456789ab 172.17.0.2 EXPIRE 2018.12.04 [email protected] KOD #0000000

-Loading utils.q
-Loading regex.q
-Loading sent.q
-Loading parser.q
-Loading time.q
-Loading date.q
-Loading email.q
-Loading cluster.q
-Loading nlp.q
+Loading code/utils.q
+Loading code/regex.q
+Loading code/sent.q
+Loading code/parser.q
+Loading code/time.q
+Loading code/date.q
+Loading code/email.q
+Loading code/cluster.q
+Loading code/nlp_code.q
q).nlp.findTimes"I went to work at 9:00am and had a coffee at 10:20"
09:00:00.000 "9:00am" 18 24
10:20:00.000 "10:20" 45 50
@@ -97,9 +141,7 @@ If you have [Docker installed](https://www.docker.com/community-edition) you can

Documentation is available on the [nlp](https://code.kx.com/v2/ml/nlp/) homepage.
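
For readers unfamiliar with the `.nlp.findTimes` output shown in the examples, each row holds the parsed time, the matched text, and the start and end character offsets of the match, with the end offset exclusive. The Python sketch below reproduces those columns for the same sentence. It is not how the q library is implemented: the regular expression and the name `find_times` are assumptions made purely for illustration, and hour and minute ranges are not validated.

```python
import re

# Hours and minutes with an optional am/pm suffix, e.g. "9:00am" or "10:20".
TIME_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})(?:\s?([ap]m)\b)?", re.IGNORECASE)


def find_times(text):
    """Return (hour, minute, matched text, start, end) for each time found."""
    results = []
    for match in TIME_PATTERN.finditer(text):
        hour, minute = int(match.group(1)), int(match.group(2))
        suffix = (match.group(3) or "").lower()
        if suffix == "pm" and hour < 12:
            hour += 12  # convert afternoon times to the 24-hour clock
        elif suffix == "am" and hour == 12:
            hour = 0  # 12am is midnight
        results.append((hour, minute, match.group(0), match.start(), match.end()))
    return results


for row in find_times("I went to work at 9:00am and had a coffee at 10:20"):
    print(row)
# (9, 0, '9:00am', 18, 24)
# (10, 20, '10:20', 45, 50)
```

The offsets match the q output above: `9:00am` occupies characters 18 to 23 and `10:20` occupies characters 45 to 49.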

## Status
File renamed without changes.