Commit

Merge pull request #14 from letuananh/coolisf-0.2.3
Coolisf 0.2.3
letuananh authored May 12, 2021
2 parents 5a1fe79 + 8761539 commit 847af8d
Showing 51 changed files with 215 additions and 175 deletions.
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -1,3 +1,5 @@
include README.md
include LICENSE
include requirements*.txt
include coolisf/dao/scripts/*.sql
include coolisf/data/*.gz
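
The two added `include` lines ship the SQL setup scripts and the gzipped data files with the source distribution. One way to see which files those patterns would pick up is to expand them with `glob` from the repository root. This is only a sketch: the helper name `files_matching_manifest` is hypothetical, and it handles plain `include` lines only, not the other MANIFEST.in commands.

```python
import glob
import os


def files_matching_manifest(manifest_text, root="."):
    """Expand plain `include <pattern>` lines relative to root (hypothetical helper)."""
    matched = []
    for line in manifest_text.splitlines():
        parts = line.split()
        # Only the simple `include` command is handled in this sketch.
        if len(parts) < 2 or parts[0] != "include":
            continue
        for pattern in parts[1:]:
            hits = glob.glob(os.path.join(root, pattern))
            matched.extend(sorted(os.path.relpath(h, root) for h in hits))
    return matched


# The MANIFEST.in content after this commit.
MANIFEST = """\
include README.md
include LICENSE
include requirements*.txt
include coolisf/dao/scripts/*.sql
include coolisf/data/*.gz
"""

if __name__ == "__main__":
    # Run from the repository root to list the files the patterns match.
    for path in files_matching_manifest(MANIFEST):
        print(path)
```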
103 changes: 53 additions & 50 deletions README.md
@@ -1,77 +1,80 @@
Integrated Semantic Framework (intsem.fx)
=========
# Integrated Semantic Framework (intsem.fx)

# Prerequisite
A Python 3 implementation of the [Integrated Semantic Framework](https://osf.io/9udjk/) that provides computational deep semantic analysis by combining structural semantics from construction grammars and lexical semantics from ontologies in a single representation.

* Python >= 3.5
* Required packages (see requirements.txt)
* English Resource Grammar (rev >= 26135)
* NLTK data
* LeLESK data (see https://github.com/letuananh/lelesk)
# Install

# Installation
`coolisf` only works on Linux distributions at the moment (built and tested on Fedora and Ubuntu Linux).

- Install `coolisf` package from [PyPI](https://pypi.org/project/coolisf/) using pip

* Download and install ACE >= 0.9.26 from: http://sweaglesw.org/linguistics/ace/
* Download ERG trunk from SVN `svn checkout http://svn.delph-in.net/erg/trunk`
* Build erg.dat `ace -g ace/config.tdl -G erg.dat`
* Download the latest release from: https://github.com/letuananh/intsem.fx/releases, unzip it to a folder and run the `isf` command
* Download NLTK data
```
import nltk
nltk.download("book")
pip install coolisf
```

- Create coolisf data folder at `/home/user/local/isf/data`
- Download ace-0.9.26 binary from https://osf.io/x52fy/ to `/home/user/bin/ace`. Make sure that you can run ace by

Tips:
`pip` is recommended for installing required packages
```
python -m pip install -r requirements.txt
```bash
[isf]$ ~/bin/ace -V
ACE version 0.9.26
compiled at 18:48:50 on Sep 14 2017
```

- Install [lelesk](https://pypi.org/project/lelesk/) and yawlib with data
- Download coolisf lexical rules database from https://osf.io/qn4wz/ and extract it to `/home/user/local/isf/data/lexrules.db`
- Download grammar files (erg.dat, jacy.dat, virgo.dat, etc.) and copy them to `/home/user/local/isf/data/grammars/`

# Using ISF
The final data folder should look something like this

```
cd ~/workspace/intsem.fx
./isf parse data/sample.txt data/sample.out
/home/user/local/isf/data
├── grammars
│   ├── erg.dat
│   └── jacy.dat
├── lexrules.db
```
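
A quick check of the data folder layout described above can be sketched in Python. The function name `missing_isf_files` is hypothetical and not part of `coolisf`; the expected file names are taken from the README text, and the list of grammars to check is an assumption you should adjust.

```python
from pathlib import Path


def missing_isf_files(data_dir, grammars=("erg.dat",)):
    """Return the expected paths that do not exist under data_dir."""
    root = Path(data_dir)
    expected = [root / "lexrules.db"]
    expected += [root / "grammars" / name for name in grammars]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Path taken from the README; replace it with your own data folder.
    for path in missing_isf_files("/home/user/local/isf/data", ("erg.dat", "jacy.dat")):
        print("missing:", path)
```

An empty result means the files named in the README are in place; it does not verify that the grammar files themselves are valid.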

# Development
# Using ISF

WARNING: These are meant for developers who want to contribute to the codebase. If all you need is to run the ISF to process your data, please see the Installation section above instead.
To parse a sentence, use coolisf `text` command

2 - Check out the code of intsem.fx to ~/workspace with:
```bash
python -m coolisf text "I drink green tea." -f dmrs

:`I drink green tea.` (len=5)
------------------------------------------------------------
dmrs {
10000 [pron<0:1> x ind=+ num=sg pers=1 pt=std];
10001 [pronoun_q<0:1> x ind=+ num=sg pers=1 pt=std];
10002 [_drink_v_1_rel<2:7> e mood=indicative perf=- prog=- sf=prop tense=pres];
10003 [udef_q<8:18> x num=sg pers=3];
10004 [_green+tea_n_1_rel<8:18> x num=sg pers=3];
0:/H -> 10002;
10001:RSTR/H -> 10000;
10002:ARG1/NEQ -> 10000;
10002:ARG2/NEQ -> 10004;
10003:RSTR/H -> 10004;
}
# 10002 -> 01170052-v[drink/lelesk]
# 10004 -> 07935152-n[green tea/lelesk]
...
```
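
The trailing `# <node id> -> <synset>[...]` lines in the sample output link DMRS nodes to sense tags. If you need those links programmatically, a regular expression over the text output is one option. This is a sketch based only on the two sample lines shown above: the function name `parse_sense_tags` is hypothetical, and reading the part after the last `/` as the tagging method is an assumption.

```python
import re

# Matches lines such as: # 10002 -> 01170052-v[drink/lelesk]
SENSE_LINE = re.compile(r"^#\s*(\d+)\s*->\s*(\d{8}-[a-z])\[(.+)/([^/\]]+)\]\s*$")


def parse_sense_tags(output_text):
    """Map DMRS node ids to (synset id, lemma, method) tuples."""
    tags = {}
    for line in output_text.splitlines():
        m = SENSE_LINE.match(line.strip())
        if m:
            node_id, synset, lemma, method = m.groups()
            tags[int(node_id)] = (synset, lemma, method)
    return tags


sample = """\
# 10002 -> 01170052-v[drink/lelesk]
# 10004 -> 07935152-n[green tea/lelesk]
"""
print(parse_sense_tags(sample))
```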
git clone --recursive https://github.com/letuananh/intsem.fx.git
```

3 - Check out ERG to your workspace folder and compile the grammar

This is complicated, read more here: http://moin.delph-in.net/TuanAnhLe/GramEng4Dummies
For batch processing, create a text file with each sentence on a separate line.
For example here is the content of the file `sample.txt`

Basically, I need the grammar file (erg.dat) to be located at ~/workspace/cldata/erg.dat

4 - Configure the application
```
cd ~/workspace/intsem.fx
./config.sh
```
To use ISF, please try
```
cd ~/workspace/intsem.fx
./isf --help
I drink green tea.
Sherlock Holmes has three guard dogs.
A soul is not a living thing.
Do you have any green tea chest?
```

Notes:
After that, run the following command and the output will be written to the file `demo_out.xml`

Use virtualenv to install required packages
```
python3 -m venv ~/isf_py3
. ~/isf_py3/bin/activate
```bash
python -m coolisf parse demo.txt -o demo_out.xml
```

Install these packages if you are using Fedora Linux:
```
sudo dnf install -y redhat-rpm-config gcc-c++
```
If you encounter any issue, please submit an issue at: https://github.com/letuananh/intsem.fx/issues
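
The batch workflow in the README (one sentence per line, then `python -m coolisf parse <input> -o <output>`) can also be driven from a script. The sketch below only wraps the command line shown above; the helper names are hypothetical, and `run_batch` assumes `coolisf`, ACE and the grammar files are already installed as described in the README.

```python
import subprocess
import sys
from pathlib import Path


def write_sentences(path, sentences):
    """Write one sentence per line, as the README describes for batch input."""
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")


def build_parse_command(input_path, output_path):
    """Build the `python -m coolisf parse` call shown in the README."""
    return [sys.executable, "-m", "coolisf", "parse", str(input_path), "-o", str(output_path)]


def run_batch(input_path, output_path):
    """Run the parse command; raises CalledProcessError if it fails."""
    subprocess.run(build_parse_command(input_path, output_path), check=True)


# Show the command without running it.
print(" ".join(build_parse_command("demo.txt", "demo_out.xml")))
```

Calling `write_sentences("demo.txt", [...])` followed by `run_batch("demo.txt", "demo_out.xml")` would reproduce the README steps.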
2 changes: 0 additions & 2 deletions config.sh
@@ -41,9 +41,7 @@ git submodule init && git submodule update
# prerequisite packages
pip install -r requirements.txt -qq

link_folder `readlink -f ./modules/lelesk/lelesk` lelesk
link_folder `readlink -f ./modules/demophin` demophin
link_folder `readlink -f ./modules/yawlib/yawlib` yawlib

link_file `readlink -f ${WORKSPACE_FOLDER}/cldata/erg.dat` data/erg.dat
link_file `readlink -f ${WORKSPACE_FOLDER}/cldata/jacy.dat` data/jacy.dat
4 changes: 4 additions & 0 deletions coolisf/__main__.py
@@ -1,2 +1,6 @@
# This code is a part of coolisf library: https://github.com/letuananh/intsem.fx
# :copyright: (c) 2014 Le Tuan Anh <[email protected]>
# :license: MIT, see LICENSE for more details.

from . import main
main.main()
6 changes: 3 additions & 3 deletions coolisf/__version__.py
@@ -9,11 +9,11 @@
__copyright__ = "Copyright (c) 2014, Le Tuan Anh <[email protected]>"
__credits__ = []
__license__ = "MIT License"
__description__ = "a Python package for providing computational deep semantic analysis by combining structural semantics from construction grammars and ontology-based lexical semantics in a single representation"
__description__ = "A Python 3 implementation of the Integrated Semantic Framework that provides computational deep semantic analysis by combining structural semantics from construction grammars and lexical semantics from ontologies in a single representation."
__url__ = "https://github.com/letuananh/intsem.fx/"
__issue__ = "https://github.com/letuananh/intsem.fx/"
__maintainer__ = "Le Tuan Anh"
__version_major__ = "0.2.3" # follow PEP-0440
__version__ = "{}beta".format(__version_major__)
__version_long__ = "{} - Beta".format(__version_major__)
__version__ = "{}b2".format(__version_major__)
__version_long__ = "{} - Beta 2".format(__version_major__)
__status__ = "4 - Beta"
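
For reference, the new format strings evaluate as shown in the sketch below. The variable names are shortened for illustration, and the regular expression is a deliberately simplified check for the `N.N.N` plus `aN`/`bN`/`rcN` pre-release form, not a full PEP 440 validator.

```python
import re

version_major = "0.2.3"
version = "{}b2".format(version_major)              # new value in this commit
version_long = "{} - Beta 2".format(version_major)

# Simplified pattern: release segment followed by a numbered pre-release tag.
SIMPLE_PRERELEASE = re.compile(r"\d+(\.\d+)*(a|b|rc)\d+")

print(version)
print(version_long)
print(bool(SIMPLE_PRERELEASE.fullmatch(version)))
```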
2 changes: 1 addition & 1 deletion coolisf/common.py
@@ -39,7 +39,7 @@
import gzip
import logging

from chirptext import FileHelper
from texttaglib.chirptext import FileHelper
from lelesk.util import ptpos_to_wn


2 changes: 1 addition & 1 deletion coolisf/config.py
@@ -42,7 +42,7 @@
import os
import logging

from chirptext import FileHelper, AppConfig
from texttaglib.chirptext import FileHelper, AppConfig
from coolisf.common import write_file
from coolisf.data import read_config_template

2 changes: 1 addition & 1 deletion coolisf/dao/cache.py
@@ -36,7 +36,7 @@
import logging
import json

from puchikarui import Schema, with_ctx
from texttaglib.puchikarui import Schema, with_ctx

from coolisf.model import Sentence

4 changes: 2 additions & 2 deletions coolisf/dao/corpus.py
@@ -35,8 +35,8 @@
import os.path
import logging

from puchikarui import Schema, with_ctx
from chirptext import texttaglib as ttl
from texttaglib.puchikarui import Schema, with_ctx
from texttaglib.chirptext import ttl

from coolisf.util import is_valid_name
from coolisf.model import Corpus, Document, Sentence, Reading
2 changes: 1 addition & 1 deletion coolisf/dao/ruledb.py
@@ -32,7 +32,7 @@
import os.path
import logging

from puchikarui import with_ctx
from texttaglib.puchikarui import with_ctx

from coolisf.dao import CorpusDAOSQLite
from coolisf.model import LexUnit, RuleInfo, PredInfo, RulePred, Reading
4 changes: 2 additions & 2 deletions coolisf/dao/textcorpus.py
@@ -31,8 +31,8 @@
import os
import json

from chirptext import FileHelper
from chirptext.chio import CSV
from texttaglib.chirptext import FileHelper
from texttaglib.chirptext.chio import CSV


# ----------------------------------------------------------------------
12 changes: 6 additions & 6 deletions coolisf/ergex.py
@@ -38,12 +38,12 @@
from collections import defaultdict
import csv

from chirptext.leutile import Counter
from chirptext.leutile import FileHelper
from chirptext.leutile import TextReport
from chirptext.leutile import FileHub
from chirptext.leutile import Timer
from chirptext.leutile import header
from texttaglib.chirptext.leutile import Counter
from texttaglib.chirptext.leutile import FileHelper
from texttaglib.chirptext.leutile import TextReport
from texttaglib.chirptext.leutile import FileHub
from texttaglib.chirptext.leutile import Timer
from texttaglib.chirptext.leutile import header

from yawlib import YLConfig
from yawlib import WordnetSQL as WNSQL
2 changes: 1 addition & 1 deletion coolisf/ghub.py
@@ -42,7 +42,7 @@
import logging
from delphin.interfaces import ace

from chirptext import FileHelper
from texttaglib.chirptext import FileHelper

from coolisf.config import read_config
from coolisf.dao.cache import AceCache, ISFCache
6 changes: 3 additions & 3 deletions coolisf/gold_extract.py
@@ -45,9 +45,9 @@
from lxml import etree


from chirptext.leutile import FileHelper
from chirptext import texttaglib as ttl
from chirptext import chio
from texttaglib.chirptext.leutile import FileHelper
from texttaglib.chirptext import texttaglib as ttl
from texttaglib.chirptext import chio
from lelesk import LeLeskWSD
from lelesk import LeskCache # WSDResources

2 changes: 1 addition & 1 deletion coolisf/lexsem.py
@@ -35,7 +35,7 @@

from delphin.mrs.components import Pred

from chirptext import texttaglib as ttl
from texttaglib.chirptext import texttaglib as ttl
from yawlib import SynsetID


8 changes: 4 additions & 4 deletions coolisf/main.py
@@ -46,10 +46,10 @@
import logging
import collections

from chirptext.cli import CLIApp, setup_logging
from chirptext import header, confirm, TextReport, FileHelper, Counter, Timer
from chirptext.leutile import is_number
from chirptext import texttaglib as ttl
from texttaglib.chirptext.cli import CLIApp, setup_logging
from texttaglib.chirptext import header, confirm, TextReport, FileHelper, Counter, Timer
from texttaglib.chirptext.leutile import is_number
from texttaglib.chirptext import texttaglib as ttl
from lelesk import LeLeskWSD
from lelesk import LeskCache # WSDResources

6 changes: 3 additions & 3 deletions coolisf/model.py
@@ -51,9 +51,9 @@
from delphin.mrs.components import Pred
from delphin.mrs.components import normalize_pred_string

from chirptext.anhxa import update_obj
from chirptext.leutile import StringTool, header
from chirptext import texttaglib as ttl
from texttaglib.chirptext.anhxa import update_obj
from texttaglib.chirptext.leutile import StringTool, header
from texttaglib.chirptext import texttaglib as ttl
from yawlib import Synset
from lelesk import LeLeskWSD
from lelesk import LeskCache # WSDResources
2 changes: 1 addition & 1 deletion coolisf/morph.py
@@ -34,7 +34,7 @@
import logging
from collections import defaultdict as dd

from chirptext import FileHelper
from texttaglib.chirptext import FileHelper

from coolisf.dao.ruledb import LexRuleDB
from coolisf.config import read_config
4 changes: 2 additions & 2 deletions coolisf/processors/jp_adv.py
@@ -33,13 +33,13 @@

import logging

from chirptext import TextReport
from texttaglib.chirptext import TextReport

from coolisf.model import Sentence
from .base import Processor
try:

    from chirptext.deko import wakati
    from texttaglib.chirptext.deko import wakati
    from coolisf.shallow import JapaneseAnalyser
    from jamdict import Jamdict
    from jamdict.tools import dump_result
2 changes: 1 addition & 1 deletion coolisf/processors/jp_basic.py
@@ -37,7 +37,7 @@
from .base import Processor

try:
    from chirptext.deko import wakati
    from texttaglib.chirptext.deko import wakati
    from coolisf.shallow import JapaneseAnalyser
except:
    logging.warning('chirptext.deko cannot be imported. JNLP mode is disabled')
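
The import changes in this commit all follow one pattern: `chirptext` and `puchikarui` are now imported from under the `texttaglib` package. Code that has to run against both layouts could use a small fallback import, which is not something this commit does. The sketch below is one way to write it; `import_first` is a hypothetical helper, demonstrated with standard-library module names so that it runs without the packages installed.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(module_names))


# Demonstrated with standard-library names so the sketch runs anywhere.
# For the packages in this commit the call would look like:
#   chirptext = import_first("texttaglib.chirptext", "chirptext")
json_mod = import_first("module_that_does_not_exist", "json")
print(json_mod.__name__)
```

Catching `ImportError` specifically, rather than using a bare `except:`, keeps unrelated errors raised during import visible.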
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.