Showing 868 changed files with 4,446 additions and 17,165 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
name: Canary package tests | ||
|
||
on: | ||
push: | ||
branches: [ master, development ] | ||
pull_request: | ||
branches: [ master, development ] | ||
|
||
jobs: | ||
build: | ||
runs-on: ubuntu-latest | ||
strategy: | ||
matrix: | ||
python-version: [ 3.8, 3.9 ] | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Set up Python ${{ matrix.python-version }} | ||
uses: actions/setup-python@v2 | ||
with: | ||
python-version: ${{ matrix.python-version }} | ||
- name: Install dependencies and run tests | ||
run: | | ||
python -m pip install --upgrade pip | ||
pip install . | ||
cd tests | ||
python3 -m unittest * |
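In the final step of this workflow, the shell expands `*` to every entry in `tests/` before `unittest` sees the arguments, so anything in that directory is passed along as a test target. One alternative, which is not part of this commit, is to let `unittest` discover test modules by filename pattern. The sketch below shows that discovery mechanism through the standard library API; the temporary directory, `test_sample.py` and `notes.txt` are hypothetical stand-ins for the repository's `tests/` directory.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path


def run_discovered_tests(start_dir: str, pattern: str = "test*.py") -> unittest.TestResult:
    """Collect every module under start_dir whose name matches pattern, then run it."""
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    result = unittest.TestResult()
    suite.run(result)
    return result


# Hypothetical stand-in for tests/: one test module plus one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test_sample.py").write_text(textwrap.dedent("""
        import unittest

        class SampleTest(unittest.TestCase):
            def test_addition(self):
                self.assertEqual(1 + 1, 2)
    """))
    Path(tmp, "notes.txt").write_text("not a test module")

    # Only test_sample.py matches the default pattern; notes.txt is skipped.
    demo_result = run_discovered_tests(tmp)

print(f"ran {demo_result.testsRun} test(s), success={demo_result.wasSuccessful()}")
```

The command-line equivalent of this sketch would be `python -m unittest discover` run from the test directory.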
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,148 @@ | ||
# Development files | ||
*.swp | ||
.DS_Store | ||
/.idea | ||
/.vscode | ||
canary/_data/datasets/ukp/ukp_sentential_argument_mining_corpus/data/complete/** | ||
docs/Pipfile | ||
docs/Pipfile.lock | ||
|
||
## https://github.com/github/gitignore/blob/master/Python.gitignore | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
cover/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
.pybuilder/ | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
# For a library or package, you might want to ignore these files since the code is | ||
# intended to run in multiple environments; otherwise, check them in: | ||
# .python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# pytype static type analyzer | ||
.pytype/ | ||
|
||
# Cython debug symbols | ||
cython_debug/ |
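Several of the ignore rules above are shell-style globs, such as `*.py[cod]`, where the bracket expression matches exactly one of `c`, `o` or `d`. As a rough illustration only, the sketch below checks a few hypothetical filenames against four of those patterns with Python's `fnmatch` module. This is an approximation: Git applies extra rules (trailing `/` for directories, leading `/` anchoring, `**`, negation) that `fnmatch` does not model, so the sketch sticks to plain filename patterns.

```python
from fnmatch import fnmatchcase

# Plain filename globs copied from the ignore list above.
FILE_PATTERNS = ["*.swp", "*.py[cod]", "*$py.class", "*.so"]


def is_ignored(filename: str) -> bool:
    """Return True if filename matches any of the filename globs."""
    return any(fnmatchcase(filename, pattern) for pattern in FILE_PATTERNS)


# Hypothetical filenames, chosen to show which ones the globs catch.
for name in ["module.pyc", "module.pyd", "module.py", "Widget$py.class", "notes.swp"]:
    print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```

Note that `module.py` is not matched, because `[cod]` requires one more character after `.py`.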
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
include data/* | ||
include output/* | ||
include canary/etc/* | ||
recursive-include canary/_data * |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,33 +1,138 @@ | ||
## Canary | ||
# Canary | ||
|
||
**Canary should be considered pre-alpha software at present. It's an indicator of the direction we're heading in.** | ||
[![Documentation Status](https://readthedocs.org/projects/canary-am/badge/?version=latest)](https://canary-am.readthedocs.io/en/latest/?badge=latest) | ||
[![Canary package tests](https://github.com/chriswales95/Canary/actions/workflows/python-unit-tests.yml/badge.svg?branch=development)](https://github.com/chriswales95/Canary/actions/workflows/python-unit-tests.yml) | ||
|
||
Canary is a Python library for Argument Mining. Argument Mining is the automated identification and extraction of argumentative data from natural language sources such as text files. | ||
Canary is an argument mining Python library. Argument Mining is the automated identification and extraction of | ||
argumentative data from natural language. | ||
|
||
The initial aim is to provide a novel implementation of an Argument Mining tool that someone is able to run on their own data set in order to find patterns or to extract argumentative structure from their data. | ||
It should be noted that this software is currently under **active development** and is not fully functional or feature | ||
complete. | ||
|
||
### Getting Started | ||
## Installation | ||
|
||
To download and gain access to Canary run: | ||
Canary will be installable through [PyPI](https://pypi.org) in the near future. For the time being, it can be installed | ||
in the following manner: | ||
|
||
Eventually there will be a pip capable install along the following lines: | ||
```python | ||
pip install Canary-am | ||
**https:** | ||
|
||
```commandline | ||
pip install git+https://github.com/chriswales95/Canary.git@development | ||
``` | ||
|
||
but for the moment we'll build and use Canary from source until we have a sufficiently robust and feature rich release candidate. | ||
**ssh:** | ||
|
||
```commandline | ||
pip install git+ssh://[email protected]/chriswales95/Canary.git@development | ||
``` | ||
|
||
## Example Usage | ||
|
||
### Detecting an argument (true / false) | ||
|
||
```python | ||
from canary.argument_pipeline import download_model, load_model, analyse_file | ||
|
||
### Example | ||
if __name__ == "__main__": | ||
# Download pretrained models from the web (unless you fancy creating them yourself) | ||
# Training the models takes a while so I'd advise against it. | ||
download_model("all") | ||
|
||
Basic example showing the extraction of Argumentative Components from a local file: | ||
# load the detector | ||
detector = load_model("argument_detector") | ||
|
||
```Python | ||
from canary import local | ||
# outputs false | ||
print(detector.predict("cats are pretty lazy animals")) | ||
|
||
components = canary.Local(file) | ||
# outputs true | ||
print(detector.predict( | ||
"If a criminal knows that a person has a gun , they are much less likely to attempt a crime .")) | ||
``` | ||
|
||
print(components[1]) | ||
### Analysing a full document | ||
|
||
# Output | ||
['hence it is always said that competition makes the society more effective.', 'therefore without the cooperation, there would be no victory of competition.'] | ||
```python | ||
from canary.argument_pipeline import download_model, analyse_file | ||
from canary.corpora import load_corpus | ||
from pathlib import Path | ||
if __name__ == "__main__": | ||
|
||
# Download all models | ||
download_model("all") | ||
|
||
# Load version 1 of the essay corpus. | ||
essays = load_corpus("argument_annotated_essays_1", download_if_missing=True) | ||
if essays is not None: | ||
essays = [essay for essay in essays if Path(essay).suffix == ".txt"] | ||
|
||
# Analyse the first essay | ||
# essays[0] contains the absolute path to the first essay text file | ||
analysis = analyse_file(essays[0]) | ||
``` | ||
|
||
## What kind of performance is Canary achieving? | ||
|
||
Canary is currently still in development and performance is being improved as work continues. | ||
|
||
### Argument Detector | ||
|
||
precision recall f1-score support | ||
|
||
False 0.85 0.86 0.86 2756 | ||
True 0.86 0.85 0.85 2755 | ||
|
||
accuracy 0.86 5511 | ||
macro avg 0.86 0.86 0.86 5511 | ||
weighted avg 0.86 0.86 0.86 5511 | ||
|
||
|
||
|
||
|
||
|
||
### Argument Segmenter | ||
|
||
precision recall f1-score support | ||
|
||
O 0.7936 0.7259 0.7583 9362 | ||
Arg-B 0.7784 0.7765 0.7775 1235 | ||
Arg-I 0.8761 0.9126 0.8939 19248 | ||
|
||
accuracy 0.8484 29845 | ||
macro avg 0.8160 0.8050 0.8099 29845 | ||
weighted avg 0.8462 0.8484 0.8466 29845 | ||
|
||
|
||
### Argument Component Predictor | ||
|
||
precision recall f1-score support | ||
|
||
Claim 0.80 0.81 0.81 1150 | ||
MajorClaim 0.90 0.98 0.94 1150 | ||
Premise 0.90 0.82 0.86 1149 | ||
|
||
accuracy 0.87 3449 | ||
macro avg 0.87 0.87 0.87 3449 | ||
weighted avg 0.87 0.87 0.87 3449 | ||
|
||
### Link Predictor | ||
|
||
precision recall f1-score support | ||
|
||
Linked 0.83 0.88 0.85 7417 | ||
Not Linked 0.87 0.82 0.84 7311 | ||
|
||
accuracy 0.85 14728 | ||
macro avg 0.85 0.85 0.85 14728 | ||
weighted avg 0.85 0.85 0.85 14728 | ||
|
||
|
||
### Structure Predictor | ||
|
||
precision recall f1-score support | ||
|
||
attacks 0.70 0.81 0.75 1106 | ||
supports 0.76 0.64 0.69 1062 | ||
|
||
accuracy 0.72 2168 | ||
macro avg 0.73 0.72 0.72 2168 | ||
weighted avg 0.73 0.72 0.72 2168 |
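Each report in the README diff above lists precision, recall, F1 and support per class, followed by macro and weighted averages (the layout resembles a scikit-learn classification report, though the diff does not say how the figures were produced). As a sanity check on how those columns relate, the minimal sketch below recomputes the F1 column and both averages for the Structure Predictor from its rounded precision and recall values. It uses only the standard definitions: F1 is the harmonic mean of precision and recall, the macro average weights classes equally, and the weighted average weights them by support.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the Structure Predictor report.
structure_rows = {
    "attacks": (0.70, 0.81, 1106),
    "supports": (0.76, 0.64, 1062),
}

per_class_f1 = {label: f1_score(p, r) for label, (p, r, _) in structure_rows.items()}
total_support = sum(support for _, _, support in structure_rows.values())

# Macro average: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(
    per_class_f1[label] * support for label, (_, _, support) in structure_rows.items()
) / total_support

for label, score in per_class_f1.items():
    print(f"{label}: f1 = {score:.2f}")
print(f"macro avg f1 = {macro_f1:.2f}, weighted avg f1 = {weighted_f1:.2f}")
```

Rounded to two decimals, the recomputed values (0.75, 0.69, 0.72 and 0.72) agree with the reported row and average figures. Because the inputs are themselves rounded, the same check on other reports may differ in the last digit.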
This file was deleted.