This repo supports the CoNLL format (CoNLL-U, as adopted by the Universal Dependencies project). parserChiang is implemented with MXNet Gluon.
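For reference, a CoNLL-U file stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Comment lines start with `#`, and a blank line ends each sentence. The following minimal sketch reads that layout into per-sentence token dictionaries. It is not this repo's own data loader, and `read_conllu` is a hypothetical name used only for illustration.

```python
import io
from typing import Dict, List, TextIO

# The ten tab-separated columns of a CoNLL-U token line.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream: TextIO) -> List[List[Dict[str, str]]]:
    """Group token lines into sentences; each token maps FIELDS to strings."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1"):
        # they carry no basic head/deprel for dependency parsing.
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = (
        "# text = Dogs bark.\n"
        "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
        "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
        "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
        "\n"
    )
    for tok in read_conllu(io.StringIO(sample))[0]:
        print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

HEAD is the 1-based ID of the governing token, with 0 meaning the sentence root; values stay as strings here, so convert them with `int()` before using them as indices.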
This repo contains several parser models:
- [DEPRECATED] default/: The default parser model, which uses only word features. It is the baseline for all other models.
- [DEPRECATED] pos_aid/: This parser model requires POS tags as input during inference; the CoNLL dataset provides them. In practice, you can obtain good POS tags with the Stanford NLP tools.
- [DEPRECATED] pos_joint/: This parser model also predicts POS tags.
- pos_deprel_joint/: This parser model also predicts POS tags and dependency relation labels. Computing LAS (labeled attachment score) requires the output of this model.
- [DEPRECATED] pos_aid_deprel_joint/: This parser model requires POS tags as input during inference and also predicts dependency relation labels.
Models marked [DEPRECATED] will not be updated with the latest features.
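LAS counts a token as correct only when both its predicted head and its relation label match the gold annotation, while UAS (unlabeled attachment score) checks the head alone. That is why LAS needs a model that predicts labels. The following minimal sketch computes both, assuming token-aligned `(head, label)` pairs; `attachment_scores` is a hypothetical helper, not part of this repo. It counts every token and compares labels exactly. Evaluation scripts differ on details such as punctuation handling and label subtypes.

```python
from typing import Sequence, Tuple

# Each token is a (head index, dependency relation label) pair.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs.

    UAS: fraction of tokens whose predicted head is correct.
    LAS: fraction of tokens whose predicted head AND label are both correct.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted tokens must be aligned one-to-one")
    if not gold:
        raise ValueError("no tokens to score")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g[0] == p[0] and g[1] == p[1] for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


if __name__ == "__main__":
    gold = [(2, "nsubj"), (0, "root"), (2, "punct")]
    pred = [(2, "obj"), (0, "root"), (1, "punct")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS={uas:.3f} LAS={las:.3f}")  # UAS=0.667 LAS=0.333
```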
Put the data in the data/ directory, then train the model with:
$ python3 train_pos_parser.py
If you train on a GPU and the loss suddenly becomes NaN, switch to CPU training with the following command:
$ python3 train_pos_parser.py --cpu
The maintainer is still working on this bug.
Training creates a directory named model_dumps_{Date}_{Time} to store the model dump. Test the trained model with:
$ python3 test_pos_parser.py [model_path] [model_file]
This implementation is a transition-based parser with low performance in both training speed and prediction accuracy. I created it as a toy model, simply to learn natural language processing. DO NOT USE IT IN ANY REAL-WORLD TASKS.
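This README does not say which transition system the parser uses. To illustrate what "transition-based" means, here is a minimal sketch of the arc-standard system. This is an assumption for illustration; the repo may use arc-eager or another variant. It contains a static oracle that turns a gold projective tree into SHIFT/LEFT/RIGHT transitions and a replay function that rebuilds the heads from them. Both function names are hypothetical. A trained parser typically replaces the oracle with a classifier that scores transitions from stack and buffer features.

```python
from typing import List

SHIFT, LEFT, RIGHT = "SHIFT", "LEFT", "RIGHT"


def arc_standard_oracle(heads: List[int]) -> List[str]:
    """Static arc-standard oracle for a projective dependency tree.

    heads[i] is the head of token i + 1 (tokens are 1-based; 0 is ROOT).
    Raises ValueError if the tree cannot be built (e.g. it is non-projective).
    """
    n = len(heads)
    head = [-1] + heads  # head[k] is the gold head of token k; slot 0 unused
    pending = [0] * (n + 1)  # dependents each node has not collected yet
    for h in heads:
        pending[h] += 1
    stack, buffer, transitions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and head[s1] == s0:  # LEFT-ARC: s0 -> s1, pop s1
                transitions.append(LEFT)
                del stack[-2]
                pending[s0] -= 1
                continue
            if head[s0] == s1 and pending[s0] == 0:  # RIGHT-ARC: s1 -> s0, pop s0
                transitions.append(RIGHT)
                stack.pop()
                pending[s1] -= 1
                continue
        if not buffer:
            raise ValueError("no valid transition: tree is not projective")
        transitions.append(SHIFT)  # move the next buffer token onto the stack
        stack.append(buffer.pop(0))
    return transitions


def apply_transitions(n: int, transitions: List[str]) -> List[int]:
    """Replay transitions on an n-token sentence and return the predicted heads."""
    heads = [0] * (n + 1)
    stack, buffer = [0], list(range(1, n + 1))
    for t in transitions:
        if t == SHIFT:
            stack.append(buffer.pop(0))
        elif t == LEFT:
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif t == RIGHT:
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown transition {t!r}")
    return heads[1:]


if __name__ == "__main__":
    gold = [2, 0, 2]  # "Dogs bark ." with "bark" as the root
    seq = arc_standard_oracle(gold)
    print(seq)  # ['SHIFT', 'SHIFT', 'LEFT', 'SHIFT', 'RIGHT', 'RIGHT']
    assert apply_transitions(len(gold), seq) == gold
```

In arc-standard, an n-token projective tree always yields exactly 2n transitions (n SHIFTs plus one arc per token), so parsing takes a linear number of steps.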
Have fun with it!
Copyright 2017-2019 Mengxiao Lin. Read LICENSE for more details.