Attempt to clone SyntaxNet using only Python, with GPU support

xtknight/ash-parser

ash-parser

This was originally for a class project.

Implements a Chen and Manning (2014) style neural network dependency parser in Python and TensorFlow. Many elements mimic SyntaxNet.

I analyze SyntaxNet's Architecture here.

A parsing-config file must be created in the model directory before execution.

Run training_test.sh for an example of how to train a model. Evaluation during training also works, but there is no API yet for tagging new input or serving a model.

External dependencies

  • NumPy
  • TensorFlow 1.0

Similarities to SyntaxNet

  • Same embedding system (configurable per-feature group deep embedding)
  • Same optimizer (Momentum with exponential moving average)
  • Lexicon builder is identical for words, tags, and labels
  • Map files output by SyntaxNet and AshParser should be identical
  • Evaluation metric is identical (SyntaxNet's eval metric corresponds to AshParser's UAS, the Unlabeled Attachment Score)
  • Feature system is almost identical (except perhaps some very rare corner cases)
  • Due to same architecture, accuracy should be very close to Greedy SyntaxNet
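
The attachment scores mentioned in this README can be sketched as follows. This is a minimal illustration of the usual definitions of UAS and LAS, not AshParser's own evaluation code: the function name attachment_scores, the (head, label) pair layout, and the example labels are assumptions made for the example, and the real evaluator may treat some tokens (punctuation, for instance) differently.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as fractions over all tokens.

    gold and predicted are lists of (head_index, label) pairs, one per
    token, in the same token order. This layout is hypothetical.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0, 0.0

    correct_heads = 0    # head matches, label ignored (counts toward UAS)
    correct_labeled = 0  # head and label both match (counts toward LAS)
    for (gold_head, gold_label), (pred_head, pred_label) in zip(gold, predicted):
        if gold_head == pred_head:
            correct_heads += 1
            if gold_label == pred_label:
                correct_labeled += 1

    total = len(gold)
    return correct_heads / total, correct_labeled / total


# Four tokens: one wrong label (token 3) and one wrong head (token 4).
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```

A token with the right head but the wrong label still counts toward UAS, so LAS can never exceed UAS.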

Differences from SyntaxNet

  • Arc-Eager transition system also supported
  • Context file with redundant or boilerplate information is unnecessary
  • Supports GPU: training phase can complete in minutes
  • Pure Python 3 implementation; no need for Bazel
  • LAS (Labeled Attachment Score) prints out during evaluation
  • Precalculation and caching of feature bags. This makes it easier to train multiple models with the same token features but different hyperparameters
  • No support for structured (beam) parsing. An LSTM or something simpler and faster is being considered for the future. The accuracy loss from this should be roughly 1-2%.
  • Feature groups are created automatically per feature type (tag, word, and label) rather than by grouping features with semicolons in a context file
  • Only the transition parser is supported, not the POS tagger, morphological analyzer, or tokenizer
  • ngrams, punctuation_amount, morph tags, and other features are not yet implemented
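
For readers unfamiliar with the arc-eager transition system listed above, here is a minimal sketch of its four standard transitions (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE). It follows the textbook definition rather than AshParser's implementation: the function name arc_eager_parse, the action strings, and the example labels are all hypothetical, and the transitions are supplied by hand instead of being predicted by a neural network.

```python
def arc_eager_parse(n_tokens, transitions):
    """Apply arc-eager transitions and return {dependent: (head, label)}.

    Tokens are numbered 1..n_tokens; index 0 is the artificial root.
    transitions is a list of (action, label) pairs; label is ignored
    for SHIFT and REDUCE.
    """
    stack = [0]
    buffer = list(range(1, n_tokens + 1))
    arcs = {}

    for action, label in transitions:
        if action == "SHIFT":
            # Move the front of the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The front of the buffer becomes the head of the stack top,
            # which must be a headless non-root token; it is then popped.
            if not buffer or stack[-1] == 0 or stack[-1] in arcs:
                raise ValueError("LEFT-ARC precondition violated")
            dependent = stack.pop()
            arcs[dependent] = (buffer[0], label)
        elif action == "RIGHT-ARC":
            # The stack top becomes the head of the front of the buffer,
            # which is then pushed onto the stack.
            dependent = buffer.pop(0)
            arcs[dependent] = (stack[-1], label)
            stack.append(dependent)
        elif action == "REDUCE":
            # Only a token that already has a head may be popped.
            if stack[-1] not in arcs:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError("unknown action: " + action)

    return arcs


# "She eats fish": token 2 is the root, tokens 1 and 3 attach to it.
actions = [
    ("SHIFT", None),
    ("LEFT-ARC", "nsubj"),
    ("RIGHT-ARC", "root"),
    ("RIGHT-ARC", "dobj"),
]
arcs = arc_eager_parse(3, actions)
print(arcs)  # {1: (2, 'nsubj'), 2: (0, 'root'), 3: (2, 'dobj')}
```

In a greedy parser of this kind, a classifier would choose the next action from features of the current stack and buffer instead of reading it from a fixed list.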
