A bi-directional LSTM for sequence tagging. This model was developed for Named Entity Recognition (NER) applied to materials science. Details can be found in the following publication: Weston et al., submitted to J. Chem. Inf. Model.: https://doi.org/10.26434/chemrxiv.8226068.v1
The materials-science-specific training data included in this repository is heavily truncated; to access the full data, contact Leigh Weston at [email protected]. To use your own data, replace the training/test sets and the embeddings file with your own files in the same format.
Load the data as follows:
from ner_tagging.model.utils import get_data, get_embedding_matrix
word_embedding_dim = 200
training, development, test, word_cache, char_cache = get_data()
embedding_matrix = get_embedding_matrix(word_cache["word_to_integer"], word_embedding_dim)
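As a quick sanity check, the embedding matrix should have one row per word index and word_embedding_dim columns. A minimal sketch, assuming get_embedding_matrix returns a NumPy array aligned with word_cache["word_to_integer"]:
# Sanity check (assumption: embedding_matrix is a NumPy array
# with one row per word index and word_embedding_dim columns)
print(embedding_matrix.shape)
assert embedding_matrix.shape[1] == word_embedding_dim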
To train the model, first extract the required data:
max_sequence_length = word_cache["max_sequence_length"]
n_words = word_cache["n_words"]
max_char_sequence_length = char_cache["max_word_length"]
n_chars = char_cache["n_chars"]
n_tags = word_cache["n_tags"]
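The fit and predict calls below expect word sequences, character sequences, and one-hot labels for each split. A minimal sketch of unpacking them, assuming each split returned by get_data is a (words, characters, labels) tuple:
# Assumption: each split unpacks into
# (word sequences, character sequences, one-hot tags)
X_train, X_train_char, y_train = training
X_dev, X_dev_char, y_dev = development
X_test, X_test_char, y_test = test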
The model must be built before fitting:
from ner_tagging.model.model import NERTagger
model = NERTagger()
model.build(embedding_matrix, max_sequence_length, n_words, max_char_sequence_length, n_chars, n_tags)
model.fit(X_train, X_train_char, y_train, num_epochs=15)
To evaluate the trained model on the development set:
from ner_tagging.model.utils import get_metrics
predicted = model.predict(X_dev, X_dev_char)
actual = y_dev.argmax(axis=-1)
print(get_metrics(actual, predicted, word_cache["integer_to_label"]))
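The held-out test set loaded earlier can be scored the same way; a minimal sketch, assuming the test split unpacks like the development split above:
# Final evaluation on the held-out test split
# (same pattern as the development-set evaluation)
predicted_test = model.predict(X_test, X_test_char)
actual_test = y_test.argmax(axis=-1)
print(get_metrics(actual_test, predicted_test, word_cache["integer_to_label"]))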