
NoahAmsel/GCN-4-Word-Sense-Induction


CACooN: Clustering Augmented Cooccurrence Network for word sense embeddings

Hannah Lawrence & Noah Amsel

This repository contains code for our final project in CPSC 667: Advanced Natural Language Processing.

Each script describes what it does inside its main function. Run the scripts in the following order:

python cooccurrence.py
python sage.py
python induce.py
python evaluate.py
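If you prefer a single entry point, the four scripts can be chained with a small driver. This is a minimal sketch that is not part of the repository. It assumes that each script runs without command-line arguments from the repository root and that a non-zero exit code means a step failed.

```python
import subprocess
import sys

# Pipeline order as listed above.
PIPELINE = ["cooccurrence.py", "sage.py", "induce.py", "evaluate.py"]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return the command line for each step, using the current interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script in turn, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = build_commands(scripts)
    if dry_run:
        return commands
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run on missing or partial outputs.
        subprocess.run(cmd, check=True)
    return commands
```

Call `run_pipeline()` from the repository root. Using `sys.executable` ensures every step runs under the same interpreter and environment that has the dependencies installed.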

Dependencies

You'll need nltk, networkx, and numpy; other packages may also be required.
The code also uses node2vec and GraphSAGE. Clone both repositories and place them in the same parent directory as this repository. We use the Veles implementation of node2vec because the original implementation has high memory requirements.

GraphSAGE: https://github.com/williamleif/GraphSAGE
node2vec (Veles implementation): https://github.com/vid-koci/snap/tree/veles/examples/veles
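Before running anything, a quick check can confirm the setup described above. This sketch assumes the clones keep the default directory names that `git clone` gives the two URLs above (GraphSAGE and snap). It only checks the three Python packages named in this README, so adjust both lists if your setup differs.

```python
import importlib.util
from pathlib import Path

# Packages named in this README; others may also be needed.
REQUIRED_PACKAGES = ["nltk", "networkx", "numpy"]
# Assumed default directory names from cloning the two repositories.
SIBLING_REPOS = ["GraphSAGE", "snap"]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def missing_siblings(repo_dir, siblings=SIBLING_REPOS):
    """Return sibling repositories absent from repo_dir's parent directory."""
    parent = Path(repo_dir).resolve().parent
    return [name for name in siblings if not (parent / name).is_dir()]
```

For example, `print(missing_packages(), missing_siblings("."))` from the repository root prints two empty lists when everything is in place.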

Some files are too large to upload to GitHub and must be downloaded separately. We use pretrained GloVe embeddings, which are available for download here; put them in a folder called glove inside this repository. We also use data from the SemEval-2010 task, available for download here; put it in a folder called SemEval-2010 with subfolders evaluation, test_data, and training_data.
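The data layout can be checked, and GloVe vectors loaded, with a short helper. This is a sketch under assumptions, not the loading code the scripts use. It relies on the standard GloVe text format (each line is a word followed by its space-separated vector components). The README does not say which GloVe release the scripts expect, so the file name you pass in is up to you.

```python
from pathlib import Path

SEMEVAL_SUBFOLDERS = ["evaluation", "test_data", "training_data"]


def missing_data_dirs(repo_dir):
    """List expected data folders (relative to repo_dir) that do not exist."""
    root = Path(repo_dir)
    expected = ["glove"] + [f"SemEval-2010/{sub}" for sub in SEMEVAL_SUBFOLDERS]
    return [rel for rel in expected if not (root / rel).is_dir()]


def load_glove(path, vocab=None):
    """Load a GloVe text file into a dict mapping word -> list of floats.

    If vocab is given, only words in it are kept, which saves memory
    when the embedding file is large.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            word, *values = line.rstrip().split(" ")
            if vocab is not None and word not in vocab:
                continue
            vectors[word] = [float(v) for v in values]
    return vectors
```

Passing a `vocab` restricted to the words you actually need keeps memory use small, since full GloVe files contain hundreds of thousands of words.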

The sage.py script contains two hardcoded file paths; they should be easy to spot and change if necessary.
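If you would rather not edit sage.py each time you move files, one option is to let an environment variable override a hardcoded default. The sketch below is hypothetical: the variable names and default paths are placeholders, not the actual paths in sage.py.

```python
import os


def configured_path(env_var, default):
    """Return the path from env_var if it is set and non-empty, else default."""
    value = os.environ.get(env_var, "")
    return value if value else default


# Hypothetical names and defaults; substitute the two paths found in sage.py.
GRAPHSAGE_DIR = configured_path("CACOON_GRAPHSAGE_DIR", "../GraphSAGE")
OUTPUT_DIR = configured_path("CACOON_OUTPUT_DIR", "./sage_output")
```

With this pattern, running `CACOON_OUTPUT_DIR=/tmp/out python sage.py` would change the path without touching the code, while an unset variable keeps the original behavior.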
