Releases: UWPRG/UW-Molecular-Data-Mining

Pipeline alpha-1.0

11 Jan 02:37
7e20250

This release contains IPython notebooks, Python scripts, and data that together form a text data pipeline for chemical information extraction and property mining. The pipeline can build a large corpus of full-text scientific publications; sort publications by relevance or category using TF-IDF vectorization and a support vector machine trained on a given training set; identify the compounds mentioned in a corpus; verify through PubChem that each identified molecule has an unambiguous structure; train Word2Vec on the text corpus; and analyze the relationships between the chemical entities found.
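
To illustrate the relevance-sorting step, here is a minimal sketch of TF-IDF vectorization followed by a linear support vector machine. It assumes scikit-learn, which the release notes do not name, and the training texts, labels, and variable names are hypothetical placeholders rather than anything shipped in this release.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training set (not data from this release):
# label 1 = relevant to chemical property mining, 0 = not relevant.
train_texts = [
    "synthesis and solubility of organic polymer compounds",
    "melting point and solubility measurements of small molecules",
    "polymer compounds characterized by molecular weight and melting point",
    "quarterly earnings report for the retail sector",
    "football season schedule and ticket prices",
    "retail sector ticket prices rise this season",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Chain the two stages: text -> TF-IDF vectors -> linear SVM.
classifier = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LinearSVC(),
)
classifier.fit(train_texts, train_labels)

# Unseen documents to sort (also hypothetical).
new_texts = [
    "solubility of polymer compounds",
    "football ticket prices this season",
]

# Hard class labels, plus signed distances to the separating hyperplane.
# A larger score means the document looks more like the relevant class,
# so the scores can be used to rank documents by relevance.
predicted = classifier.predict(new_texts)
scores = classifier.decision_function(new_texts)

ranked = sorted(zip(scores, new_texts), reverse=True)
for score, text in ranked:
    print(f"{score:+.3f}  {text}")
```

Ranking by `decision_function` rather than by the hard label is one possible way to sort by relevance; a real corpus would need a much larger labeled training set than this toy example.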