Update README.md
davidcjuergens authored Jan 6, 2020
1 parent 8f4da5b commit 99ec676
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -1,9 +1,11 @@
# Running updates from UW Chemical NLP Mining team

### January 6, 2020. Two new notebooks for demonstration of text data pipeline.
1) '/ipynb/CDE-PCP-W2V.ipynb' - This notebook contains a demonstration of training a model to learn real chemical knowledge from text data. Additionally, the ability to extract structures from text and rank them based on extracted knowledge is shown.
#### 1) '/ipynb/CDE-PCP-W2V.ipynb'
This notebook contains a demonstration of training a model to learn real chemical knowledge from text data. Additionally, the ability to extract structures from text and rank them based on extracted knowledge is shown.

2) '/ipynb/tfidf_classification.ipynb' - This notebook contains a demonstration of filtering/ranking the "relevance" of full-text publications. Filtering is done via support vector machines learning from TF-IDF vectorized text data.
#### 2) '/ipynb/tfidf_classification.ipynb'
This notebook contains a demonstration of filtering/ranking the "relevance" of full-text publications. Filtering is done via support vector machines learning from TF-IDF vectorized text data.
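
As a rough illustration of the TF-IDF vectorization step mentioned above, here is a minimal standard-library sketch. It is not the notebook's code: the function name `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF formula, and the toy documents are all assumptions made for this example. The support vector machine step is not shown. In practice a library such as scikit-learn (`TfidfVectorizer` with `LinearSVC`) would likely handle both steps.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf).

    Hypothetical helper for illustration only.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Toy documents, invented for this sketch.
docs = [
    "corrosion inhibition of steel in acid",
    "flame retardant polymer synthesis",
    "polymer coating for corrosion inhibition",
]
vectors = tfidf_vectors(docs)
```

Terms that appear in fewer documents (such as "steel" here) receive a higher weight than terms shared across documents (such as "corrosion"), which is what lets a downstream classifier separate relevant from irrelevant papers.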

### September 16 Meeting (Taking stock, regroup, what's everyone up to?)
Summary: Wes T and Alex are building a word2vec classifier based on the cumulative cosine distance of a paper to keywords/phrases like 'corrosion inhibition' or 'flame retardant polymer'. Dave J has established one resource for full-text publications that can yield at least 10^4 papers on a given topic. A TF-IDF classifier is also being built. Jon will investigate methods for handling large JSON data structures so that we can start storing our data in an organized fashion. We agreed that we need the ability to create a living, breathing offline database that is easy to add to and access.
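
One possible reading of the "cumulative cosine distance" idea is sketched below, using only the standard library. This is an assumption about how such a score could be computed, not the team's implementation: the function names, the two-dimensional toy embeddings, and the choice to sum over every paper-word/keyword pair are all hypothetical. Real word2vec vectors would come from a trained model.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cumulative_distance(paper_words, keywords, embeddings):
    """Sum the cosine distance over every (paper word, keyword) pair.

    Lower totals mean the paper sits closer to the keywords.
    """
    return sum(
        cosine_distance(embeddings[word], embeddings[keyword])
        for word in paper_words
        for keyword in keywords
    )


# Toy 2-D embeddings, invented for this sketch (not trained vectors).
toy_embeddings = {
    "corrosion": [1.0, 0.0],
    "inhibition": [0.9, 0.1],
    "steel": [0.8, 0.2],
    "poetry": [0.0, 1.0],
}
keywords = ["corrosion", "inhibition"]

on_topic = cumulative_distance(["steel", "corrosion"], keywords, toy_embeddings)
off_topic = cumulative_distance(["poetry", "poetry"], keywords, toy_embeddings)
```

With these toy vectors the on-topic paper gets the smaller cumulative distance, so ranking papers by ascending score would put it first. Because the sum grows with paper length, a real pipeline would probably need to normalize by the number of words compared.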