A curated list of resources for learning about natural language processing, text mining, text analytics, and unstructured data.
- Other Curated Lists
- Books
- Videos
- Blogs, Articles, Papers, Case Studies
- Online Courses
- APIs and Libraries
- Products
- Online Tools
- Datasets
- awesome-nlp
- Deep Learning for NLP resources
- Speech and Natural Language Processing
- Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
- awesome-machine-learning
- Sentiment140
- Taming Text
- Natural Language Processing with Python
- Speech and Language Processing
- Foundations of Statistical Natural Language Processing
- Language Processing with Perl and Prolog: Theories, Implementation, and Application (Cognitive Technologies)
- Introduction to Information Retrieval
- Handbook of Natural Language Processing
- Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
- Fundamentals of Predictive Text Mining
- Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More
- Text Mining with R
- From Natural Language to Calendar Entries, with Clojure. March 2015. NLP, Clojure
- Ask HN: How Can I Get into NLP (Natural Language Processing)?
- Ask HN: What are the best tools for analyzing large bodies of text?
- Lekta Blog
- Quora: How do I learn Natural Language Processing?
- Quora Topic: Natural Language Processing
- The Definitive Guide to Natural Language Processing. October 2015.
- Futures of text. Feb 2015.
- R or Python on Text Mining. Aug 2015.
- Where to start in Text Mining. Aug 2012.
- Text Mining in R and Python: 8 Tips To Get Started. Oct 2016.
- An introduction to text analysis with Python, Part 1. April 2012.
- Mining Twitter Data with Python (Part 1: Collecting Data)
- A gentle introduction to historical data analysis
- Why Text Mining May Be The Next Big Thing. March 2012.
- SAS CEO offers analytics over BI, reveals use cases for text analytics. June 2011.
- Value and benefits of text mining. Sep 2015.
- Text Mining South Park. Feb 2016.
- Natural Language Processing: An Introduction
- Natural Language Processing Tutorial. June 2013.
- Natural Language Processing blog.
- An Introduction to Text Mining using Twitter Streaming API and Python
- GitHub repo with code: https://github.com/adilmoujahid/Twitter_Analytics
- How To Get Into Natural Language Processing
- Betty: a friendly English-like interface for your command line.
- https://blog.scrapinghub.com/2016/01/19/scrapy-tips-from-the-pros-part-1/
- Extract text from any document; no muss, no fuss. July 2014.
- Donald Trump vs Hillary Clinton: sentiment analysis on Twitter mentions
- Does sentiment analysis work? A tidy analysis of Yelp reviews
- CACM: Techniques and Applications for Sentiment Analysis
- Twitter mood predicts the stock market
- A nonlinear impact: evidences of causal effects of social media on market prices
- Stock Sentiment Data: Measuring the Mood of the Market
- Stock Prediction Using Twitter Sentiment Analysis. Stanford course project report.
- Forbes: How Quant Traders Use Sentiment To Get An Edge On The Market
- News Sentiment Analysis Using R to Predict Stock Market Trends. SMU lecture.
- On the Predictability of Stock Market Behavior using StockTwits Sentiment and Posting Volume
- Sentdex: Quantifying the Qualitative
- Leveraging international market sentiment for trading strategies
- From tweets to polls: Linking text sentiment to public opinion time series
- Lexicon-Based Methods for Sentiment Analysis
- On the negativity of negation
- Blog Post: That Sentimental Feeling
- Trump2Cash: A stock trading bot powered by Trump tweets
- Unsupervised Sentiment Neuron. April 2017.
- Current State of Text Sentiment Analysis from Opinion to Emotion Mining. Feb 2017.
- Document Clustering. MSc Thesis.
- Blog Post: Found in translation: More accurate, fluent sentences in Google Translate. Nov 2016.
- NYTimes: The Great A.I. Awakening. Dec 2016.
- Coursera: Introduction to Natural Language Processing
- Stanford course on NLP: Dan Jurafsky and Chris Manning
- Stanford Deep Learning NLP Course
- Coursera: Natural Language Processing
- Stanford CS 224N / Ling 284
- CMU Language and Statistics II: (More) Empirical Methods in Natural Language Processing
- UT CS 388: Natural Language Processing
- Coursera: Applied Text Mining in Python
- Big Data University: Text Analytics – Getting Results with SystemT
- Big Data University: Advanced Text Analytics – Getting Results with SystemT
- Big Data University: Text mining in action: Analyzing Twitter data for Democratic General Elections (BETA Version)
- Columbia: COMS W4705: Natural Language Processing
- Columbia: COMS E6998: Machine Learning for Natural Language Processing (Spring 2012)
- Machine Translation: Spring 2016
- tm: Text Mining.
- lsa: Latent Semantic Analysis.
- lda: Collapsed Gibbs Sampling Methods for Topic Models.
- textir: Inverse Regression for Text Analysis.
- corpora: Statistics and data sets for corpus frequency data.
- tau: Text Analysis Utilities.
- tidytext: Text mining using dplyr, ggplot2, and other tidy tools
- Sentiment140: R package for sentiment text analysis
- sentimentr: Lexicon-based sentiment analysis.
- cleanNLP: ML-based sentiment analysis.
- RSentiment: Lexicon-based sentiment analysis. Contains support for negation detection and sarcasm.
Python modules
- NLTK: Natural Language Toolkit.
- spaCy: Industrial-Strength Natural Language Processing in Python.
- textblob: Simplified Text processing.
- Natural Language Basics with TextBlob
- Gensim: Topic Modeling for humans.
- textmining: Python Text Mining utilities.
Apache Tika: a content analysis toolkit.
Stanford CoreNLP: a suite of core NLP tools
- Also check out http://corenlp.run for a hosted version of the CoreNLP server.
MALLET: MAchine Learning for LanguagE Toolkit
- GitHub: https://github.com/mimno/Mallet
Streamcrab: Real-time Twitter sentiment analyzer engine. http://www.streamcrab.com
TextRazor API: Extract Meaning from your Text.
- SAS Text Miner (Part of SAS Enterprise Miner)
- SAS Sentiment Analysis
- RapidMiner
- GATE
- IBM Watson
- Crimson Hexagon
- Stocktwits: Tap into the Pulse of Markets
- Meltwater
- CrowdFlower: AI for your business.
- Lexalytics Semantria. API and Excel plugin.
- Rosette Text Analytics: AI for Human Language
- Google's Natural Language API: Derive insights from unstructured text using Google machine learning
- Alchemy API
- MonkeyLearn
- Apache PDFBox
- Tabula: A tool for liberating data tables locked inside PDF files.
- PDFLayoutTextStripper: Converts a pdf file into a text file while keeping the layout of the original pdf.
- pdftabextract: A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
- SO: How to extract text from a PDF?
- Tools for Extracting Data and Text from PDFs - A Review