Using a Kaggle Playground dataset to implement ML and DL techniques and draw comparisons between them.
Natural language processing has become widely popular. With the large amount of textual data available (in emails, web pages, SMS), it is increasingly important to extract valuable information from it, and an assortment of machine learning techniques has been designed for this task. Given the current advances in deep learning, we felt it would be interesting to compare traditional machine learning and deep learning techniques. We picked a Kaggle playground dataset built for text classification and implemented both types of algorithms for comparison.
In today’s world, websites have to deal with toxic and divisive content. This is especially true of major websites like Quora, which handle heavy traffic and whose purpose is to give people a platform for asking and answering questions. A key challenge is to weed out insincere questions: those founded upon false premises, or those that intend to make a statement rather than look for helpful answers.
A breakdown of the code and a walkthrough of the methodology are available in the following blog.
Code:
- quora-nlp-data-exploration: Contains the code for data exploration and some visualizations.
- quora-nlp-text-classification-v1: Contains all the steps from text cleaning to feature generation and model selection/evaluation. The details are described in the following link.
- quora-nlp-text-classification-DL: Contains a deep learning approach to the text classification problem. More details are available in the following link.
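The notebooks themselves are not reproduced here. As a rough illustration of the traditional pipeline described in the list above (text cleaning, feature generation, model evaluation), here is a minimal pure-Python sketch. It is not the authors' code. The names `clean`, `NaiveBayes` and `f1_score` are hypothetical, and the cleaning rule (lowercase, keep letters only) is an assumption. Multinomial naive Bayes over word counts stands in for whichever models the v1 notebook actually compares, and the four toy questions are invented. F1 is used instead of accuracy because the insincere class is expected to be rare, and accuracy can look high even when every insincere question is missed.

```python
import math
import re
from collections import Counter, defaultdict


def clean(text):
    """Hypothetical cleaning step: lowercase, drop non-letters, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = clean(text)
            self.word_counts[label].update(tokens)  # creates the class entry even if empty
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            for token in clean(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                freq = self.word_counts[label][token]
                score += math.log((freq + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (insincere) class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with invented questions (0 = sincere, 1 = insincere).
train_texts = [
    "How do I learn Python quickly?",
    "What is the best way to cook rice?",
    "Why do all politicians lie constantly?",
    "Why are fans of that team so stupid?",
]
train_labels = [0, 0, 1, 1]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict_one("Why are politicians so stupid?"))
print(model.predict_one("How do I cook rice?"))
```

On real data you would replace the toy lists with the competition's question text and labels and hold out a validation split before computing `f1_score`. Swapping in TF-IDF features or a linear model changes only the feature and model steps, not the overall shape of the pipeline.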
Data science discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.
About the writers:
- Ujjayant Sinha: Data science enthusiast with interest in natural language problems.
- Ankit Gadi: A knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, has helped me embark on my data science journey.