Using a Kaggle Playground dataset to implement machine learning (ML) and deep learning (DL) techniques and compare the two.

ujjayants/NLP_Text_Classification

NLP Text Classification

Purpose

Natural language processing (NLP) is widely used: with so much text available in emails, web pages, and SMS messages, extracting valuable information from it has become important. A range of machine learning techniques is designed to accomplish this task. Given recent advances in deep learning, we felt it would be interesting to compare traditional and deep learning techniques. We picked a Kaggle playground dataset built for text classification and implemented both types of algorithms on it for comparison.

Problem

In today’s world, websites have to deal with toxic and divisive content. This is especially true of major websites like Quora, which handle heavy traffic and exist to give people a platform for asking and answering questions. A key challenge is to weed out insincere questions: those founded upon false premises, or those that intend to make a statement rather than look for helpful answers.

Details

A breakdown of the code and a walkthrough of the methodology are available in the following blog.

Code:

  • quora-nlp-data-exploration: Contains the data exploration code and some visualizations.
  • quora-nlp-text-classification-v1: Contains all the steps involved, from text cleaning to feature generation and model selection/evaluation. The details are described in the following link.
  • quora-nlp-text-classification-DL: Contains a deep learning approach to the text classification problem. More details are available in the following link.
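
To make the cleaning, feature generation, and model evaluation steps concrete, here is a minimal sketch of a traditional pipeline using only the Python standard library. It is not taken from the notebooks: the function names (`clean`, `train`, `predict`), the choice of a multinomial Naive Bayes classifier on word counts, the label convention (1 for insincere), and the toy questions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase, replace non-alphanumeric characters with spaces, and tokenize."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def train(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training questions
    vocab = set()
    for text, label in samples:
        tokens = clean(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [tok for tok in clean(text) if tok in vocab]  # skip unseen words
    scores = {}
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        # Laplace (add-one) smoothing over the shared vocabulary
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data invented for this sketch; 1 marks an insincere question (assumed convention).
TOY_QUESTIONS = [
    ("How do I learn Python quickly?", 0),
    ("What is the best way to cook rice?", 0),
    ("How does a neural network learn?", 0),
    ("Why are all politicians such liars?", 1),
    ("Why are people from that city so stupid?", 1),
    ("Isn't it obvious that everyone hates them?", 1),
]

model = train(TOY_QUESTIONS)
print(predict(model, "How do I cook pasta?"))
print(predict(model, "Why are they all so stupid?"))
```

On real data, the same three stages would typically be handled by a library vectorizer and classifier, with a held-out split for model selection. The sketch only shows how the stages fit together.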

About Us

Data science discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.

About the writers:

  • Ujjayant Sinha: Data science enthusiast with an interest in natural language problems.
  • Ankit Gadi: A knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, has helped me embark on my data science journey.
