
Using the spaCy and NLTK modules with the TF-IDF algorithm for text summarisation. This code gives you a summary of an input article. You can input text directly or from a .txt file, a .pdf file or a Wikipedia URL.


LunaticPrakash/Text-Summarization


Text-Summarization

Using the spaCy and NLTK modules with the TF-IDF algorithm for text summarisation. This code gives you a summary of the input article. You can input the text by typing it (or copy-pasting it), or from a .txt file, a PDF file or a Wikipedia page URL.

Purpose :-

To save time while reading by summarizing a large article or text into fewer lines.

Description :-

It uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to summarise the article.
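As an illustration of how TF-IDF can drive an extractive summary, here is a minimal sketch in plain Python. It is not the repository's code and makes several assumptions: each sentence is treated as a "document" when computing IDF, a naive regex sentence splitter and a small hypothetical stop-word list stand in for spaCy/NLTK, each sentence is scored as the sum of tf(w, s) · log(N / df(w)) over its words, and the top-scoring sentences are returned in their original order.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
# A real implementation would use spaCy's or NLTK's stop words instead.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "this", "that", "for", "with", "as", "by", "be",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=2):
    """Return the num_sentences highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences each word appears in.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Sum of tf * idf, with tf normalised by sentence length and
        # idf = log(N / df); words found in every sentence contribute 0.
        score = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scores.append(score)

    # Pick the best sentences, then restore their original order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(best))
```

For example, `summarize("The cat sat. The cat sat on the mat. Quantum physics explains subatomic particles.", 1)` returns the last sentence, because none of its words appear in any other sentence and so they carry the highest IDF.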

Features :-

You can input the text of your long article in 4 ways :-


  • By typing text on your own (or copy-paste).
  • Reading the text from .txt file.
  • Reading the text from a .pdf file (you can choose to summarise either the entire PDF or a selected page interval).


  • Reading the text from a Wikipedia page (all you have to do is provide the URL of that page; the program will automatically scrape the text and summarise it for you).
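Two of the input paths above can be sketched as follows. This is an assumption-laden sketch, not the repository's code. The Wikipedia reader uses `urllib` with the standard library's `html.parser` (in place of Beautiful Soup, which the project itself uses) and keeps only the text of `<p>` elements. The PDF reader converts a 1-based, inclusive page interval into 0-based indexes and uses the `PdfReader` / `extract_text` API of recent PyPDF2 releases (older releases named these `PdfFileReader` / `extractText`). The helper names `paragraphs_from_html`, `fetch_wikipedia_text`, `page_range` and `read_pdf_pages`, and the User-Agent string, are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (roughly soup.find_all('p'))."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            text = "".join(self._parts).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_paragraph:
            self._parts.append(data)


def paragraphs_from_html(html):
    """Join the paragraph text, dropping citation markers such as "[12]"."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(re.sub(r"\[\d+\]", "", p) for p in parser.paragraphs)


def fetch_wikipedia_text(url):
    # Wikimedia asks clients to send an identifying User-Agent; this value is made up.
    request = Request(url, headers={"User-Agent": "text-summarization-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return paragraphs_from_html(html)


def page_range(first_page, last_page, page_count):
    """Turn a 1-based inclusive page interval into 0-based range() bounds."""
    if not 1 <= first_page <= last_page <= page_count:
        raise ValueError(f"pages must satisfy 1 <= first <= last <= {page_count}")
    return first_page - 1, last_page


def read_pdf_pages(path, first_page, last_page):
    """Extract text from pages first_page..last_page of a PDF (1-based, inclusive)."""
    from PyPDF2 import PdfReader  # imported lazily so the rest runs without PyPDF2

    reader = PdfReader(path)
    start, stop = page_range(first_page, last_page, len(reader.pages))
    # extract_text() may return an empty string for scanned/image-only pages.
    return "\n".join(reader.pages[i].extract_text() or "" for i in range(start, stop))
```

For example, `read_pdf_pages("article.pdf", 3, 5)` would return the text of the third to fifth pages, where `article.pdf` is a placeholder path. Validating the interval up front gives a clear error instead of an `IndexError` deep inside the loop.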

Don't worry about the code length xD. It might look lengthy, but there are a lot of comments explaining the code (almost 70) and extra spacing for readability.

Output :-

  • Summary of the input text.

  • Comparison of the word count of the original content vs the summarized content.
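The README does not say how the word counts in this comparison are computed. A minimal sketch, assuming words are simply whitespace-separated tokens, could report both counts and the percentage reduction; `word_count_report` is a hypothetical name, not a function from the repository.

```python
def word_count_report(original, summary):
    """Compare whitespace-separated word counts of a text and its summary."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    # Guard against division by zero for an empty original text.
    reduction = 100 * (1 - summary_words / original_words) if original_words else 0.0
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_percent": round(reduction, 1),
    }


print(word_count_report("one two three four", "one two"))
# {'original_words': 4, 'summary_words': 2, 'reduction_percent': 50.0}
```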

Requirements :-

  • Python3
  • spaCy Module with an English model (small, medium or large; any size is sufficient)
  • NLTK Module
  • PyPDF2
  • Beautiful Soup (bs4)
  • urllib (already available with python itself, no need for external installation)

How to install Requirements :-

  1. Python3 can be installed from the official site https://www.python.org/, or you can use an Anaconda environment.
  2. spaCy and its English model can be installed as follows.
     For Anaconda Environment >
       conda install -c conda-forge spacy
       python3 -m spacy download en
     For other environments >
       pip3 install spacy
       python3 -m spacy download en
  3. NLTK can be installed as follows.
     For Anaconda Environment >
       conda install -c anaconda nltk
     For other environments >
       pip3 install nltk
  4. PyPDF2 can be installed as follows.
     For Anaconda Environment >
       conda install -c conda-forge pypdf2
     For other environments >
       pip3 install PyPDF2
  5. Beautiful Soup (bs4) can be installed as follows.
     For Anaconda Environment >
       conda install -c anaconda beautifulsoup4
     For other environments >
       pip3 install beautifulsoup4
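After installing, a quick sanity check (a sketch, not part of the repository) is to ask Python whether each package can be found, without actually importing it, using `importlib.util.find_spec`. The names below are import names, which can differ from install names (for example, `beautifulsoup4` is imported as `bs4`). This only checks the packages, not whether the spaCy English model was downloaded; `check_requirements` is a hypothetical helper name.

```python
import importlib.util

# Import names of the packages listed under Requirements.
REQUIRED = ["spacy", "nltk", "PyPDF2", "bs4", "urllib"]


def check_requirements(modules=REQUIRED):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:8} {'OK' if found else 'missing'}")
```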

Getting Started :-

  • Download or clone repository.

  • Open cmd or a terminal in the same directory where the Text-Summarizer.py file is stored, then run it with the following command :-

python3 Text-Summarizer.py
  • Now just follow along with the program.

Bugs and Improvements :-

  • No known bugs. The summary can't be as good as one written by a human.
  • An audio feature will be added soon, so that you can listen to the summary too if you want.

Dev :- Prakash Gupta
