The News Article Summarizer and Categorizer addresses the information overload that readers face today. It provides a streamlined platform for accessing and summarizing news articles across five main domains: India, World, Business, Technology, and Sports. It can also convert the summarized articles into audio, which makes them more accessible and convenient for users who prefer to listen.
- News Article Scraping: Collects news articles from the Times of India website across the supported domains.
- Text Summarization: Produces concise summaries of news articles using transformer-based NLP models such as BART.
- Text-to-Audio Conversion: Converts the summaries into audio so they can be listened to instead of read.
- Libraries:
  - `os.path`: For filesystem path operations.
  - `csv`: For reading and writing CSV files.
  - `requests`: For sending HTTP requests to fetch web pages.
  - `pandas`: For data manipulation and analysis.
  - `nltk.corpus`: For accessing natural language corpora and lexical resources.
  - `gtts`: For converting text to speech.
  - `streamlit`: For creating interactive web applications.
  - `bs4` (BeautifulSoup): For web scraping HTML and XML documents.
  - `newspaper`: For web scraping news articles from various sources.
- Web Scraping: Automatically extracts information from websites to retrieve news articles.
- Text Summarization: Uses models like BART from the Hugging Face `transformers` library to generate concise summaries of articles.
- Text-to-Audio Conversion: Utilizes the gTTS (Google Text-to-Speech) module to generate audio summaries from the provided text.
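
The three techniques above can be chained into one pipeline. The sketch below is only an illustration of how that might look, not the project's actual code: the function names, the `FIELDS` column layout, the default file names, and the `facebook/bart-large-cnn` checkpoint are all assumptions. Third-party packages (`newspaper`, `transformers`, `gtts`) are imported inside the functions that need them, so the CSV helper can be used on its own.

```python
import csv
import os.path

# Hypothetical column layout for the stored summaries.
FIELDS = ["category", "title", "summary"]


def fetch_article_text(url):
    """Download one article and return (title, body text)."""
    from newspaper import Article  # third-party package

    article = Article(url)
    article.download()
    article.parse()
    return article.title, article.text


def summarize(text, max_length=130, min_length=30):
    """Summarize text with a pretrained BART checkpoint."""
    from transformers import pipeline  # third-party package

    # Assumed checkpoint; the project may use a different BART model.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    result = summarizer(
        text, max_length=max_length, min_length=min_length, truncation=True
    )
    return result[0]["summary_text"]


def to_audio(text, path, lang="en"):
    """Write the spoken text to an MP3 file using gTTS."""
    from gtts import gTTS  # third-party package

    gTTS(text=text, lang=lang).save(path)
    return path


def append_row(csv_path, row):
    """Append one record to a CSV file, writing the header on first use."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def process(url, category, csv_path="news.csv", audio_path="summary.mp3"):
    """Run one article through scrape -> summarize -> store -> speak."""
    title, body = fetch_article_text(url)
    summary = summarize(body)
    append_row(csv_path, {"category": category, "title": title, "summary": summary})
    return to_audio(summary, audio_path)
```

Calling `process` with an article URL and a category name such as `"Sports"` would append one row to the CSV file and return the path of the generated MP3. Loading the summarization model on every call is slow, so a real application would likely create the pipeline once and reuse it.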
Deployed version on Streamlit Cloud: https://2minutenews.streamlit.app/