Using Integrated TextRank and BM25+ Algorithm
This repository contains the Python implementation of my research publication titled "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm". The project aims to provide a practical implementation of the extractive text summarization technique proposed in the research publication.
Features:
- Implements several variants of the similarity matrix used by TextRank's extractive summarization algorithm:
  - Integrated TextRank and BM25+ Algorithm (proposed model)
  - Original TextRank Algorithm
  - BM25+ similarity-based TextRank Algorithm
  - TF-IDF cosine similarity-based TextRank Algorithm
  - LCS similarity-based TextRank Algorithm
- Configurable summarization parameters for controlling the length and quality of the generated summaries.
- Support for processing various types of text documents, such as articles, blog posts, research papers, and more.
- Support for ROUGE score evaluation.
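The feature list above names several ways of filling TextRank's sentence-similarity matrix. As a rough, hypothetical illustration (not the repository's code), the sketch below implements three of them in pure Python over already-tokenized sentences: the original TextRank word-overlap measure (Mihalcea & Tarau, 2004), TF-IDF cosine similarity, and a word-level LCS similarity. The LCS normalization (dividing by the longer sentence) and the smoothed IDF are assumptions chosen for the example, not details taken from the paper.

```python
import math
from collections import Counter


def textrank_overlap(a, b):
    """Original TextRank similarity: number of shared words
    divided by log|a| + log|b|."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give denom == 0; treat them as unrelated.
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors (dicts) using the smoothed IDF
    ln((1 + n) / (1 + df)) + 1 (an assumption for this sketch)."""
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(s))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(s).items()}
        for s in sentences
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lcs_similarity(a, b):
    """Word-level longest-common-subsequence length divided by the length
    of the longer sentence (hypothetical normalization)."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match, else keep the best neighbour.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def similarity_matrix(items, sim):
    """Pairwise matrix with a zero diagonal (no self-similarity edges)."""
    n = len(items)
    return [[0.0 if i == j else sim(items[i], items[j]) for j in range(n)]
            for i in range(n)]
```

For instance, `similarity_matrix(tokenized_sentences, lcs_similarity)` builds the LCS-based matrix, while `similarity_matrix(tfidf_vectors(tokenized_sentences), cosine)` builds the TF-IDF cosine one; `tokenized_sentences` stands for any list of token lists.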
Summarization Process:
The summarization process consists of three main phases. The first phase retrieves the textual data from the article source and preprocesses it; preprocessing includes stopword removal, tokenization, POS tagging, and lemmatization. In the second phase, the similarity between each pair of sentences in the article is calculated, and the article is modeled as a graph using the resulting similarity matrix. In the final phase, a modified TextRank algorithm is applied to the graph and the article summary is generated.
The project is built primarily in Python with the help of the following libraries:
- NLTK - Natural Language Toolkit
- NetworkX - Network Analysis in Python
- Gensim
- scikit-learn
- NumPy - Numerical Python
- pandas - Python Data Analysis Library
- rank-bm25
- rouge-score
- PrettyTable
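The three phases described above can be sketched end to end in pure Python. This is a hypothetical illustration, not the repository's code: a regex tokenizer stands in for the NLTK preprocessing, a hand-written BM25+ scorer (Lv and Zhai's lower-bounded BM25, with IDF = ln((N + 1)/df) and the common defaults k1 = 1.5, b = 0.75, δ = 1) stands in for the rank-bm25 package, and a plain power-iteration PageRank stands in for NetworkX. It shows only the BM25+ similarity-based TextRank variant; the way the proposed model integrates TextRank and BM25+ is described in the paper and is not reproduced here. Treating `matrix[i][j]` as the weight of the edge from sentence i to sentence j is also an assumption.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical stand-in for the NLTK preprocessing (no stopword removal,
    # POS tagging or lemmatization): lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def bm25plus_matrix(token_lists, k1=1.5, b=0.75, delta=1.0):
    """matrix[i][j] = BM25+ score of sentence j with sentence i as the query."""
    n = len(token_lists)
    if n == 0:
        return []
    docs = [Counter(tokens) for tokens in token_lists]
    lengths = [len(tokens) for tokens in token_lists]
    avgdl = sum(lengths) / n or 1.0  # guard against all-empty input
    # Each sentence is a "document": df = number of sentences with the term.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((n + 1) / count) for term, count in df.items()}

    matrix = [[0.0] * n for _ in range(n)]
    for i, query in enumerate(token_lists):
        for j, doc in enumerate(docs):
            if i == j:
                continue  # no self-loops in the sentence graph
            norm = k1 * (1 - b + b * lengths[j] / avgdl)
            for term in query:  # repeated query terms contribute repeatedly
                tf = doc[term]
                if tf:  # only terms present in sentence j are scored
                    matrix[i][j] += idf[term] * ((k1 + 1) * tf / (norm + tf) + delta)
    return matrix


def textrank(matrix, d=0.85, tol=1e-8, max_iter=200):
    """Weighted PageRank by power iteration; matrix[i][j] weights edge i -> j."""
    n = len(matrix)
    if n == 0:
        return []
    out_weight = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by nodes with no outgoing weight is spread evenly.
        dangling = sum(s for s, w in zip(scores, out_weight) if w == 0)
        new_scores = [
            (1 - d) / n
            + d * (dangling / n + sum(matrix[i][j] / out_weight[i] * scores[i]
                                      for i in range(n) if out_weight[i] > 0))
            for j in range(n)
        ]
        change = sum(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores


def summarize(sentences, k=3):
    """Return the k highest-ranked sentences, kept in their original order."""
    scores = textrank(bm25plus_matrix([tokenize(s) for s in sentences]))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(article_sentences, k=3)` returns the three highest-scoring sentences in their original order, where `article_sentences` stands for any list of sentence strings. Keeping the original order, rather than score order, is a common choice in extractive summarizers because it preserves the article's flow.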
To start using the project, first set up your local machine to meet the system prerequisites by following the steps below:
- Clone this repository to your local machine:
git clone https://github.com/gulvaibhav20/extractive-text-summarizer.git
- Navigate to the source code directory (src) inside the project repository:
cd extractive-text-summarizer/src
- Install the required Python dependencies:
pip install -r requirements.txt
- Use src/main.py to start using the extractive text summarizer.
NOTE: Refer to the source code directory (src) for details on the summarizer's configuration and input/output settings.
- Prepare the text document or article URL you want to summarize. (Note: a document must be in plain-text (.txt) format.)
- Adjust the summarization parameters in the src/config.ini file to suit your needs.
- Run the summarization script from the src directory:
python main.py
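Once a summary has been generated, it can be scored against a human-written reference. The project lists the rouge-score package for this; the sketch below is a hypothetical, dependency-free ROUGE-N computation meant only to show what the metric counts (clipped n-gram overlap between reference and candidate). Unlike the library, it uses naive whitespace tokenization, applies no stemming, and does not compute ROUGE-L.

```python
from collections import Counter


def rouge_n(reference, candidate, n=1):
    """Return ROUGE-N (precision, recall, F1) from clipped n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref = ngrams(reference.lower().split())
    cand = ngrams(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is credited at most as often as the reference has it.
    overlap = sum((ref & cand).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With the rouge-score package itself, the equivalent call is `rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True).score(reference, summary)`, which returns precision, recall, and F-measure for each requested ROUGE type.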
If you use this implementation in your research or publication, please cite the original research paper:
- [MLA] : Gulati, Vaibhav, et al. "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm." Electronics 12.2 (2023): 372.
- [APA] : Gulati, V., Kumar, D., Popescu, D. E., & Hemanth, J. D. (2023). Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics, 12(2), 372.
- [Chicago] : Gulati, Vaibhav, Deepika Kumar, Daniela Elena Popescu, and Jude D. Hemanth. "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm." Electronics 12, no. 2 (2023): 372.
- [Harvard] : Gulati, V., Kumar, D., Popescu, D.E. and Hemanth, J.D., 2023. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics, 12(2), p.372.
- [Vancouver] : Gulati V, Kumar D, Popescu DE, Hemanth JD. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics. 2023 Jan 11;12(2):372.
- [BibTeX] :
@article{gulati2023extractive,
title={Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm},
author={Gulati, Vaibhav and Kumar, Deepika and Popescu, Daniela Elena and Hemanth, Jude D},
journal={Electronics},
volume={12},
number={2},
pages={372},
year={2023},
publisher={MDPI}
}
Any contributions you make are greatly appreciated! If you find a bug or have an idea for an improvement, please open an issue or submit a pull request. To contribute, follow the steps below:
- Fork the Project
- Create your Feature Branch (git checkout -b feature)
- Commit your Changes (git commit -m 'Add some feature')
- Push to the Branch (git push origin feature)
- Open a Pull Request
PS: Don't forget to give the project a star! Thanks again!
Distributed under the MIT License. See LICENSE.txt for more information.