
Extractive Text Summarization using Integrated TextRank and BM25+ Algorithm


gulvaibhav20/extractive-text-summarizer


Extractive Text Summarizer

Using Integrated TextRank and BM25+ algorithm
Explore the research publication »

Overview

[Figure: Extractive Text Summarizer output]

This repository contains the Python implementation of my research publication titled "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm". The project aims to provide a practical implementation of the extractive text summarization technique proposed in the research publication.

Features:

  • Implements several similarity-matrix variants for TextRank-based extractive text summarization. The variants include:
    • Integrated TextRank and BM25+ Algorithm (Proposed model)
    • Original TextRank Algorithm
    • BM25+ similarity-based TextRank Algorithm
    • TF-IDF-Cosine similarity-based TextRank Algorithm
    • LCS similarity-based TextRank Algorithm
  • Configurable summarization parameters for controlling the length and quality of the generated summaries.
  • Support for processing various types of text documents, such as articles, blog posts, research papers, and more.
  • Support for ROUGE score evaluation.
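The similarity variants listed above differ only in how the weight between two sentences is computed. The sketch below shows three of them on pre-tokenized sentences: the original TextRank word-overlap measure, a token-level LCS similarity, and a BM25+ score. It is a minimal illustration under stated assumptions, not the repository's code: the function names, the `2 * LCS / (|s1| + |s2|)` normalisation, the BM25+ IDF form `log((N + 1) / df)`, and the defaults `k1=1.2`, `b=0.75`, `delta=1.0` are all choices made here for illustration, and the paper's integrated TextRank + BM25+ model is not reproduced.

```python
import math
from collections import Counter


def textrank_overlap(s1: list[str], s2: list[str]) -> float:
    """Original TextRank similarity: shared distinct words divided by
    log(|s1|) + log(|s2|)."""
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    # Two one-word sentences give a zero denominator; treat them as unrelated.
    return len(set(s1) & set(s2)) / denom if denom > 0 else 0.0


def lcs_similarity(s1: list[str], s2: list[str]) -> float:
    """Token-level longest common subsequence, normalised to [0, 1] with
    2 * LCS / (|s1| + |s2|) (an assumed normalisation; others exist)."""
    if not s1 or not s2:
        return 0.0
    prev = [0] * (len(s2) + 1)  # DP row for the previous token of s1
    for a in s1:
        curr = [0]
        for j, b in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(s1) + len(s2))


def bm25_plus(query: list[str], doc: list[str], corpus: list[list[str]],
              k1: float = 1.2, b: float = 0.75, delta: float = 1.0) -> float:
    """BM25+ score of `doc` for `query`. For sentence similarity, query and
    doc are both sentences and corpus holds every sentence of the article."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term, qf in Counter(query).items():
        if tf[term] == 0:
            continue  # only terms that occur in the document contribute
        # max(1, ...) guards against a doc that is not part of the corpus
        df = max(1, sum(1 for d in corpus if term in d))
        idf = math.log((n_docs + 1) / df)
        f = tf[term]
        norm = f + k1 * (1 - b + b * len(doc) / avgdl)
        # delta lower-bounds the contribution of every matched term
        score += qf * idf * (f * (k1 + 1) / norm + delta)
    return score
```

Note that BM25+ is asymmetric (scoring sentence j against sentence i need not equal the reverse), so a BM25+-based similarity matrix is generally not symmetric, unlike the overlap and LCS measures.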


Summarization Process:

[Figure: Summarization process flowchart]

The summarization process consists of three main phases. The first phase retrieves the textual data from the article source and then preprocesses it; preprocessing includes stopword removal, tokenization, POS tagging, and lemmatization. In the second phase, the similarity between each pair of sentences in the article is calculated, and the article is modeled as a graph using the resulting similarity matrix. In the final phase, a modified TextRank algorithm is applied to the graph and the article summary is generated.
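The three phases can be sketched end to end as follows. This is a simplified, hypothetical pipeline, not the project's implementation: preprocessing is reduced to a naive regex sentence splitter, lowercase tokenization, and a tiny made-up stopword list (no POS tagging or lemmatization), the similarity is the plain TextRank overlap measure, and ranking uses standard weighted PageRank power iteration with an assumed damping factor of 0.85. The names `summarize`, `textrank_scores`, and `num_sentences` are illustrative only.

```python
import math
import re

# Hypothetical minimal stopword list; the real preprocessing is richer.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "it"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def overlap_similarity(s1: list[str], s2: list[str]) -> float:
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    return len(set(s1) & set(s2)) / denom if denom > 0 else 0.0


def textrank_scores(matrix, damping=0.85, tol=1e-6, max_iter=100):
    """Weighted PageRank by power iteration over a sentence-similarity graph."""
    n = len(matrix)
    out_weight = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(
                matrix[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0  # isolated sentences pass nothing on
            )
            new.append((1 - damping) / n + damping * incoming)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(text: str, num_sentences: int = 2, similarity=overlap_similarity) -> list[str]:
    # Phase 1: retrieve and preprocess the text.
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Phase 2: pairwise similarity matrix, i.e. the weighted sentence graph.
    matrix = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    # Phase 3: rank sentences and keep the top ones in their original order.
    scores = textrank_scores(matrix)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize("Cats chase mice. Dogs chase cats. The sky is blue.", 2)` keeps the two sentences that share words and drops the unrelated one. Any of the similarity variants can be swapped in through the `similarity` parameter, which mirrors how the project compares similarity-matrix variations within the same TextRank framework.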


Built With

The project is primarily built using the Python programming language with the help of the following libraries:


Getting Started

To start using the project, first set up your local machine to meet the system prerequisites by following the steps below:

  1. Clone this repository to your local machine:
    git clone https://github.com/gulvaibhav20/extractive-text-summarizer.git
  2. Navigate to the source code directory (src) inside the project repository:
    cd extractive-text-summarizer/src
  3. Install the required Python dependencies:
    pip install -r requirements.txt
  4. Run src/main.py to start using the extractive text summarizer.

NOTE: Head over to the main source code repository to understand the configurations and input/output settings for the summarizer.


Usage

  1. Prepare the text document or article URL you want to summarize. (Note: a document must be in plain text format, i.e. a .txt file.)
  2. Adjust the summarization parameters in the src/config.ini file to suit your needs.
  3. Run the summarization script:
    python src/main.py
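Parameters in an INI file such as src/config.ini are typically read with Python's standard `configparser` module. The sketch below is hypothetical: the section name `[summarizer]` and the keys `algorithm`, `summary_sentences`, and `input_path` are invented for illustration and are not taken from the repository, whose actual settings are defined in its own src/config.ini.

```python
import configparser

# Hypothetical config contents; the real section and key names live in src/config.ini.
EXAMPLE_CONFIG = """
[summarizer]
algorithm = textrank_bm25plus
summary_sentences = 5
input_path = article.txt
"""


def load_settings(text: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(text)  # for a file on disk, use parser.read("src/config.ini")
    section = parser["summarizer"]
    return {
        "algorithm": section.get("algorithm", fallback="textrank"),
        # getint converts the string value and falls back if the key is absent
        "summary_sentences": section.getint("summary_sentences", fallback=3),
        "input_path": section.get("input_path"),
    }
```

Reading typed values with `getint` and a `fallback` keeps the script working when an optional key is missing from the file.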


Citation

If you use this implementation in your research or publication, please cite the original research paper:

  • [MLA] : Gulati, Vaibhav, et al. "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm." Electronics 12.2 (2023): 372.
  • [APA] : Gulati, V., Kumar, D., Popescu, D. E., & Hemanth, J. D. (2023). Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics, 12(2), 372.
  • [Chicago] : Gulati, Vaibhav, Deepika Kumar, Daniela Elena Popescu, and Jude D. Hemanth. "Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm." Electronics 12, no. 2 (2023): 372.
  • [Harvard] : Gulati, V., Kumar, D., Popescu, D.E. and Hemanth, J.D., 2023. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics, 12(2), p.372.
  • [Vancouver] : Gulati V, Kumar D, Popescu DE, Hemanth JD. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm. Electronics. 2023 Jan 11;12(2):372.
  • [BibTeX] :
@article{gulati2023extractive,
  title={Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm},
  author={Gulati, Vaibhav and Kumar, Deepika and Popescu, Daniela Elena and Hemanth, Jude D},
  journal={Electronics},
  volume={12},
  number={2},
  pages={372},
  year={2023},
  publisher={MDPI}
}


Contributing

Any contributions you make are greatly appreciated! If you find any bugs or have ideas for improvements, please open an issue or submit a pull request. To contribute, follow the steps below:

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature)
  3. Commit your Changes (git commit -m 'Add some feature')
  4. Push to the Branch (git push origin feature)
  5. Open a Pull Request

PS: Don't forget to give the project a star! Thanks again!


License

Distributed under the MIT License. See LICENSE.txt for more information.
