This is a project made by:
- Prajneya Kumar
- Shivansh S.
- Tejasvi Chebrolu
- Clone the repository
- Install all dependencies listed in `requirements.txt`
- Choose the method you would like to use, then go to the corresponding section
Method I generates a summary using a Document Term Matrix and frequency counts. To use it:
- Go to the `method_1` folder
- Place your article in the `valid` folder, named `article.txt`
- Run the `extractive.py` file using python3
- The summary is written to `summary.txt` inside the `valid` folder
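The Document Term Matrix and frequency-count approach described above can be illustrated with a minimal sketch. This is not the code in `extractive.py`; the function names, the naive sentence splitter, and the `ratio` parameter are assumptions made for illustration. Each sentence becomes one row of term counts, the column sums give article-wide term frequencies, and the highest-scoring sentences are kept in their original order.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; this sketch does no stop-word removal."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize_by_frequency(text, ratio=0.3):
    """Return the top-scoring sentences, in their original order.

    `ratio` is a hypothetical parameter: the fraction of sentences to keep.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Document-term matrix: one row (Counter) per sentence, one column per term.
    rows = [Counter(tokenize(s)) for s in sentences]
    # Column sums: how often each term occurs in the whole article.
    totals = Counter()
    for row in rows:
        totals.update(row)
    # Sentence score: article-wide frequency of each term times its count in the row.
    scores = [sum(totals[t] * c for t, c in row.items()) for row in rows]
    keep = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    article = "Cats chase mice. Dogs bark. Cats and mice and cats play."
    print(summarize_by_frequency(article))
```

Because the score is an unnormalized sum, longer sentences tend to score higher; dividing by sentence length is one possible adjustment.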
Method II generates a summary using a modified TF-IDF over the document dataset, with weights attached. To use it:
- Go to the `method_2` folder
- Place your article in the `valid` folder
- Run the code in Jupyter Notebook
- Enter the name of your file, which must be in that directory
- The summary and a word cloud are written to the output folder :)
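For readers unfamiliar with TF-IDF sentence scoring, here is a minimal sketch using plain smoothed TF-IDF. It does not reproduce the modification or the weights used in the notebook, which are not described here; the function names and the toy corpus are assumptions. IDF is computed over a small document collection, and each sentence is scored by the mean TF-IDF of its tokens.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_table(corpus):
    """Smoothed inverse document frequency for every term in `corpus`."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def score_sentences(article, corpus):
    """Return (score, sentence) pairs; a higher score means more distinctive terms."""
    idf = idf_table(corpus + [article])  # include the article so every term has an IDF
    tf = Counter(tokenize(article))
    n_tokens = sum(tf.values())
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    scored = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not tokens:
            continue
        # Mean TF-IDF of the sentence's tokens (plain TF-IDF, no extra weights).
        weight = sum(tf[t] / n_tokens * idf[t] for t in tokens) / len(tokens)
        scored.append((weight, sentence))
    return scored


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog ran"]  # hypothetical background documents
    for weight, sentence in score_sentences("The cat purred. The zebra galloped.", corpus):
        print(f"{weight:.4f}  {sentence}")
```

A summary would then keep the highest-scoring sentences; any additional weighting would be applied at the point where `weight` is computed.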
- Add the gold standard for the summary as `n.txt` in the `Gold` folder in the `Summaries` directory. Here `n` is the next number in the sequence in the `Gold` folder.
  - For example, if there are 7 files in the `Gold` folder, they must be labelled `1.txt`, `2.txt`, ... `7.txt`.
- Repeat this process for the summaries generated by the rule-based method and the extractive method, and store them in the `Extractive` and `RuleBased` directories.
  - You can do this on the terminal via simple redirection.
- Now, in the `accuracy.py` file, on line 15, change the code to `for i in range(1, n+1):` where `n` is the same variable as above.
  - For example, if your file was saved as `9.txt`, you would change the code to `for i in range(1, 10):`
- Run the code as `python accuracy.py`
- If you want individual accuracies for each article, you can uncomment line 62 in the `Rouge_1.py` file.
  - In that case it is advisable to redirect the output to a new file, as `python accuracy.py > output.txt`, for better formatting.
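The numbering convention in the steps above can be sketched as a small helper. This is only an illustration and not part of the repository; `next_summary_number` and `evaluation_range` are hypothetical names. The first function finds the next `n` from the file names already in a folder, and the second returns the range that the loop in `accuracy.py` should cover.

```python
import re


def next_summary_number(filenames):
    """Return the next n for n.txt, given the names already in a folder.

    Names that are not of the form <number>.txt are ignored.
    """
    numbers = []
    for name in filenames:
        match = re.fullmatch(r"(\d+)\.txt", name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def evaluation_range(n):
    """Range that visits 1.txt through n.txt; the upper bound is exclusive."""
    return range(1, n + 1)


if __name__ == "__main__":
    # In practice the names could come from os.listdir() on the Gold folder.
    existing = ["1.txt", "2.txt", "3.txt", "4.txt", "5.txt", "6.txt", "7.txt"]
    n = next_summary_number(existing)
    print(n, list(evaluation_range(n)))
```

The `n + 1` upper bound is why a folder ending at `9.txt` needs `range(1, 10)`.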
For Method I we got an accuracy of 74.1%. For Method II we got an accuracy of 83.4%.
The evaluation was based on the ROUGE method proposed by Chin-Yew Lin. Since the summarization in this project is extractive, only ROUGE-1 was used. The gold standard summaries were annotated manually: for any given article, the annotators were asked to pick the most important sentences. The only rule was that the number of sentences they could choose was 0.3N, where N is the number of sentences in the original article.
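As a rough illustration of the metric, ROUGE-1 compares the unigrams of a generated summary against those of the gold summary. The sketch below is not the code in `Rouge_1.py`: the function names are hypothetical, and whether the reported accuracy corresponds to recall, precision, or F1 is not stated here, so all three are returned. The rounding used for the 0.3N sentence budget is also an assumption.

```python
import re
from collections import Counter


def unigrams(text):
    """Multiset of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def rouge_1(candidate, reference):
    """Unigram-overlap recall, precision and F1 of candidate against reference."""
    cand, ref = unigrams(candidate), unigrams(reference)
    # Counter intersection clips each word's count to the smaller of the two sides.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def gold_sentence_budget(n_sentences, ratio=0.3):
    """Number of sentences an annotator may pick; rounding rule is an assumption."""
    return round(ratio * n_sentences)


if __name__ == "__main__":
    print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))
    print(gold_sentence_budget(10))
```

For extractive summaries built from whole sentences of the article, unigram overlap rewards choosing the same sentences as the annotators.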
We thank the following for creating the gold standard summaries:
- Abhinav Menon
- Trisha Kaore
- Yash Agrawal
- Eshika Khandelwal
- Vidushi Bhartari
- Shashwat Singh
- Shubhankar Kamthankar
- Fork this repository
- Clone the forked repository to your local system
- Add the original repository as a remote: `git remote add upstream https://github.com/AurumnPegasus/Text-Summariser.git`
- Install all required dependencies (listed in `requirements.txt`)
- Commit and send PRs :)