
A program that reads .pdf files and finds the most used words in them


balciiberk/most-used-words


Most Used Words

This program finds the most used words in the given .pdf files.

It can be used to find the most used words in a language by supplying .pdf e-books. However, if only one book is given, the most used words would likely be the names of its characters, or would reflect the author's writing style. The program is therefore written to read multiple .pdf files.

It reads the .pdf files, strips punctuation from the text, converts it to lowercase, and splits it into words. It then saves the list of words in JSON format. Finally, it reads the JSON file, counts the words, and either shows the most used words or plots them in a bar plot.
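
The steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: every function name here is hypothetical, the third-party `pypdf` package is assumed for text extraction, and `matplotlib` is assumed for plotting. Both are imported lazily so the cleaning and counting steps run with the standard library alone.

```python
import json
import string
from collections import Counter
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Return the text of one PDF (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # assumption: the project may use another library

    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for image-only pages, hence the fallback.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def clean_words(text):
    """Remove ASCII punctuation, convert to lowercase, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().split()


def save_words(words, json_path):
    """Save the list of words as a JSON array."""
    Path(json_path).write_text(json.dumps(words), encoding="utf-8")


def most_used(json_path, n=10):
    """Read the JSON word list and return the n most common (word, count) pairs."""
    words = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return Counter(words).most_common(n)


def plot_top_words(counts):
    """Draw a bar plot from (word, count) pairs (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    words, freqs = zip(*counts)
    plt.bar(words, freqs)
    plt.ylabel("Occurrences")
    plt.show()
```

With several books, the word lists from each `extract_pdf_text` call would be concatenated before `save_words`. Note that removing punctuation this way also drops apostrophes, so "don't" becomes "dont"; whether that is desirable depends on the language being analysed.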

An example bar plot:

[Image: bar plot of the 10 most used words]
