
The official GitHub repository for the Newspaper Scrape Project, which uses natural language processing and web scraping to extract data from articles on the technology section of the New York Times.


Rohak72/Newspaper-Scrape


Newspaper-Scrape

By: Rohak Jain

The Newspaper Scrape project uses natural language processing, sentiment analysis, and web scraping to extract summaries, metadata, and polarity/subjectivity scores from articles in the New York Times' technology section. A complete, 15-part video series accompanies this project; if you're interested, the YouTube playlist can be found here.
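As a rough illustration of that pipeline, here is a minimal sketch that downloads a single article with newspaper3k and scores it with TextBlob. It is not taken from the repository's scripts: the names `ArticleReport`, `label_polarity`, and `analyze_article`, as well as the 0.1 threshold, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class ArticleReport:
    """Fields the project description mentions (field names are illustrative)."""
    title: str
    authors: list
    summary: str
    polarity: float       # TextBlob range: -1.0 (negative) to 1.0 (positive)
    subjectivity: float   # TextBlob range: 0.0 (objective) to 1.0 (subjective)


def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score to a coarse label; the threshold is an arbitrary choice."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def analyze_article(url: str) -> ArticleReport:
    """Download one article, summarize it, and score its sentiment."""
    # Third-party imports are kept local so label_polarity works without them.
    from newspaper import Article
    from textblob import TextBlob

    article = Article(url)
    article.download()   # fetch the HTML
    article.parse()      # extract title, authors, and body text
    article.nlp()        # build summary and keywords (needs NLTK's "punkt" data)

    sentiment = TextBlob(article.text).sentiment
    return ArticleReport(
        title=article.title,
        authors=article.authors,
        summary=article.summary,
        polarity=sentiment.polarity,
        subjectivity=sentiment.subjectivity,
    )
```

Calling `analyze_article` with the URL of an article returns a report whose `polarity` can be passed to `label_polarity` for a quick positive/neutral/negative reading.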

Throughout my scripts, I have placed detailed comments explaining what each method does and walking through the more complex lines of code. If you'd like to use this repository yourself, feel free to clone it with GitHub Desktop or PyCharm and tweak the code to your liking! If you have any questions, post them in the "Issues" section of the repo and I will do my best to get back to you as soon as possible.

Dependencies

To run and modify this program on your machine, you'll want to have installed the following packages:

Important

time, random - These modules are part of the Python 3 standard library and need no installation.

textblob - Package Install: pip install textblob.

newspaper3k - Package Install: pip install newspaper3k.

requests - Package Install: pip install requests.

bs4 (BeautifulSoup) - Package Install: pip install bs4.
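If you want to confirm that everything is installed before running the scripts, a small check like the following may help. It is a sketch rather than part of the repository; the `missing_modules` helper and the `REQUIRED_MODULES` list are hypothetical names, and the list assumes the usual import names (newspaper3k is imported as `newspaper`).

```python
import importlib.util

# Import names for the packages listed above; newspaper3k is imported as "newspaper".
REQUIRED_MODULES = ["time", "random", "textblob", "newspaper", "requests", "bs4"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```

Any name it reports can be installed with the matching `pip install` command from the list above.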

Cheers!

Thanks for stopping by; I hope this helped you out!
