pygetpapers

getpapers (https://github.com/petermr/openVirus/wiki/getpapers), the primary scraper we have used so far, is written in JavaScript and requires Node.js to run. Because the Node-based getpapers has proved hard to maintain and extend, we have decided to rewrite it in Python and call it pygetpapers.

People

PMR
Ayush
Dheeraj
Shweata

Our Initial Plans

PMR: This project is well suited to a modular approach, both in content and in functionality. For example, each target repo is a subproject, and as long as the framework is well designed it should be possible to add repos independently. One important piece that is missing at the moment is documentation on "how to add a new repo".
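One possible way to make repos independently addable is a small registry that each repo module plugs into. The sketch below is only an illustration of that idea, not the pygetpapers design: the names Repository, REGISTRY, register, DummyRepo and search are all hypothetical, and DummyRepo returns made-up records instead of calling a real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Repository(ABC):
    """Hypothetical base class: one subclass per target repo."""

    name: str = ""

    @abstractmethod
    def search(self, query: str, limit: int) -> List[dict]:
        """Return up to `limit` metadata records matching `query`."""


# Hypothetical registry mapping a repo name to its class.
REGISTRY: Dict[str, Type[Repository]] = {}


def register(cls: Type[Repository]) -> Type[Repository]:
    """Class decorator that adds a repo to the registry under its name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class DummyRepo(Repository):
    """Placeholder repo that fabricates records; a real one would call an API."""

    name = "dummy"

    def search(self, query: str, limit: int) -> List[dict]:
        return [{"id": f"{query}-{i}", "source": self.name} for i in range(limit)]


def search(repo_name: str, query: str, limit: int = 10) -> List[dict]:
    """Look up a repo by name and delegate the search to it."""
    try:
        repo_cls = REGISTRY[repo_name]
    except KeyError:
        raise ValueError(f"unknown repository: {repo_name!r}") from None
    return repo_cls().search(query, limit)
```

Under this sketch, "how to add a new repo" would amount to writing one subclass of Repository and decorating it with register; the rest of the framework would not need to change.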

Requirements and Bugs to Fix

General API
Sort results by date
Download only specific article types (Review, Research, etc.)
Add attributes for repository-specific functions
Add an option to download raw files as well as formatted files such as XML and PDF
Convert XML papers into a human-readable format
Specify a wordlist and get the count of those words for each paper
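The wordlist requirement could be approached roughly as follows. This is a minimal sketch under stated assumptions: papers are already available as plain text keyed by an identifier, matching is case-insensitive on whole single-word tokens, and the function names count_words and count_words_per_paper are hypothetical rather than part of pygetpapers.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def count_words(text: str, wordlist: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each wordlist entry."""
    # Split the text into lowercase alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return {word: counts[word.lower()] for word in wordlist}


def count_words_per_paper(
    papers: Dict[str, str], wordlist: Iterable[str]
) -> Dict[str, Dict[str, int]]:
    """Apply count_words to every paper; `papers` maps paper id to plain text."""
    words = list(wordlist)  # allow any iterable, including generators
    return {paper_id: count_words(text, words) for paper_id, text in papers.items()}
```

Multi-word phrases and stemming are deliberately left out of this sketch; they would need a different tokenisation step.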
