pygetpapers
Ayush Garg edited this page Jan 14, 2021 · 17 revisions
getpapers (https://github.com/petermr/openVirus/wiki/getpapers), the primary scraper that we've been using so far, is written in JavaScript and requires Node.js to run. Driven by the problems of maintaining and extending the Node-based getpapers, we've decided to rewrite the whole thing in Python and call it pygetpapers.
- PMR
- Ayush
- Dheeraj
- Shweata
PMR: This project is well suited to a modular approach, both in content and functionality. For example, each target repo is a subproject, and as long as the framework is well designed, it should be possible to add repos independently. One important piece that is missing at the moment is documentation on "how to add a new repo".
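One way to picture the modular approach is a small registry in which each target repo is a self-contained class. The following is only a sketch under assumptions: `Repository`, `REPOSITORIES`, `DummyRepository`, and `get_repository` are hypothetical names invented for illustration and are not part of pygetpapers.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

# Hypothetical registry mapping a repository name to its class.
REPOSITORIES: Dict[str, Type["Repository"]] = {}


class Repository(ABC):
    """Hypothetical base class that each target repo would implement."""

    name: str = ""

    def __init_subclass__(cls, **kwargs):
        # Register every named subclass automatically when it is defined.
        super().__init_subclass__(**kwargs)
        if cls.name:
            REPOSITORIES[cls.name] = cls

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> List[dict]:
        """Return metadata records for papers matching the query."""


class DummyRepository(Repository):
    """Stand-in repo that fabricates records instead of calling a real API."""

    name = "dummy"

    def search(self, query: str, limit: int = 10) -> List[dict]:
        return [{"id": f"{query}-{i}", "source": self.name} for i in range(limit)]


def get_repository(name: str) -> Repository:
    """Look up a registered repo by name and instantiate it."""
    try:
        return REPOSITORIES[name]()
    except KeyError:
        raise ValueError(f"Unknown repository: {name!r}") from None
```

Under this design, "how to add a new repo" would reduce to writing one subclass with a `name` and a `search` method; the framework code would not need to change.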
- General API
- Sort results by date
- Download only specific article types (Review, Research, etc.)
- Add attributes for repository-specific functions
- Add an option to get raw files as well as files in formats such as XML and PDF
- Convert XML papers into a user-readable format.
- Specify a wordlist and then get the count of those words for each paper.
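The last two items (readable text from XML, and per-paper word counts) could be sketched as follows. This is an illustration under assumptions, not pygetpapers code: `xml_to_text`, `count_words`, and `SAMPLE_XML` are hypothetical names, the sample document is made up, and only single-word terms are counted.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable

# Made-up sample standing in for a downloaded paper's XML.
SAMPLE_XML = (
    "<article><title>Masks and viruses</title>"
    "<body><p>A virus spreads; masks help. The virus mutates.</p></body>"
    "</article>"
)


def xml_to_text(xml_string: str) -> str:
    """Strip the markup and return the text with whitespace normalised."""
    root = ET.fromstring(xml_string)
    return " ".join(" ".join(root.itertext()).split())


def count_words(text: str, wordlist: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each wordlist entry."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: tokens[word.lower()] for word in wordlist}


if __name__ == "__main__":
    text = xml_to_text(SAMPLE_XML)
    print(text)
    print(count_words(text, ["virus", "masks", "vaccine"]))
```

Matching is on whole tokens, so "viruses" does not count towards "virus"; stemming or phrase matching would need a different tokeniser.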