CORE API: Add PDF download feature #59

Open
lobodemonte opened this issue Feb 24, 2020 · 4 comments

@lobodemonte

lobodemonte commented Feb 24, 2020

GET /articles/get/{coreId}/download/pdf
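
A minimal sketch of building the request URL for this endpoint. The base URL and the `apiKey` query parameter are assumptions that this thread does not state, and `core_pdf_url` is a hypothetical helper name, so check them against the CORE API documentation before relying on this.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the CORE API; not given in this thread.
CORE_BASE_URL = "https://core.ac.uk/api-v2"


def core_pdf_url(core_id, api_key, base_url=CORE_BASE_URL):
    """Build the PDF download URL for a CORE article ID (hypothetical helper)."""
    # Path taken from the endpoint quoted above; the ID is percent-encoded.
    path = f"/articles/get/{quote(str(core_id), safe='')}/download/pdf"
    # Assumption: the API key is passed as an `apiKey` query parameter.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'apiKey': api_key})}"
```

The function only assembles the URL and does not make a request, so the result can either be stored for later processing or passed to whichever HTTP client the workflow already uses.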

@lobodemonte lobodemonte self-assigned this Feb 24, 2020
@ceteri
Contributor

ceteri commented Feb 24, 2020

That's so good to know. Frankly, I'd never thought about programmatic access to PDFs -- that could help resolve some of the errors that we've had at that stage of the workflow.

@lobodemonte
Author

I thought we wanted to use the APIs to find and download PDFs where possible.

@lobodemonte
Author

@ceteri Should a pdf_lookup method return a PDF object or should it return a link to the PDF of the publication?

@ceteri
Contributor

ceteri commented Feb 24, 2020

Good question. That depends on whether the returned URL has limitations:

  • does it require a cookie?
  • is there potentially a time limit for its use?
  • are there any other limitations?

Ideally, if there aren't these or other limitations, then we'd prefer to simply look up the PDF URL and add it to the KG for later processing. Alternatively, we can refactor the PDF download to an early stage of the workflow if needed.

I've noticed a trend where publishers claim to have "open access" articles, but in reality you must be logged in, and the download relies on browser/cookie "watermarks" where the PDF link requires some JavaScript to run. In other words, it's not a direct download URL. So many of the errors that we see (e.g., Wiley, OUP, ScienceDirect) appear to be due to these limitations:
https://github.com/Coleridge-Initiative/rclc/blob/master/errors.txt

Although maybe we need to troubleshoot that download code?
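
One way to separate publisher-side limitations from bugs in the download code is to check what a link actually returns. The sketch below is not the project's download code: `classify_download` is a hypothetical helper, and it assumes the caller has already fetched the `Content-Type` header and the first few bytes of the response. It relies on the fact that PDF files begin with the bytes `%PDF-`, whereas a login or JavaScript page normally begins with HTML markup.

```python
def classify_download(content_type, first_bytes):
    """Classify a response as 'pdf', 'html', or 'unknown' (hypothetical helper)."""
    head = first_bytes.lstrip()
    # PDF files start with the magic bytes "%PDF-"; trust the body over the header.
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Drop parameters such as "; charset=utf-8" from the Content-Type header.
    media_type = (content_type or "").split(";")[0].strip().lower()
    # An HTML header or body that opens with "<" suggests a login or script page.
    if media_type == "text/html" or head[:1] == b"<":
        return "html"
    return "unknown"
```

A result of `"html"` for a supposedly open-access link would point to a cookie or JavaScript requirement rather than to a fault in the download step.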

@lobodemonte lobodemonte changed the title from "CORE API: Add PDF download integration" to "CORE API: Add PDF download feature" Mar 23, 2020
@lobodemonte lobodemonte removed their assignment Jul 17, 2020