CORE API: Add PDF download feature #59

lobodemonte · 2020-02-24T18:53:15Z

GET /articles/get/{coreId}/download/pdf
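
As a minimal sketch of how a client might build the URL for this endpoint: the helper name `core_pdf_url`, the base URL, and the `apiKey` query parameter are assumptions for illustration and should be checked against the CORE API documentation. Only the path template comes from the line above.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the CORE API; confirm against the CORE docs.
CORE_BASE_URL = "https://core.ac.uk/api-v2"


def core_pdf_url(core_id, api_key=None, base_url=CORE_BASE_URL):
    """Build the download URL for GET /articles/get/{coreId}/download/pdf."""
    # Percent-encode the identifier so it is safe to place in the path.
    path = f"/articles/get/{quote(str(core_id), safe='')}/download/pdf"
    url = base_url.rstrip("/") + path
    if api_key:
        # Assumption: the key is passed as an `apiKey` query parameter.
        url += "?" + urlencode({"apiKey": api_key})
    return url
```

The function only builds a string, so it can be unit-tested without network access; the actual fetch can be layered on top with whichever HTTP client the project already uses.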

ceteri · 2020-02-24T19:16:19Z

That's so good to know. Frankly, I'd never thought about programmatic access to PDFs -- that could help resolve some of the errors that we've had at that stage of the workflow.

lobodemonte · 2020-02-24T19:57:14Z

I thought we wanted to use the APIs to find and download PDFs where possible.

lobodemonte · 2020-02-24T20:54:21Z

@ceteri Should a pdf_lookup method return a PDF object or should it return a link to the PDF of the publication?

ceteri · 2020-02-24T21:08:05Z

Good question. That depends on whether the returned URL has limitations:

- does it require a cookie?
- is there potentially a time limit for its use?
- are there any other limitations?

Ideally, if there aren't these or other limitations, then we'd prefer to simply look up the PDF URL and add it to the KG for later processing. Alternatively, we can refactor the PDF download to an early stage of the workflow if needed.
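
One way to picture that trade-off is a small record that a lookup could return alongside the URL. This is only a sketch: the `PdfLink` class and its fields are hypothetical names, not part of the existing code or of any API response.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PdfLink:
    """Hypothetical result of a PDF lookup: the URL plus known limitations."""
    url: str
    requires_cookie: bool = False           # link only works within a logged-in session
    expires_at: Optional[datetime] = None   # set when the link has a time limit

    def can_defer(self) -> bool:
        """True when the URL can be stored now and downloaded at a later stage."""
        return not self.requires_cookie and self.expires_at is None
```

With something like this, links where `can_defer()` is true could go straight into the KG, while the rest would need the download to happen at lookup time.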

I've noticed a trend where publishers claim to have "open access" articles, but in reality you must be logged in, and they use browser/cookie "watermarks" where the PDF link requires some JavaScript to run -- in other words, it's not a direct download URL. So many of the errors that we see (e.g., Wiley, OUP, ScienceDirect) appear to be due to such limitations:
https://github.com/Coleridge-Initiative/rclc/blob/master/errors.txt

Although maybe we need to troubleshoot that download code?
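
For troubleshooting, a quick check on the response body can help tell a real PDF from a login or JavaScript page. The sketch below relies only on the fact that PDF files begin with the `%PDF-` marker; the helper name `looks_like_pdf` is hypothetical and this is not the project's existing download code.

```python
def looks_like_pdf(body: bytes, content_type: str = "") -> bool:
    """Heuristic: does a downloaded payload look like a PDF rather than HTML?"""
    # A login or JavaScript landing page is typically served as HTML.
    if "text/html" in content_type.lower():
        return False
    # PDF files start with the "%PDF-" marker; tolerate leading whitespace.
    return body[:1024].lstrip().startswith(b"%PDF-")
```

Logging the cases where this returns false, along with the publisher, might show whether the failures in errors.txt come from non-direct links or from the download code itself.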

lobodemonte self-assigned this Feb 24, 2020

lobodemonte changed the title ~~CORE API: Add PDF download integration~~ CORE API: Add PDF download feature Mar 23, 2020

lobodemonte removed their assignment Jul 17, 2020
