
Review pdf2xml, Crossref, and CCC RightsLink approaches to providing full text for text mining #202

Open
rkboyce opened this issue Dec 31, 2016 · 0 comments

rkboyce (Collaborator) commented Dec 31, 2016

There has been considerable progress in the publishing community toward supporting text mining of full-text articles. We need to consider how these developments are relevant to the current NLM R01 and to future enhancements of AnnotationPress. Here are some things to pay attention to:

  1. Crossref provides a REST API (https://github.com/CrossRef/rest-api-doc/blob/master/rest_api.md) that is designed in part to help identify the rights attached to full text, and even the location of PDF or XML versions of a document: https://www.youtube.com/watch?v=LBYgq6jPoyk&feature=youtu.be. There is some important background on Crossref here: https://www.youtube.com/watch?v=YPCRfNFJgj8

  2. RightFind is the Copyright Clearance Center's (CCC) new solution for helping researchers find XML versions of full-text articles for text mining, along with information about the rights they have to work with those documents: https://www.youtube.com/watch?v=-gUhAkwZbVQ

  3. pdf2xml appears to be a widely preferred tool in the text mining community for working with PDF content. We need to think about how annotations created in AnnotationPress on PDF documents can be translated to the equivalent XML versions of those documents, because that would be very useful for text miners.
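To make item 1 concrete, here is a minimal sketch of reading rights and full-text locations from a Crossref metadata record. It assumes the `/works/{DOI}` endpoint of the REST API and its `license` and `link` arrays, where publishers can flag a link with `"intended-application": "text-mining"`. Coverage varies by publisher, so many records will have no such links. The `sample` record below is hypothetical, built only to show the shape of the data; `fetch_work` needs network access and is not called here.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch the Crossref metadata record ("message") for a DOI. Needs network access."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]


def text_mining_links(work: dict) -> list[dict]:
    """Return the full-text links the publisher registered for text mining."""
    return [
        {"url": link.get("URL"), "content_type": link.get("content-type")}
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


def license_urls(work: dict) -> list[str]:
    """Return the license URLs attached to the work, if any."""
    return [lic["URL"] for lic in work.get("license", []) if "URL" in lic]


# Hypothetical record shaped like a Crossref "message" object (values invented).
sample = {
    "DOI": "10.1234/example",
    "license": [{"URL": "http://creativecommons.org/licenses/by/4.0/", "content-version": "vor"}],
    "link": [
        {"URL": "https://example.org/article.xml", "content-type": "text/xml",
         "intended-application": "text-mining"},
        {"URL": "https://example.org/article.pdf", "content-type": "application/pdf",
         "intended-application": "similarity-checking"},
    ],
}

print(text_mining_links(sample))
# → [{'url': 'https://example.org/article.xml', 'content_type': 'text/xml'}]
```

A license URL only points to the terms. Deciding what those terms permit for a given project, and whether a subscription is also needed, is still a separate step. That gap is what services like RightFind aim to fill.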
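One possible way to approach item 3 is to re-anchor each PDF annotation by its quoted text rather than by page coordinates. The sketch below assumes that an annotation keeps the exact quote plus some preceding and following context. This is similar to the W3C Web Annotation `TextQuoteSelector`, but it is an assumption about AnnotationPress, not a description of how it stores annotations. The code flattens the XML to normalized plain text and uses the context to choose among repeated matches. The article markup and function names are hypothetical. It uses exact matching only; real PDF-to-XML alignment would probably need fuzzy matching to cope with hyphenation, ligatures, and extraction errors.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def normalize(text: str) -> str:
    """Collapse whitespace runs (PDF line breaks, XML indentation) to single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def xml_plain_text(xml_string: str) -> str:
    """Concatenate all text nodes of an XML document, then normalize whitespace.

    Assumes block elements are separated by whitespace in the source XML, so that
    joining with "" keeps inline markup (e.g. <italic>) from splitting words.
    """
    root = ET.fromstring(xml_string)
    return normalize("".join(root.itertext()))


def anchor_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> tuple[int, int] | None:
    """Return (start, end) of the best match of `exact` in `text`, or None.

    Each occurrence scores one point if the text before it ends with `prefix` and
    one point if the text after it starts with `suffix`; ties keep the earliest match.
    """
    exact, prefix, suffix = normalize(exact), normalize(prefix), normalize(suffix)
    if not exact:
        return None
    best, best_score = None, -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        before, after = text[:start].rstrip(), text[end:].lstrip()
        score = int(bool(prefix) and before.endswith(prefix)) + int(bool(suffix) and after.startswith(suffix))
        if score > best_score:
            best, best_score = (start, end), score
        start = text.find(exact, start + 1)
    return best


# Hypothetical article XML; a real publisher file (e.g. JATS) would be richer.
article_xml = """<article>
  <title>Warfarin interactions</title>
  <p>Warfarin is an anticoagulant; warfarin dosing varies.</p>
  <p>Bleeding risk rose in patients taking warfarin with fluconazole.</p>
</article>"""

text = xml_plain_text(article_xml)
# Context captured from the PDF, including a line break inside the prefix.
span = anchor_quote(text, "warfarin", prefix="patients\ntaking", suffix="with fluconazole")
print(text[span[0]:span[1] + len(" with fluconazole")])
# → warfarin with fluconazole
```

Without the context, the same quote would resolve to the first "warfarin" ("warfarin dosing varies"). Storing prefix and suffix is therefore what makes the translation unambiguous.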
