Outreach: InCoB2021 PMR presentation
Title: ContentMining the Biological Literature
Abstract: Much science is published only as PDF "papers" designed to be read by sighted, English-speaking humans. This huge, multidisciplinary resource becomes far more useful when we convert it automatically into structured, semantic, machine-understandable form. Our Open Source toolkit allows scientists to rapidly mine the Open literature and create their own ontologies and semantic knowledgebases. ContentMining involves a number of rapid heuristic steps: downloading from Open repositories; converting PDF to structured documents; NLP analysis and term/phrase extraction; extracting text and data from diagrams; annotating terms with Wikidata to build dictionaries; and searching documents with multiple dictionaries. Results can be analysed with standard tools to produce tables, co-occurrences, maps, chemical pathways and more. The tools (Python 3) are accessible to everyone, especially early-career researchers, and support multiple languages through Wikidata synonyms. We work as an Open Notebook team, develop collaboratively, and welcome volunteers.
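The dictionary-search and co-occurrence steps in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the toolkit's implementation: the dictionary layout (a concept ID mapped to a list of synonyms, standing in for Wikidata-linked entries with multilingual labels), IDs such as `plant:ocimum`, and the functions `compile_dictionary`, `annotate` and `cooccurrences` are all hypothetical. Matching is plain case-insensitive regular-expression search on word boundaries, a deliberately simple stand-in for the NLP used in practice.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionaries: each concept ID maps to its synonyms, which in a real
# Wikidata-backed dictionary could include labels in several languages.
PLANTS = {
    "plant:ocimum": ["Ocimum sanctum", "holy basil", "tulsi"],
    "plant:curcuma": ["Curcuma longa", "turmeric"],
}
COMPOUNDS = {
    "chem:eugenol": ["eugenol"],
    "chem:curcumin": ["curcumin"],
}


def compile_dictionary(dictionary):
    """Build one case-insensitive regex per concept matching any synonym as whole words."""
    return {
        concept: re.compile(
            r"\b(?:" + "|".join(re.escape(s) for s in synonyms) + r")\b",
            re.IGNORECASE,
        )
        for concept, synonyms in dictionary.items()
    }


def annotate(text, compiled):
    """Return the set of concept IDs with at least one synonym present in the text."""
    return {concept for concept, pattern in compiled.items() if pattern.search(text)}


def cooccurrences(documents, *dictionaries):
    """Count how often each pair of concepts appears in the same document."""
    compiled = {}
    for dictionary in dictionaries:
        compiled.update(compile_dictionary(dictionary))
    counts = Counter()
    for text in documents:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) are counted together.
        hits = sorted(annotate(text, compiled))
        counts.update(combinations(hits, 2))
    return counts


if __name__ == "__main__":
    docs = [
        "Eugenol is a major constituent of holy basil (Ocimum sanctum) oil.",
        "Curcumin is extracted from turmeric.",
        "Tulsi and turmeric are both used in traditional medicine.",
    ]
    for pair, n in cooccurrences(docs, PLANTS, COMPOUNDS).most_common():
        print(pair, n)
```

Restricting the counted pairs to concepts from different dictionaries (plant-compound rather than plant-plant) is a small change if only cross-dictionary associations are wanted.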
- Discover:
- Refine:
- Re-use:
Presentations
- @Ayush
Pygetpapers: include an example of query generation from dictionaries.
- @Shweata
Docanalysis
- @Anuv
pyamiimage
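For the pygetpapers item, one way to generate a query from a dictionary is to OR together its terms, quoting each as a phrase, and optionally AND the result with a base query. This sketch assumes a dictionary stored as XML with `<entry term="...">` elements (modelled loosely on the project's ami dictionaries; check the element and attribute names against a real file) and a repository that accepts Europe PMC-style Boolean syntax (uppercase `OR`/`AND`, double-quoted phrases). `terms_from_xml` and `dictionary_to_query` are hypothetical helpers, not part of pygetpapers.

```python
import xml.etree.ElementTree as ET


def terms_from_xml(xml_text):
    """Extract the term attribute of every <entry> element (assumed dictionary layout)."""
    root = ET.fromstring(xml_text)
    return [entry.get("term") for entry in root.iter("entry") if entry.get("term")]


def dictionary_to_query(terms, base_query=None):
    """OR together unique, non-empty terms as quoted phrases; optionally AND with a base query."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(t.strip() for t in terms if t and t.strip()))
    if not unique:
        raise ValueError("dictionary contains no usable terms")
    # Embedded double quotes would break phrase quoting, so they are removed.
    clause = "(" + " OR ".join('"' + t.replace('"', "") + '"' for t in unique) + ")"
    return f"({base_query}) AND {clause}" if base_query else clause


if __name__ == "__main__":
    xml_text = """<dictionary title="plants">
      <entry term="holy basil"/>
      <entry term="Ocimum sanctum"/>
      <entry term="tulsi"/>
    </dictionary>"""
    print(dictionary_to_query(terms_from_xml(xml_text), base_query='"essential oil"'))
    # ("essential oil") AND ("holy basil" OR "Ocimum sanctum" OR "tulsi")
```

The resulting string could then be passed to pygetpapers as its query; quoting every term keeps multi-word names such as "holy basil" from being split into separate words.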
Notes: @mbeisen (https://twitter.com/mbeisen/status/1451233646761824284): "The current science publishing system is the worst form of science publishing system. That's it. There's no except."