ws22 23_chemicals

tholzheim edited this page Oct 21, 2022 · 4 revisions

Information Extraction from Scientific Literature and Patents

Motivation

Information is published in scientific papers or patents
The information is not machine-understandable (“hidden” in PDF documents)
Data is not FAIR (Findable, Accessible, Interoperable, and Reusable)

Task

We aim to extract and structure the “hidden” information from scientific literature and patents
We focus on figures (images) in patents and publications

Approach

Image extraction and classification pipeline