ws22 23_chemicals

tholzheim edited this page Oct 21, 2022 · 4 revisions

Information Extraction from Scientific Literature and Patents

Motivation

  • Information is published in scientific papers or patents
  • The information is not machine-understandable (“hidden” in PDF documents)
  • Data is not FAIR (Findable, Accessible, Interoperable, and Reusable)

Task

  • We aim to extract and structure the “hidden” information from scientific literature and patents
  • We focus on figures (images) in patents and publications

Approach

Image extraction and classification pipeline