ws21 22_chemical compounds

tholzheim edited this page Apr 15, 2022 · 1 revision

Motivation

The chemical industry aims to assess and reduce the environmental impacts of its products. This is a challenging task because the environmental impacts depend on various production technologies and local circumstances in the global supply chain. However, information about production plants is currently not available in a structured way. Collecting and structuring information about chemical production technologies is a necessary prerequisite for a holistic assessment of environmental impacts.

Task description

In this project, the teams extract information about chemical production plants from news articles. This includes the extraction of chemical compounds, locations of chemical plants, company names, and chemical process technologies. The extracted information is then linked in a knowledge graph (e.g. linking chemical plants, chemical compounds, and their locations as mentioned in the news) and potentially illustrated on a map. The two teams can focus their development on different news webpages and then test their approaches on the remaining news pages.
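As a rough illustration of the linking step, the sketch below stores extracted facts as plain (subject, predicate, object) triples and builds a small index by location, which could later feed a map view. It is only one possible representation. The record fields, the predicate names (`operated_by`, `located_in`, `produces`, `uses_technology`, `mentioned_in`), and the sample record are invented placeholders, not part of the project's data or code.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the knowledge graph."""
    subject: str
    predicate: str
    obj: str


def build_graph(records):
    """Turn extracted records into a set of triples.

    The record layout and predicate names are assumptions made for this sketch.
    """
    triples = set()
    for rec in records:
        plant = rec["plant"]
        triples.add(Triple(plant, "operated_by", rec["company"]))
        triples.add(Triple(plant, "located_in", rec["location"]))
        triples.add(Triple(plant, "mentioned_in", rec["source_url"]))
        for compound in rec["compounds"]:
            triples.add(Triple(plant, "produces", compound))
        for technology in rec["technologies"]:
            triples.add(Triple(plant, "uses_technology", technology))
    return triples


def plants_by_location(triples):
    """Group plant names by location, e.g. as input for a map."""
    index = defaultdict(set)
    for t in triples:
        if t.predicate == "located_in":
            index[t.obj].add(t.subject)
    return dict(index)


# Hypothetical record, as it might come out of the extraction step.
records = [
    {
        "plant": "Plant A",
        "company": "Example Chem Corp",
        "location": "Exampleville",
        "compounds": ["ethylene"],
        "technologies": ["steam cracking"],
        "source_url": "https://example.org/news/1",
    }
]

graph = build_graph(records)
for triple in sorted(graph):
    print(triple)
print(plants_by_location(graph))
```

A set of triples keeps the sketch dependency-free and removes duplicates when the same fact appears in several articles. A dedicated graph or RDF library could replace it once the schema is settled.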

Possible procedures:

  1. web crawler
  2. named entity recognition
  3. query Wikidata
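
The three steps above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's implementation: the named entity recognition step is replaced here by a simple dictionary (gazetteer) lookup, the term lists and the sample HTML are invented, and the Wikidata step only builds a request URL for the `wbsearchentities` action of the Wikidata API without sending it. The helper names (`fetch`, `html_to_text`, `find_entities`, `wikidata_search_url`) are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Step 1 (crawler): download one page. Not called in this sketch."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Step 2 (entity recognition): placeholder term lists, assumed for illustration.
GAZETTEER = {
    "compound": ["ethylene", "ammonia", "methanol"],
    "location": ["Antwerp", "Houston"],
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, label) pairs in order of appearance."""
    found = []
    for label, terms in gazetteer.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((match.start(), match.group(0), label))
    return [(surface, label) for _, surface, label in sorted(found)]


def wikidata_search_url(term, language="en", limit=5):
    """Step 3 (Wikidata): build an entity search URL for a recognized term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


# Invented sample page standing in for a crawled news article.
sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<p>A new ethylene plant opens in Antwerp.</p>"
    "<script>var x=1;</script></body></html>"
)
text = html_to_text(sample_html)
entities = find_entities(text)
urls = [wikidata_search_url(surface) for surface, _ in entities]
print(text, entities, urls, sep="\n")
```

A gazetteer only finds terms it already knows, so a trained NER model would be needed to discover unseen company or plant names. The lookup is used here only to keep the example self-contained and runnable offline.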

Initial example code to get started

Chemistry.ipynb in the GitHub repository