Data Quality check

Saif Shabou edited this page Jun 4, 2021 · 1 revision

Once the raw data has been integrated into a common data model, we need to check its quality and apply treatments that produce clean data.

Because all data sources are stored in the same data model, it is easy to develop generic functions that can be applied to the different datasets to detect and filter out records that are not useful.

Several data quality issues that call for treatment have been identified:

  • missing values
  • outlier values in recorded emissions
  • duplicates
  • incorrectly formatted emission values
  • ...
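
The checks listed above could be combined into one generic function along the following lines. This is only a minimal sketch, not the project's implementation: the field names `source_id`, `year` and `emission`, the duplicate key, and the outlier rule (a modified z-score based on the median absolute deviation, with a 3.5 threshold) are all assumptions made for illustration, since this page does not describe the common data model.

```python
import math
import statistics

# Hypothetical field names: the actual common data model is not described here.
REQUIRED_FIELDS = ("source_id", "year", "emission")


def parse_emission(value):
    """Return the emission as a finite float, or None if the format is wrong."""
    if isinstance(value, bool):
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def check_quality(records, threshold=3.5):
    """Split records into (clean, rejected); rejected holds (record, reason) pairs."""
    candidates, rejected, seen = [], [], set()
    for rec in records:
        # Missing values: any required field absent, None or empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            rejected.append((rec, "missing value"))
            continue
        # Wrong format: the emission cannot be read as a finite number.
        emission = parse_emission(rec["emission"])
        if emission is None:
            rejected.append((rec, "wrong format"))
            continue
        # Duplicates: assumed here to share the same (source_id, year) key.
        key = (rec["source_id"], rec["year"])
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        candidates.append({**rec, "emission": emission})

    # Outliers: modified z-score based on the median absolute deviation (MAD).
    clean = []
    values = [rec["emission"] for rec in candidates]
    median = statistics.median(values) if values else 0.0
    mad = statistics.median(abs(v - median) for v in values) if values else 0.0
    for rec in candidates:
        score = 0.6745 * abs(rec["emission"] - median) / mad if mad else 0.0
        if score > threshold:
            rejected.append((rec, "outlier"))
        else:
            clean.append(rec)
    return clean, rejected
```

Because the function only relies on the shared field names, the same call, `clean, rejected = check_quality(records)`, could be applied to every integrated dataset. Keeping the rejection reason next to each filtered record makes it possible to report how many records each check removes.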

The goal of this section is to provide an inventory of the data quality treatments that should be applied to the integrated datasets.
