From a78ec1b244c9da2c6eaa9d564404203d1aeaa6c4 Mon Sep 17 00:00:00 2001
From: Maxim Moinat
Date: Sat, 27 Jan 2024 11:32:38 +0100
Subject: [PATCH] update plausibleAfterBirth and minor heading formatting

---
 vignettes/CheckStatusDefinitions.rmd          |  8 ++--
 vignettes/DataQualityDashboard.rmd            |  3 --
 vignettes/DqdForCohorts.rmd                   |  2 -
 vignettes/checks/plausibleAfterBirth copy.Rmd | 46 -------------------
 vignettes/checks/plausibleAfterBirth.Rmd      | 29 +++++++++++-
 5 files changed, 31 insertions(+), 57 deletions(-)
 delete mode 100644 vignettes/checks/plausibleAfterBirth copy.Rmd

diff --git a/vignettes/CheckStatusDefinitions.rmd b/vignettes/CheckStatusDefinitions.rmd
index 37841371..50e5cac7 100644
--- a/vignettes/CheckStatusDefinitions.rmd
+++ b/vignettes/CheckStatusDefinitions.rmd
@@ -1,6 +1,6 @@
 ---
-title: "Check Status Descriptions"
-author: "Dmitry Ilyn"
+title: "Check Status Definitions"
+author: "Dmitry Ilyn, Maxim Moinat"
 date: "`r Sys.Date()`"
 output:
   pdf_document:
@@ -10,13 +10,11 @@ output:
     number_sections: yes
     toc: yes
 vignette: >
-  %\VignetteIndexEntry{Check Status Descriptions}
+  %\VignetteIndexEntry{Check Status Definitions}
   %\VignetteEncoding{UTF-8}
   %\VignetteEngine{knitr::knitr}
 ---
 
-# DQD check statuses
-
 ## Introduction
 In the DataQualityDashboard v2, new check statuses were introduced: `Error` and `Not Applicable`. These were introduced to more accurately reflect the quality of data contained in a CDM instance, addressing scenarios where pass/fail is not appropriate.
 The new set of mutually exclusive status states are listed below in priority order:
diff --git a/vignettes/DataQualityDashboard.rmd b/vignettes/DataQualityDashboard.rmd
index af09c1d6..6c45fa10 100644
--- a/vignettes/DataQualityDashboard.rmd
+++ b/vignettes/DataQualityDashboard.rmd
@@ -15,9 +15,6 @@ vignette: >
   %\VignetteEngine{knitr::knitr}
 ---
 
-# Getting Started
-***
-
 R Installation
 ===============
 
diff --git a/vignettes/DqdForCohorts.rmd b/vignettes/DqdForCohorts.rmd
index a72517d6..59812e61 100644
--- a/vignettes/DqdForCohorts.rmd
+++ b/vignettes/DqdForCohorts.rmd
@@ -15,8 +15,6 @@ vignette: >
   %\VignetteEngine{knitr::knitr}
 ---
 
-# DQD Cohort Functionality
-
 Running the Data Quality Dashboard for a cohort is fairly straightforward. There are two options in the `executeDqChecks` function, `cohortDefinitionId` and `cohortDatabaseSchema`. These options will point the DQD to the schema where the cohort table is located and provide the id of the cohort on which the DQD will be run. By default, the tool assumes that the table being referenced is the standard OHDSI cohort table named **COHORT** with at least the columns **cohort_definition_id** and **subject_id**.
 For example, if I have a cohort number 123 and the cohort is in the *results* schema of the *IBM_CCAE* database, the `executeDqChecks` function would look like this:
 ```r
diff --git a/vignettes/checks/plausibleAfterBirth copy.Rmd b/vignettes/checks/plausibleAfterBirth copy.Rmd
deleted file mode 100644
index 568cbffa..00000000
--- a/vignettes/checks/plausibleAfterBirth copy.Rmd
+++ /dev/null
@@ -1,46 +0,0 @@
----
-title: "plausibleAfterBirth"
-author: ""
-date: "`r Sys.Date()`"
-output:
-  html_document:
-    number_sections: yes
-    toc: yes
----
-
-## Summary
-
-**Level**: FIELD\
-**Context**: Verification\
-**Category**: Plausibility\
-**Subcategory**: Temporal\
-**Severity**: 
-
-
-## Description
-The number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs prior to birth.
-
-
-## Definition
-
-- *Numerator*: 
-- *Denominator*: 
-- *Related CDM Convention(s)*: 
-- *CDM Fields/Tables*: 
-- *Default Threshold Value*: 
-
-
-## User Guidance
-
-
-### Violated rows query
-```sql
-
-```
-
-
-### ETL Developers
-
-
-### Data Users
-
diff --git a/vignettes/checks/plausibleAfterBirth.Rmd b/vignettes/checks/plausibleAfterBirth.Rmd
index 8829864b..66697dd5 100644
--- a/vignettes/checks/plausibleAfterBirth.Rmd
+++ b/vignettes/checks/plausibleAfterBirth.Rmd
@@ -31,7 +31,14 @@ This check verifies that events happen after birth. This check is only run on fi
 
 ## User Guidance
 There might be valid reasons why a record has a date value that occurs prior to birth. For example, prenatal observations might be captured or procedures on the mother might be added to the file of the child. Therefore, some failing records are expected and the default threshold of 1% accounts for that.
-However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error.
+However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error and to correct the dates. If the dates cannot be corrected, apply one of the following strategies:
+
+ - Aggressive: remove all patients who have at least one record before birth (appropriate if the patient's birth date is unreliable).
+ - Less aggressive: remove only the records that occur before birth (appropriate if the event dates are unreliable). This should probably be the default approach to data cleanup.
+ - Conservative: set the event date to the birth date.
+
+Clearly document the chosen approach in your ETL specification.
+
 
 ### Violated rows query
 You may also use the “violated rows” SQL query to inspect the violating rows and help diagnose the potential root cause of the issue:
@@ -60,6 +67,26 @@ WHERE cdmTable.@cdmFieldName < CAST(CONCAT(
   ) AS DATE)
 ```
 
+The number of days between the event date and the birth date may also hint at why the problem occurs.
+```sql
+select date_difference, count(*)
+from (
+  select datediff(
+    day,
+    COALESCE(
+      CAST(p.birth_datetime AS DATE),
+      CAST(CONCAT(p.year_of_birth,'-01-01') AS DATE)),
+    ct.@cdmFieldName
+  ) as date_difference
+  from @cdmDatabaseSchema.@cdmTableName ct
+  join @cdmDatabaseSchema.person p on ct.person_id = p.person_id
+) cte
+where date_difference < 0
+group by date_difference
+order by count(*) desc
+;
+```
+
 ### ETL Developers
 As above, if the number of failing records is high, it is recommended to investigate the records that fail this check to determine the underlying cause of the error.
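Reviewer note: to make the intent of the date-difference query in this patch easier to check, here is a minimal Python sketch of the same aggregation over in-memory data. `Person`, `effective_birth_date` and `days_before_birth_histogram` are hypothetical names for illustration only, not part of DataQualityDashboard. As in the query's `COALESCE`, the birth date falls back to January 1 of `year_of_birth`, and the difference is event date minus birth date, so it is negative for events before birth.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Person:
    """Hypothetical in-memory stand-in for a row of the CDM PERSON table."""
    person_id: int
    year_of_birth: int
    birth_date: Optional[date] = None  # plays the role of birth_datetime


def effective_birth_date(person: Person) -> date:
    """Mirror COALESCE(CAST(birth_datetime AS DATE), year_of_birth-01-01)."""
    return person.birth_date or date(person.year_of_birth, 1, 1)


def days_before_birth_histogram(
    events: Iterable[Tuple[int, date]], persons: Dict[int, Person]
) -> List[Tuple[int, int]]:
    """Return (date_difference, count) pairs for events that occur before birth.

    date_difference is event_date minus birth date in days. Pairs are ordered
    by count, highest first, like ORDER BY count(*) DESC in the SQL query.
    """
    counts: Counter = Counter()
    for person_id, event_date in events:
        person = persons.get(person_id)
        if person is None:
            continue  # inner-join semantics: skip events without a PERSON row
        diff = (event_date - effective_birth_date(person)).days
        if diff < 0:
            counts[diff] += 1
    return counts.most_common()


if __name__ == "__main__":
    persons = {1: Person(1, 2000, date(2000, 5, 10)), 2: Person(2, 1990)}
    events = [
        (1, date(2000, 5, 1)),
        (1, date(2000, 5, 1)),
        (1, date(2000, 6, 1)),    # after birth, not counted
        (2, date(1989, 12, 31)),  # one day before the fallback 1990-01-01
        (3, date(1950, 1, 1)),    # no matching person, skipped
    ]
    print(days_before_birth_histogram(events, persons))  # prints [(-9, 2), (-1, 1)]
```

A single dominant difference in the output would be worth a closer look, since many records sharing the same offset from birth is the kind of pattern a default or shifted date could produce.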
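Reviewer note: the three cleanup strategies added to the User Guidance section differ only in what they do with a record dated before birth. The minimal Python sketch below shows that difference on a single event table held in memory. `Event`, `drop_patients`, `drop_rows` and `clamp_to_birth` are hypothetical names, and the sketch assumes every `person_id` has a known birth date (a missing one raises `KeyError`).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical record: one row of a CDM event table with its event date."""
    person_id: int
    event_date: date


# Maps person_id -> birth date; every event's person_id is assumed to be present.
BirthDates = Dict[int, date]


def drop_patients(events: List[Event], birth_dates: BirthDates) -> List[Event]:
    """Aggressive: drop every patient with at least one event before birth."""
    flagged = {e.person_id for e in events if e.event_date < birth_dates[e.person_id]}
    return [e for e in events if e.person_id not in flagged]


def drop_rows(events: List[Event], birth_dates: BirthDates) -> List[Event]:
    """Less aggressive: drop only the events that occur before birth."""
    return [e for e in events if e.event_date >= birth_dates[e.person_id]]


def clamp_to_birth(events: List[Event], birth_dates: BirthDates) -> List[Event]:
    """Conservative: move events that occur before birth onto the birth date."""
    return [
        replace(e, event_date=birth_dates[e.person_id])
        if e.event_date < birth_dates[e.person_id]
        else e
        for e in events
    ]
```

In a real ETL the aggressive option would presumably remove the flagged patients from every CDM table, not only from the table being checked; the sketch covers a single table to keep the comparison readable.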