update plausibleAfterBirth and small formatting of headings

katy-sadowski · Jan 27, 2024 · a78ec1b
1 parent e26603d
Showing 5 changed files with 31 additions and 57 deletions.
diff --git a/vignettes/CheckStatusDefinitions.rmd b/vignettes/CheckStatusDefinitions.rmd
@@ -1,6 +1,6 @@
 ---
-title: "Check Status Descriptions"
-author: "Dmitry Ilyn"
+title: "Check Status Definitions"
+author: "Dmitry Ilyn, Maxim Moinat"
 date: "`r Sys.Date()`"
 output:
     pdf_document:
@@ -10,13 +10,11 @@ output:
         number_sections: yes
         toc: yes
 vignette: >
-    %\VignetteIndexEntry{Check Status Descriptions}
+    %\VignetteIndexEntry{Check Status Definitions}
     %\VignetteEncoding{UTF-8}
     %\VignetteEngine{knitr::knitr}
 ---
 
-# DQD check statuses
-
 ## Introduction
 In the DataQualityDashboard v2, new check statuses were introduced: `Error` and `Not Applicable`. These were introduced to more accurately reflect the quality of data contained in a CDM instance, addressing scenarios where pass/fail is not appropriate. The new set of mutually exclusive status states are listed below in priority order:
 

diff --git a/vignettes/DataQualityDashboard.rmd b/vignettes/DataQualityDashboard.rmd
@@ -15,9 +15,6 @@ vignette: >
   %\VignetteEngine{knitr::knitr}
 ---
 
-# Getting Started
-***
-
 R Installation
 ===============
 

diff --git a/vignettes/DqdForCohorts.rmd b/vignettes/DqdForCohorts.rmd
@@ -15,8 +15,6 @@ vignette: >
   %\VignetteEngine{knitr::knitr}
 ---
 
-# DQD Cohort Functionality
-
 Running the Data Quality Dashboard for a cohort is fairly straightforward. There are two options in the `executeDqChecks` function, `cohortDefinitionId` and `cohortDatabaseSchema`. These options will point the DQD to the schema where the cohort table is located and provide the id of the cohort on which the DQD will be run. By default, the tool assumes that the table being referenced is the standard OHDSI cohort table named **COHORT** with at least the columns **cohort_definition_id** and **subject_id**. For example, if I have a cohort number 123 and the cohort is in the *results* schema of the *IBM_CCAE* database, the `executeDqChecks` function would look like this:
 
   ```r

diff --git a/vignettes/checks/plausibleAfterBirth copy.Rmd b/vignettes/checks/plausibleAfterBirth copy.Rmd
diff --git a/vignettes/checks/plausibleAfterBirth.Rmd b/vignettes/checks/plausibleAfterBirth.Rmd
@@ -31,7 +31,14 @@ This check verifies that events happen after birth. This check is only run on fi
 ## User Guidance
 There might be valid reasons why a record has a date value that occurs prior to birth. For example, prenatal observations might be captured or procedures on the mother might be added to the file of the child. Therefore, some failing records are expected and the default threshold of 1% accounts for that.
 
-However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error.
+However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error and correct the dates. If the dates cannot be corrected, apply one of the following:
+
+ - Aggressive: Remove all patients who have at least one record before birth (appropriate if the patient's birthdate is unreliable).
+ - Less aggressive: Remove all rows that occur before birth (appropriate if the event dates are unreliable). This is probably a reasonable conventional choice for data cleanup.
+ - Conservative: Set the event date to the birth date.
+
+Make sure to clearly document the choices in your ETL specification.
+
 
 ### Violated rows query
 You may also use the “violated rows” SQL query to inspect the violating rows and help diagnose the potential root cause of the issue:
@@ -60,6 +67,26 @@ WHERE cdmTable.@cdmFieldName < CAST(CONCAT(
     ) AS DATE)
 ```
 
+The distribution of the time difference between the event date and the birth date can also hint at the cause of the problem.
+```sql
+select date_difference, count(*)
+from (
+    select DATEDIFF(
+        day,
+        ct.@cdmFieldName,
+        COALESCE(
+            CAST(p.birth_datetime AS DATE),
+            CAST(CONCAT(p.year_of_birth,'-01-01') AS DATE))
+        ) as date_difference -- days from the event date to the birth date
+    from @cdmTableName ct
+    join person p on ct.person_id = p.person_id
+) cte
+where date_difference > 0 -- positive means the event precedes birth
+group by date_difference
+order by count(*) desc
+;
+```
+
 ### ETL Developers
 As above, if the number of failing records is high, it is recommended to investigate the records that fail this check to determine the underlying cause of the error.
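
Reviewer's sketch, not part of the commit: the three fallback options added under User Guidance (remove patients, remove rows, set the event date to the birth date) could look roughly like this on toy data. Everything here is hypothetical and for illustration only, including the `births` and `events` structures and the function names; a real ETL would apply the equivalent logic to the CDM tables.

```python
from datetime import date

# Hypothetical toy data: person_id -> birth date, plus a few event rows.
births = {1: date(2000, 5, 1), 2: date(1990, 1, 1)}
events = [
    {"person_id": 1, "event_date": date(2000, 4, 20)},  # before birth
    {"person_id": 1, "event_date": date(2010, 3, 3)},
    {"person_id": 2, "event_date": date(1995, 7, 7)},
]


def is_before_birth(row):
    """True when the event date precedes the person's birth date."""
    return row["event_date"] < births[row["person_id"]]


def drop_patients(rows):
    """Aggressive: drop every row of any person with a pre-birth event."""
    bad = {r["person_id"] for r in rows if is_before_birth(r)}
    return [r for r in rows if r["person_id"] not in bad]


def drop_rows(rows):
    """Less aggressive: drop only the rows that occur before birth."""
    return [r for r in rows if not is_before_birth(r)]


def clamp_to_birth(rows):
    """Conservative: keep every row, moving pre-birth dates to the birth date."""
    return [
        {**r, "event_date": max(r["event_date"], births[r["person_id"]])}
        for r in rows
    ]
```

The aggressive option discards the valid 2010 record of person 1 along with the invalid one, which is why the guidance ties it to cases where the birthdate itself is unreliable.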