update plausibleAfterBirth and small formatting of headings
MaximMoinat committed Jan 27, 2024
1 parent e26603d commit a78ec1b
Showing 5 changed files with 31 additions and 57 deletions.
8 changes: 3 additions & 5 deletions vignettes/CheckStatusDefinitions.rmd
@@ -1,6 +1,6 @@
---
title: "Check Status Descriptions"
author: "Dmitry Ilyn"
title: "Check Status Definitions"
author: "Dmitry Ilyn, Maxim Moinat"
date: "`r Sys.Date()`"
output:
pdf_document:
@@ -10,13 +10,11 @@ output:
number_sections: yes
toc: yes
vignette: >
%\VignetteIndexEntry{Check Status Descriptions}
%\VignetteIndexEntry{Check Status Definitions}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::knitr}
---

# DQD check statuses

## Introduction
In DataQualityDashboard v2, two new check statuses were introduced: `Error` and `Not Applicable`. They reflect the quality of the data in a CDM instance more accurately by covering scenarios where pass/fail is not appropriate. The new set of mutually exclusive statuses is listed below in priority order:

3 changes: 0 additions & 3 deletions vignettes/DataQualityDashboard.rmd
@@ -15,9 +15,6 @@ vignette: >
%\VignetteEngine{knitr::knitr}
---

# Getting Started
***

R Installation
===============

2 changes: 0 additions & 2 deletions vignettes/DqdForCohorts.rmd
@@ -15,8 +15,6 @@ vignette: >
%\VignetteEngine{knitr::knitr}
---

# DQD Cohort Functionality

Running the Data Quality Dashboard for a cohort is fairly straightforward. There are two options in the `executeDqChecks` function, `cohortDefinitionId` and `cohortDatabaseSchema`. These options will point the DQD to the schema where the cohort table is located and provide the id of the cohort on which the DQD will be run. By default, the tool assumes that the table being referenced is the standard OHDSI cohort table named **COHORT** with at least the columns **cohort_definition_id** and **subject_id**. For example, if I have a cohort number 123 and the cohort is in the *results* schema of the *IBM_CCAE* database, the `executeDqChecks` function would look like this:

```r
46 changes: 0 additions & 46 deletions vignettes/checks/plausibleAfterBirth copy.Rmd

This file was deleted.

29 changes: 28 additions & 1 deletion vignettes/checks/plausibleAfterBirth.Rmd
@@ -31,7 +31,14 @@ This check verifies that events happen after birth. This check is only run on fi
## User Guidance
There might be valid reasons why a record has a date value that occurs prior to birth. For example, prenatal observations might be captured or procedures on the mother might be added to the file of the child. Therefore, some failing records are expected and the default threshold of 1% accounts for that.

However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error.
However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error and correct the dates. If the dates cannot be corrected, apply one of the following approaches:

- Aggressive: Remove all patients who have at least one record before birth (if the birthdate of the patient is unreliable).
- Less aggressive: Remove all rows that occur before birth (if the event dates are unreliable). This is probably the conventional approach to data cleanup.
- Conservative: Set the event date to the birth date.

Make sure to clearly document the choices in your ETL specification.
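
The three options above can be sketched on toy data as follows. This is a minimal illustration, not part of the DataQualityDashboard or of any ETL: it uses Python's built-in `sqlite3` module with an in-memory database, a single event table (`condition_occurrence`), and a hypothetical simplified `birth_date` column on `person` (the real CDM stores `year_of_birth`, `month_of_birth`, `day_of_birth` and `birth_datetime`). The function names are made up for this example.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for two CDM tables (illustrative only)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- birth_date is a hypothetical simplification of the CDM birth fields
        CREATE TABLE person (person_id INTEGER, birth_date TEXT);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER,
            person_id INTEGER,
            condition_start_date TEXT
        );
        INSERT INTO person VALUES (1, '2000-01-01'), (2, '1990-06-15');
        INSERT INTO condition_occurrence VALUES
            (10, 1, '1999-12-01'),  -- one month before the birth of person 1
            (11, 1, '2005-03-02'),
            (12, 2, '2010-07-07');
    """)
    return con


# Ids of events dated before the person's birth (ISO dates compare as text).
BEFORE_BIRTH = """
    SELECT co.condition_occurrence_id
    FROM condition_occurrence co
    JOIN person p ON p.person_id = co.person_id
    WHERE co.condition_start_date < p.birth_date
"""


def remove_patients(con):
    """Aggressive: drop every person with at least one event before birth."""
    ids = [row[0] for row in con.execute(
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_occurrence_id IN ({BEFORE_BIRTH})")]
    for table in ("condition_occurrence", "person"):
        con.executemany(
            f"DELETE FROM {table} WHERE person_id = ?", [(i,) for i in ids])


def remove_rows(con):
    """Less aggressive: drop only the events that occur before birth."""
    con.execute(
        "DELETE FROM condition_occurrence "
        f"WHERE condition_occurrence_id IN ({BEFORE_BIRTH})")


def clamp_to_birth(con):
    """Conservative: move events that occur before birth to the birth date."""
    con.execute(f"""
        UPDATE condition_occurrence
        SET condition_start_date = (
            SELECT p.birth_date FROM person p
            WHERE p.person_id = condition_occurrence.person_id)
        WHERE condition_occurrence_id IN ({BEFORE_BIRTH})
    """)


def event_dates(con):
    """Return (id, start date) for all remaining events, ordered by id."""
    return con.execute(
        "SELECT condition_occurrence_id, condition_start_date "
        "FROM condition_occurrence ORDER BY condition_occurrence_id").fetchall()
```

In a real ETL the same logic would have to be applied to every table with a `person_id` and to the date field in question, which is one reason to record the chosen option in the ETL specification.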


### Violated rows query
You may also use the “violated rows” SQL query to inspect the violating rows and help diagnose the potential root cause of the issue:
@@ -60,6 +67,26 @@ WHERE cdmTable.@cdmFieldName < CAST(CONCAT(
) AS DATE)
```

The distribution of the interval between the event date and the birth date can also hint at the cause of the problem:
```sql
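-- Count violating records by the number of days the event precedes birth.
SELECT date_difference, COUNT(*)
FROM (
  -- DATEDIFF(day, start, end) is end minus start, so events before birth
  -- yield a positive difference.
  SELECT DATEDIFF(
    day,
    ct.@cdmFieldName,
    COALESCE(
      CAST(p.birth_datetime AS DATE),
      CAST(CONCAT(p.year_of_birth,'-01-01') AS DATE))
  ) AS date_difference
  FROM @cdmTableName ct
  JOIN person p ON ct.person_id = p.person_id
) cte
WHERE date_difference > 0
GROUP BY date_difference
ORDER BY COUNT(*) DESC
;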
```

### ETL Developers
As above, if the number of failing records is high, it is recommended to investigate the records that fail this check to determine the underlying cause of the error.
