Add example GA4GH tables from dbGaP to demonstrate how a user would resolve differences in values/coding #4

ianfore · 2020-11-08T14:52:42Z

Added one table so far, with unique codings for sex and race.
The column (variable) names are also unique.
The table so far: search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru

Aiming for three or four such tables from dbGaP. The codings and column names will vary.

The question is: how can the machine-readable information (schema) provided about each table make things easier for a data scientist? We assume they are using tools such as Python or R and can transform the data in those tools quite easily, as long as they have the information to do so. /table/tablename/info provides that information.
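As a rough sketch of how that schema might be consumed from Python: this assumes the /table/tablename/info response carries a JSON Schema under `data_model`, and that coded columns list their allowed values under `enum`. The base URL, the helper names and the sample response are made up for illustration and are not taken from the actual service.

```python
import json
from urllib.parse import quote

# Hypothetical base URL; substitute the GA4GH Search endpoint you actually use.
BASE_URL = "https://search.example.org"


def info_url(table_name, base_url=BASE_URL):
    """Build the /table/{table_name}/info URL for a table."""
    return f"{base_url}/table/{quote(table_name)}/info"


def column_codings(table_info):
    """Map each column to its list of allowed values (None if not enumerated)."""
    properties = table_info.get("data_model", {}).get("properties", {})
    return {name: spec.get("enum") for name, spec in properties.items()}


# Made-up response shaped like a table info document; not real dbGaP content.
sample_info = json.loads("""
{
  "name": "example_table",
  "data_model": {
    "properties": {
      "sex": {"type": "string", "enum": ["M", "F"]},
      "age": {"type": "integer"}
    }
  }
}
""")

codings = column_codings(sample_info)
for column, values in codings.items():
    print(column, values)
```

With a real endpoint, the JSON fetched from `info_url(...)` would take the place of `sample_info`. Columns that come back as `None` have no enumerated coding in the schema.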

Note that in dbGaP the data used in the table above is controlled access. The dataset available through the GA4GH Search API uses values from the dataset but each record (row) is a simulated example - not a real record.

ianfore · 2020-11-08T17:50:49Z

See [this diagram]() for a flow of how the data dictionaries get from dbGaP to GA4GH Search. In the case of the tables I've created this weekend, we haven't yet run the data dictionaries through the DNAStack step. So for the moment the schemas we see in GA4GH Search are autogenerated from the BigQuery table definitions, which don't include the enumerated listings of codes. However, we can for the moment get the definitions from the dbGaP data dictionary itself. For example, here's a link to the data dictionary for the data in the organoid_profiling_pc_subject_phenotypes_gru table.

Note that if you open the link in a web browser it will display as an HTML table. For API purposes you can read the XML programmatically. Alternatively, you can read the dictionary by eye and translate the data by hand in Python or R.
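Reading the XML programmatically could look something like the sketch below. The element and attribute names (`variable`, `name`, `value`, `code`) are an assumption about the data dictionary layout, and the fragment, IDs and codes are invented, so check them against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up dictionary fragment. Element/attribute names are assumed, and the
# variable IDs and codes are invented for illustration.
SAMPLE_XML = """
<data_table id="pht000000.v1">
  <variable id="phv00000001.v1">
    <name>SEX</name>
    <description>Sex of participant</description>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
  <variable id="phv00000002.v1">
    <name>AGE</name>
    <description>Age at enrollment</description>
    <type>integer</type>
  </variable>
</data_table>
"""


def read_codings(xml_text):
    """Return {variable name: {code: label}} for variables that have coded values."""
    root = ET.fromstring(xml_text.strip())
    codings = {}
    for variable in root.iter("variable"):
        name = variable.findtext("name")
        values = {
            value.get("code"): (value.text or "").strip()
            for value in variable.findall("value")
            if value.get("code") is not None
        }
        if values:  # skip variables without an enumerated coding
            codings[name] = values
    return codings


codings = read_codings(SAMPLE_XML)
print(codings)
```

The resulting dictionary can then be used to translate codes to labels in whatever tool holds the table data.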

There are some mapping tables that can help, which I will add to GA4GH Search.

ianfore · 2020-11-08T19:22:45Z

Created these two tables to do a mapping.
search_cloud.cshcodeathon.md_mapping
search_cloud.cshcodeathon.md_mapping_term
Working on an example to use them.
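In the meantime, here is a minimal sketch of the general idea of applying a term mapping to a record. The column names in the mapping rows (`source_column`, `source_value`, `target_column`, `target_value`) are hypothetical; the actual md_mapping and md_mapping_term tables may be laid out differently, and the rows are invented.

```python
# Hypothetical mapping rows; the real md_mapping / md_mapping_term columns may differ.
MAPPING_TERMS = [
    {"source_column": "SEX", "source_value": "1",
     "target_column": "sex", "target_value": "male"},
    {"source_column": "SEX", "source_value": "2",
     "target_column": "sex", "target_value": "female"},
]


def build_lookup(mapping_rows):
    """Index mapping rows by source column for quick lookup."""
    lookup = {}
    for row in mapping_rows:
        rule = lookup.setdefault(
            row["source_column"],
            {"target": row["target_column"], "terms": {}},
        )
        rule["terms"][row["source_value"]] = row["target_value"]
    return lookup


def harmonize(record, lookup):
    """Rename mapped columns and translate their codes; pass other columns through."""
    out = {}
    for column, value in record.items():
        rule = lookup.get(column)
        if rule is None:
            out[column] = value  # no mapping defined for this column
        else:
            # Codes missing from the mapping become None rather than guessed.
            out[rule["target"]] = rule["terms"].get(str(value))
    return out


lookup = build_lookup(MAPPING_TERMS)
print(harmonize({"SEX": 1, "AGE": 40}, lookup))
```

Codes without a mapping row come out as `None`, which makes gaps in the mapping easy to spot.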

ianfore · 2020-11-08T20:39:16Z

Mapping example added.

Now need to map more columns!
