Added one table so far with unique codings for sex and race.
Column (variable) names are also unique.
The table so far: search_cloud.cshcodeathon.organoid_profiling_pc_subject_phenotypes_gru
Aiming for three or four such tables from dbGaP. The codings and column names will vary.
The question is: how can the machine-readable information (schema) provided about each table make things easier for a data scientist? We assume they are using tools such as Python or R and can transform the data in those tools quite easily, as long as they have the information to do so. /table/tablename/info provides that information.
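As a rough illustration of how that schema could be used from Python, here is a minimal sketch. It assumes the /table/tablename/info response has already been fetched and parsed into a dict, and that the schema sits under a `data_model` key as a JSON Schema with `properties`. The sample response, its column names and the `summarize_columns` helper are made up for illustration and are not taken from the actual table.

```python
def summarize_columns(table_info):
    """Return {column_name: {"type", "description", "codes"}} from a table info dict.

    Assumes the schema is a JSON Schema object under "data_model".
    """
    properties = table_info.get("data_model", {}).get("properties", {})
    summary = {}
    for name, spec in properties.items():
        summary[name] = {
            "type": spec.get("type"),
            "description": spec.get("description", ""),
            # "enum" is only present if the schema lists the allowed codes
            "codes": spec.get("enum"),
        }
    return summary


# Hypothetical response body, for illustration only (not the real schema).
sample_info = {
    "name": "example_table",
    "data_model": {
        "properties": {
            "sex": {"type": "string", "description": "Sex of participant",
                    "enum": ["1", "2"]},
            "age": {"type": "integer"},
        }
    },
}

columns = summarize_columns(sample_info)
for name, details in columns.items():
    print(name, details["type"], details["codes"])
```

A column whose `codes` entry is `None` has no enumerated values in the schema, so its codings would have to come from somewhere else.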
Note that in dbGaP the data used in the table above is controlled access. The dataset available through the GA4GH Search API uses values from the dataset but each record (row) is a simulated example - not a real record.
) for a flow of how the data dictionaries get from dbGaP to GA4GH Search. For the tables I've created this weekend, we haven't yet run the data dictionaries through the DNAStack step. So for the moment the schemas we see in GA4GH Search are autogenerated from the BigQuery table definitions, which do not include the enumerated listings of codes. However, we can for the moment get the definitions from the dbGaP data dictionary itself. For example, here's a link to the data dictionary for the data in the organoid_profiling_pc_subject_phenotypes_gru table.
Note that if you open the link in a web browser it will display as an HTML table. For API purposes you can read the XML programmatically. Another approach is to read the dictionary visually and translate the data by hand in Python or R.
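Reading the XML programmatically might look something like the sketch below. It assumes the dictionary consists of `variable` elements, each with a `name` child and `value` children carrying a `code` attribute. The inline sample XML, its ids and codings, and the helper names `code_lookup` and `decode_rows` are all made up for illustration; check them against the real dictionary before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up dictionary fragment; the layout is an assumption, not the real file.
SAMPLE_DICTIONARY = """<data_table id="pht000000.v1">
  <variable id="phv00000001.v1">
    <name>SEX</name>
    <description>Sex of participant</description>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
  <variable id="phv00000002.v1">
    <name>AGE</name>
    <description>Age at enrollment</description>
  </variable>
</data_table>"""


def code_lookup(xml_text):
    """Build {variable_name: {code: label}} for variables that list coded values."""
    root = ET.fromstring(xml_text)
    lookup = {}
    for variable in root.iter("variable"):
        name = variable.findtext("name")
        codes = {
            value.get("code"): (value.text or "").strip()
            for value in variable.findall("value")
            if value.get("code") is not None
        }
        if name and codes:
            lookup[name] = codes
    return lookup


def decode_rows(rows, lookup):
    """Replace coded values with their labels; leave everything else unchanged."""
    return [
        {col: lookup.get(col, {}).get(str(val), val) for col, val in row.items()}
        for row in rows
    ]


lookup = code_lookup(SAMPLE_DICTIONARY)
decoded = decode_rows([{"SEX": 2, "AGE": 40}], lookup)
print(decoded)
```

Codes are compared as strings because XML attributes are always strings, while the table may return them as integers.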
There are some mapping tables that can help, which I will add to GA4GH Search.
Created these two tables to do a mapping.
search_cloud.cshcodeathon.md_mapping
search_cloud.cshcodeathon.md_mapping_term
Working on an example to use them.