The AD Knowledge Portal data model

Production data model
Editing data models
Automation
Developing in a codespace
Legacy data models:

Production data model

AD.model.* (csv | jsonld): this is the current, "live" version of the AD Portal data model. It is being used by both the staging and production versions of the multitenant Data Curator App.

Editing data models

⚠️ Do not edit AD.model.csv or AD.model.jsonld by hand! ⚠️

Github branch procedure:

The main branch of this repo is protected, so you cannot push changes to main. To make changes to the data model:

Create a new branch in this repo and give it an informative name. The schema-convert workflow will not work from a private fork.
On that branch, make and commit any changes. You can do this by cloning the repo locally or by using a Github codespace. Please write informative commit messages in case we need to track down data model inconsistencies or introduced bugs.
Open a pull request and request review from someone else on the AD DCC team. The Github Action described in Automation will run as soon as you open the PR. If this action fails, some part of the data model csv could not be converted to JSON-LD and should be investigated. If this action passes, the PR can be merged with one approving review.
After the PR is merged, delete your branch.

Editing attributes by module:

The full AD.model.csv file has over 1400 attributes, which makes it unwieldy to edit and makes changes hard to review. For ease of editing, the full data model is divided into "module" subfolders, like so:

data-models/
├── AD.model.csv (do not edit!)
├── AD.model.jsonld (do not edit!)
└── modules/
    ├── biospecimen/
    │   ├── specimenID.csv
    │   ├── organ.csv
    │   └── tissue.csv
    └── sequencing/
        ├── readLength.csv
        └── platform.csv

Within each module, every attribute in the data model where Parent = DataProperty has its own csv, named after that attribute (example: organ.csv). Any valid values of the attribute "organ" have Parent = organ and are listed as rows in the file organ.csv. Attributes with Parent = DataProperty are used as columns in metadata and annotation manifest templates. Attributes with Parent = DataType describe the templates themselves. At this time, any other value for Parent means the attribute is a valid value of some other enumerated attribute.
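The Parent rules above can be summarized in a few lines of Python. This is only an illustrative sketch, not part of the repo: the function name `classify_attribute` and the sample rows are made up, and real module csvs contain more columns than the two shown here.

```python
import csv
import io


def classify_attribute(parent: str) -> str:
    """Map a row's Parent value to the role described in this section."""
    if parent == "DataProperty":
        return "manifest column"
    if parent == "DataType":
        return "manifest template"
    # Any other Parent names the enumerated attribute this row is a value of.
    return f"valid value of '{parent}'"


# Hypothetical rows for illustration; only Attribute and Parent are shown.
sample = io.StringIO(
    "Attribute,Parent\n"
    "organ,DataProperty\n"
    "brain,organ\n"
    "example_template,DataType\n"
)

roles = {
    row["Attribute"]: classify_attribute(row["Parent"])
    for row in csv.DictReader(sample)
}
```

Here `roles["brain"]` is "valid value of 'organ'", which matches the rule that valid values of "organ" carry Parent = organ.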

Some common data model editing scenarios are:

Adding a new valid value to an existing manifest column:

If you wanted to add a new valid value "eyeball" to our existing column attribute "organ", after making a new branch and opening the repo either locally or within a codespace, you would go to modules/biospecimen/organ.csv.
Then, create a new row for an attribute named "eyeball", with a description and source (preferably an ontology URI). In the Parent column, make sure the value is "organ".
Next, find the row for the attribute "organ" (it should be the first row), and within the Valid Values column, add "eyeball" to the comma-separated list of valid values.
Save your changes and write an informative commit. Please try to add valid values alphabetically!
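If you prefer to script this kind of edit rather than do it in a spreadsheet, the steps above might look roughly like the sketch below. It is not a tool that ships with this repo. The helper `add_valid_value` is hypothetical, the exact column headers ("Attribute", "Description", "Valid Values", "Parent", "Source") and the ", " separator are assumptions about the module csv layout, and the sample rows and example.org URI are placeholders.

```python
import csv
import io


def add_valid_value(rows, attribute, new_value, description, source):
    """Return a copy of rows with new_value registered under attribute."""
    rows = [dict(r) for r in rows]
    for row in rows:
        if row["Attribute"] == attribute:
            values = [v.strip() for v in row["Valid Values"].split(",") if v.strip()]
            if new_value not in values:
                values.append(new_value)
            # Keep the list alphabetical, as requested above.
            row["Valid Values"] = ", ".join(sorted(values, key=str.lower))
            break
    else:
        raise ValueError(f"no row for attribute {attribute!r}")

    # Add the row describing the new value, with Parent set to the attribute.
    if not any(r["Attribute"] == new_value for r in rows):
        new_row = dict.fromkeys(rows[0], "")
        new_row.update(
            {
                "Attribute": new_value,
                "Description": description,
                "Parent": attribute,
                "Source": source,
            }
        )
        rows.append(new_row)
    return rows


# Hypothetical, trimmed-down stand-in for modules/biospecimen/organ.csv.
sample = io.StringIO(
    "Attribute,Description,Valid Values,Parent,Source\n"
    'organ,An organ,"brain, heart",DataProperty,\n'
    "brain,The brain,,organ,\n"
    "heart,The heart,,organ,\n"
)
rows = list(csv.DictReader(sample))
updated = add_valid_value(
    rows, "organ", "eyeball", "The eye", "https://example.org/placeholder-uri"
)
```

Whatever tool you use, the result should be the same two changes: one new row with Parent = organ, and one updated Valid Values list on the "organ" row.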

Adding a new column to a manifest template:

If you wanted to add the column "furColor" to the "model-ad_individual_animal_metadata" template, first decide which module the new column should belong to. In this case, "MODEL-AD" makes the most sense.
Within the MODEL-AD subfolder, create a new csv called furColor.csv with the column headers that schematic requires. Describe the attribute "furColor" as necessary and make sure Parent = DataProperty. Add any valid values for "furColor" as new rows to this csv as described in the previous scenario.
Find the manifest template attributes in modules/template/templates.csv. In the "model-ad_individual_animal_metadata" row, add your new column "furColor" to the comma-separated list of attributes in the DependsOn column.
Save your changes and write an informative commit.
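The DependsOn update in this scenario can be sketched the same way. The helper name `add_template_column` is hypothetical, the sample template row is trimmed down, and the existing columns listed in it ("individualID", "species") are made up for illustration rather than copied from templates.csv.

```python
import csv
import io


def add_template_column(rows, template, column):
    """Return a copy of rows with column appended to the template's DependsOn list."""
    rows = [dict(r) for r in rows]
    for row in rows:
        if row["Attribute"] == template:
            deps = [d.strip() for d in row["DependsOn"].split(",") if d.strip()]
            if column not in deps:
                deps.append(column)
            row["DependsOn"] = ", ".join(deps)
            return rows
    raise ValueError(f"no template named {template!r}")


# Hypothetical, trimmed-down stand-in for modules/template/templates.csv.
sample = io.StringIO(
    "Attribute,DependsOn,Parent\n"
    'model-ad_individual_animal_metadata,"individualID, species",DataType\n'
)
templates = list(csv.DictReader(sample))
templates_updated = add_template_column(
    templates, "model-ad_individual_animal_metadata", "furColor"
)
```

Appending to the end of the list keeps the existing column order of the template intact; the new furColor.csv file still has to be created separately, as described above.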

For more advanced data modeling scenarios like adding conditional logic, creating validation rules, or creating new manifests, please consult the #ad-dcc-team slack channel.

Notes on collaboratively editing csvs

A persistent issue is that manually editing csvs is challenging. Some columns in our modules are very short, and others are veeeeery long (Description, Valid Values). Some options for working on csvs, and their pros and cons:

Editing in the Github UI : convenient, but challenging to keep track of columns in plain text format.
Cloning the repo, making a branch, and opening csvs locally in Excel or another spreadsheet program 🖥️ : probably the best UI experience, but involves a few extra steps with git.
Using a Github codespace to launch VSCode in the browser, and editing with the pre-installed RainbowCSV extension 🌈 : Still difficult to edit csvs as plain text, but the color formatting and ability to use a soft word wrap makes it much easier to distinguish columns. RainbowCSV lets you designate "sticky" rows and columns for easier scrolling, and also has a nice "CSVLint" function that will check for formatting errors after you make changes.

We are exploring better solutions to this problem -- if you have ideas, tell us!

Automation

When you open a PR that includes any changes to files in the modules/ directory, a Github Action will automatically run before merging is allowed. This action:

Runs the assemble_csv_data_model.py script to concatenate the modular attribute csvs into one dataframe, sort it alphabetically by Parent and then Attribute, and write the combined dataframe to AD.model.csv. The action then commits the changes to the master data model csv.
Installs schematic from the develop branch and runs schema convert on the newly-concatenated data model csv to generate a new version of the jsonld file AD.model.jsonld. The action also commits the changes to the jsonld.

If this automated workflow fails, then the data model may be invalid and further investigation is needed.
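To make the concatenate-and-sort step concrete, here is a minimal standard-library sketch of the idea. It is not the actual assemble_csv_data_model.py script, which may differ in its details (for example, in how it reads the files or handles sort order). The function name `assemble_data_model`, the header check, and the two tiny module files created in the demo are all assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path


def assemble_data_model(modules_dir, output_csv):
    """Concatenate every modules/<module>/<attribute>.csv into one sorted csv."""
    rows, fieldnames = [], None
    for path in sorted(Path(modules_dir).glob("*/*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has unexpected column headers")
            rows.extend(reader)
    if fieldnames is None:
        raise ValueError("no module csvs found")

    # Sort by Parent first, then by Attribute (plain string comparison).
    rows.sort(key=lambda r: (r["Parent"], r["Attribute"]))

    with Path(output_csv).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demo on a throwaway directory with two made-up, two-column module files.
with tempfile.TemporaryDirectory() as tmp:
    modules = Path(tmp) / "modules"
    (modules / "biospecimen").mkdir(parents=True)
    (modules / "sequencing").mkdir()
    (modules / "biospecimen" / "organ.csv").write_text(
        "Attribute,Parent\norgan,DataProperty\nbrain,organ\n", encoding="utf-8"
    )
    (modules / "sequencing" / "platform.csv").write_text(
        "Attribute,Parent\nplatform,DataProperty\n", encoding="utf-8"
    )
    combined = assemble_data_model(modules, Path(tmp) / "AD.model.csv")
    order = [(r["Parent"], r["Attribute"]) for r in combined]
```

The header check in this sketch shows one way a malformed module csv could surface as a failure before the schema conversion step runs.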

Developing in a codespace

⚠️ If you are working in a Github Codespace, do NOT commit any Synapse credentials to the repository and do NOT use any real human data when testing data model function. This is not a secure environment!

If you want to make changes to the data model and test them out by generating manifests with schematic, you can use the devcontainer in this repo with a Github Codespace. This will open a container in a remote instance of VSCode and install the latest version of schematic.

Codespace secrets:

SYNAPSE_PAT: a personal access token scoped to view and download permissions on the sysbio-dcc-tasks-01 Synapse service account
SERVICE_ACCOUNT_CREDS: credentials for using the Google Sheets API with schematic
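Codespace secrets are exposed to the container as environment variables, so a quick sanity check before running schematic could look like the sketch below. The helper `missing_secrets` is hypothetical and not part of this repo; it only reports which names are unset and never prints the secret values themselves.

```python
import os

# Secret names taken from the list above.
REQUIRED_SECRETS = ("SYNAPSE_PAT", "SERVICE_ACCOUNT_CREDS")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing codespace secrets: " + ", ".join(missing))
    else:
        print("All codespace secrets are set.")
```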

Legacy data models:

Previous versions of the data model live in the legacy-data-models/ folder. This includes the Diverse Cohorts pilot model and the initial "legacy" model representing the AD Portal Synapse project metadata dictionary and metadata templates from August 2023. These are not being used by DCA.
