BCDC Upload Detailed Example

Gene Hsu edited this page Feb 27, 2019 · 7 revisions

This page walks through a detailed example of the steps required for an initial ingest of transcriptomics data. Three parties are involved: the U01/U19 user (the User), the R24 archive (the Archive), and the BCDC metadata repository (BCDC).

Data and Metadata Generation

Experimental Data

As the User collects data, they generate files that must be uploaded to the Archive.

Registration with BCDC

Before uploading metadata to BCDC, the User must be set up with accounts and reference objects. For each grant, a namespace will be created to contain the metadata associated with that grant's experimental data. The User should work with a BCDC representative to set up the User's reference objects. These reference objects represent entities in the ontology, including (not a complete list):

  • grants
  • organizations
  • protocols
  • species
  • sex
  • techniques
  • archives

These reference objects must be created before any metadata uploaded to BCDC can refer to them.
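The "create before use" rule can be pictured as a per-grant namespace that refuses to resolve unregistered reference objects. The following is a minimal sketch under stated assumptions: the `Namespace` class, its `register`/`resolve` methods, and the identifiers used are hypothetical and do not describe the BCDC API.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical per-grant namespace holding registered reference objects."""

    grant_id: str
    # Maps (entity type, identifier) -> attributes of the reference object.
    references: dict = field(default_factory=dict)

    def register(self, entity_type: str, ident: str, **attrs) -> None:
        # Reference objects (grants, species, protocols, ...) are set up first.
        self.references[(entity_type, ident)] = attrs

    def resolve(self, entity_type: str, ident: str) -> dict:
        # Metadata may only point at reference objects that already exist.
        key = (entity_type, ident)
        if key not in self.references:
            raise KeyError(f"unregistered {entity_type} reference: {ident!r}")
        return self.references[key]


# Hypothetical grant and reference identifiers, for illustration only.
ns = Namespace(grant_id="U19-EXAMPLE")
ns.register("species", "mouse", name="Mus musculus")
ns.register("technique", "scRNA-seq")
```

Resolving a reference that was never registered fails immediately, which mirrors the requirement that reference objects exist before metadata uses them.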

Association with metadata

The User must associate metadata with each experimental file. This metadata should represent entities in the ontology, including (not a complete list):

  • files
  • feature sets
  • features
  • processes
  • observations
  • specimens
  • genome alignments
  • cell phenotypes
  • projects

The User should refer to the ontology documentation for guidance on how to map information about the experiment onto these ontological entities.
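As a rough illustration of how a single file might be tied to other entities, here is a sketch using Python dataclasses. Every class, field name, and identifier below is hypothetical; the ontology documentation defines the real entities and their attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Specimen:
    id: str
    species: str  # identifier of a registered "species" reference object


@dataclass
class Process:
    id: str
    technique: str  # identifier of a registered "technique" reference object
    protocol: str  # identifier of a registered "protocol" reference object
    inputs: List[str] = field(default_factory=list)  # ids of input specimens


@dataclass
class File:
    id: str
    path: str  # path of the experimental file as uploaded to the Archive
    produced_by: str  # id of the process that generated the file
    project: str  # id of the project the file belongs to


# Hypothetical records linking one FASTQ file back to its specimen.
specimen = Specimen(id="spec-001", species="mouse")
process = Process(id="proc-001", technique="scRNA-seq", protocol="protocol-001", inputs=[specimen.id])
fastq = File(id="file-001", path="run1/sample1_R1.fastq.gz", produced_by=process.id, project="proj-001")
```

Storing identifiers rather than nested objects keeps each record independently uploadable, at the cost of needing the referenced records to exist first.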

Creation of a File Manifest

To confirm that the Archive has received every experimental file correctly, the User must create a file manifest that lists all experimental files along with validation information. BCDC may provide a script that generates this manifest from a directory of files.

See the File Manifest documentation for more information.
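A manifest-generation script of the kind described above might look like the following sketch. The column names (`filename`, `size_bytes`, `md5`), the tab-separated layout, and the choice of MD5 as the checksum are assumptions for illustration; the File Manifest documentation defines the actual format.

```python
import csv
import hashlib
import sys
from pathlib import Path

# Assumed manifest columns; not taken from the File Manifest specification.
FIELDS = ["filename", "size_bytes", "md5"]


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """List every regular file under root with its size and checksum."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "filename": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "md5": md5sum(path),
        })
    return rows


def write_manifest(rows: list, out) -> None:
    """Write the manifest rows as tab-separated values with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python make_manifest.py DATA_DIR > manifest.tsv
    write_manifest(build_manifest(Path(sys.argv[1])), sys.stdout)
```

Writing to standard output rather than into the data directory keeps the manifest from listing itself.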

Upload data

Upload experimental data to the Archive

Each Archive will provide instructions on how to upload experimental data to its data store. Along with the experimental data, the User will upload the file manifest to the Archive.
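On the Archive side, the manifest lets received files be checked one by one. This is a minimal sketch assuming a tab-separated manifest with hypothetical `filename`, `size_bytes`, and `md5` columns; each Archive's real validation procedure may differ.

```python
import csv
import hashlib
from pathlib import Path


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(manifest_path: Path, upload_root: Path) -> dict:
    """Compare received files against the manifest; return problems by filename."""
    problems = {}
    with manifest_path.open(newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            path = upload_root / row["filename"]
            if not path.is_file():
                problems[row["filename"]] = "missing"
            elif path.stat().st_size != int(row["size_bytes"]):
                # Cheap size check first; only hash files whose size matches.
                problems[row["filename"]] = "size mismatch"
            elif md5sum(path) != row["md5"]:
                problems[row["filename"]] = "checksum mismatch"
    return problems
```

An empty result means every listed file arrived intact; files present on disk but absent from the manifest are not reported by this sketch.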

Upload metadata to BCDC

BCDC will provide an API that allows the User to upload metadata. To preserve referential integrity, the User will upload metadata to BCDC in the following order:

  • cell phenotypes
  • specimens
  • processes
  • observations
  • feature sets
  • features
  • genome alignments
  • projects
  • files
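The ordering rule above can be enforced on the client side by sorting records by entity type and refusing to send a record whose references have not yet been sent. In this sketch the record shape (`type`, `id`, `refs`) and the `post` callback standing in for the BCDC upload call are hypothetical.

```python
from typing import Callable, List

# Upload order from the list above; each type is sent before the types that may reference it.
UPLOAD_ORDER = [
    "cell phenotype", "specimen", "process", "observation", "feature set",
    "feature", "genome alignment", "project", "file",
]


def upload_in_order(records: List[dict], post: Callable[[dict], None]) -> List[str]:
    """Send records grouped by entity type in UPLOAD_ORDER; return ids in send order.

    Each record is assumed to look like {"type": ..., "id": ..., "refs": [...]}.
    Only refs pointing at other records in this batch are checked; registered
    reference objects (species, protocols, ...) are assumed to exist already.
    `post` is a hypothetical stand-in for the BCDC upload call.
    """
    rank = {entity_type: i for i, entity_type in enumerate(UPLOAD_ORDER)}
    batch_ids = {record["id"] for record in records}
    done = set()
    sent: List[str] = []
    # sorted() is stable, so records of the same type keep their original order.
    for record in sorted(records, key=lambda r: rank[r["type"]]):
        pending = [ref for ref in record.get("refs", []) if ref in batch_ids and ref not in done]
        if pending:
            raise ValueError(f"{record['id']} references objects not yet uploaded: {pending}")
        post(record)
        done.add(record["id"])
        sent.append(record["id"])
    return sent
```

For example, a file that refers to a process and a project is sent only after both of them, even if it appears first in the input list.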

Metadata Sync from BCDC to the Archive

BCDC will provide an API that the Archive can use to retrieve the metadata for a file. A bulk retrieval interface may also be implemented.
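A sketch of how the Archive might call such an interface follows. The base URL, the `/{namespace}/files/{file_id}` path layout, the JSON response body, and the batching helper for a bulk endpoint are all assumptions, since this page does not specify the BCDC API.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List

# Hypothetical base URL; the real BCDC endpoint is not specified here.
BCDC_BASE_URL = "https://bcdc.example.org/api"


def file_metadata_url(namespace: str, file_id: str) -> str:
    """Build the URL of the (hypothetical) per-file metadata endpoint."""
    ns = urllib.parse.quote(namespace, safe="")
    fid = urllib.parse.quote(file_id, safe="")
    return f"{BCDC_BASE_URL}/{ns}/files/{fid}"


def batches(file_ids: List[str], size: int) -> Iterator[List[str]]:
    """Split file ids into chunks, e.g. for a possible bulk-retrieval endpoint."""
    for start in range(0, len(file_ids), size):
        yield file_ids[start:start + size]


def fetch_file_metadata(namespace: str, file_id: str, timeout: float = 30.0) -> dict:
    """Retrieve one file's metadata, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(file_metadata_url(namespace, file_id), timeout=timeout) as resp:
        return json.load(resp)
```

Quoting with `safe=""` keeps identifiers containing `/` or spaces from altering the URL path.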

Retrieval of data URLs from the Archive

The Archive will provide an API to retrieve the URL for each file. The response may also include other file-specific metadata from the file manifest. BCDC may call this endpoint to provide a download link in a web interface.
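On the BCDC side, turning an Archive response into a download link could look like this sketch. The JSON field names (`url`, `size_bytes`, `md5`) are hypothetical; each Archive defines its own response format.

```python
import html
import json


def parse_archive_response(body: str) -> dict:
    """Extract the download URL and optional manifest fields (hypothetical names)."""
    data = json.loads(body)
    if "url" not in data:
        raise ValueError("Archive response has no 'url' field")
    return {key: data[key] for key in ("url", "size_bytes", "md5") if key in data}


def download_link(filename: str, info: dict) -> str:
    """Render an HTML download link, escaping both the URL and the label."""
    label = html.escape(filename)
    if "size_bytes" in info:
        label += f" ({info['size_bytes']} bytes)"
    return f'<a href="{html.escape(info["url"], quote=True)}">{label}</a>'
```

Escaping the URL matters because Archive URLs often carry query strings with `&`, which must become `&amp;` inside an HTML attribute.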