Patch-seq Data Linkage #17

carolth · 2020-02-21T00:33:02Z

This issue is for discussion around Data Linkage for Patch-seq. The goal is to determine the requirements for Data Linkage across archives.
Please feel free to comment and add to the discussion.

Patch-seq data will be housed at three archives:

| Archive | Data type | File type (minimum) | Note |
|---|---|---|---|
| DANDI | Patch ephys | NWB | |
| BIL | Morphology | SWC | Images optional |
| NEMO | Transcriptomics | FASTQ | |

Here is a first use case to support:

Use case: a user wants to find data for only those cells that have all three data types.
Necessary information:

- Cell specimen ID for linking data across all three archives
- Availability of each data type for that cell: some cells will have all three, some will have two, and some may have only ephys.
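A minimal Python sketch of this use case, assuming each archive can export the set of cell specimen IDs it holds data for (no such export or query API exists yet; the inventories and IDs below are hypothetical). It builds a per-cell availability map and picks out the cells present in all three archives.

```python
# Hypothetical per-archive inventories: each archive maps to the set of cell
# specimen IDs it holds data for. No export or query API is assumed to exist.
ARCHIVES = ("DANDI", "BIL", "NEMO")  # patch ephys, morphology, transcriptomics


def availability(inventories: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each cell specimen ID to the set of archives that hold data for it."""
    result: dict[str, set[str]] = {}
    for archive, cell_ids in inventories.items():
        for cell_id in cell_ids:
            result.setdefault(cell_id, set()).add(archive)
    return result


def cells_with_all(inventories: dict[str, set[str]]) -> set[str]:
    """Cells present in every archive, i.e. with all three data types."""
    return set.intersection(*(inventories.get(a, set()) for a in ARCHIVES))


if __name__ == "__main__":
    inventories = {
        "DANDI": {"cell-1", "cell-2", "cell-3"},  # NWB files
        "BIL": {"cell-1", "cell-2"},              # SWC files
        "NEMO": {"cell-1", "cell-3"},             # FASTQ files
    }
    print(sorted(cells_with_all(inventories)))  # ['cell-1']
    for cell_id, archives in sorted(availability(inventories).items()):
        print(cell_id, sorted(archives))
```

The availability map covers the "some cells have two, some only ephys" case; the intersection is just the special case where a cell's archive set is complete.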

satra · 2020-02-21T15:32:46Z

We should (eventually) have multiple levels of connection. At the very least, we should support level 1.

Levels:

1. dataset
   e.g. in DANDI dataset metadata: associatedDataset: some_permalink
   Right now this would be:
   https://portal.nemoarchive.org/search/c?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.study_name%22,%22value%22:%5B%22U01%20Kriegstein%22%5D%7D%7D%5D%7D
   (I could not find a dataset-level landing page in NEMO.)
2. subjects
3. tissueSamples
4. slices
5. cells

Not all datasets will have all of these levels, but this essentially uses an isPartOf/hasPart relationship across the levels. It is a graph rather than a tree, since entities at levels 2 to 4 can belong to multiple datasets.
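A minimal Python sketch of that structure, with isPartOf edges stored alongside their hasPart inverse. All entity IDs are hypothetical. The one subject linked to two datasets is what makes this a graph, and finding the cells in a dataset becomes a traversal along hasPart edges.

```python
from collections import defaultdict

# dataset -> subject -> tissueSample -> slice -> cell, stored as isPartOf edges
# plus the inverse hasPart edges. All IDs below are hypothetical examples.
is_part_of: dict[str, set[str]] = defaultdict(set)  # child -> parents
has_part: dict[str, set[str]] = defaultdict(set)    # parent -> children


def link(child: str, parent: str) -> None:
    """Record child isPartOf parent, and the inverse parent hasPart child."""
    is_part_of[child].add(parent)
    has_part[parent].add(child)


def descendants(node: str) -> set[str]:
    """All entities reachable from node by following hasPart edges."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        for child in has_part[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# The same subject belongs to two datasets, so this is a graph, not a tree.
link("subject:S1", "dataset:A")
link("subject:S1", "dataset:B")
link("tissueSample:T1", "subject:S1")
link("slice:SL1", "tissueSample:T1")
link("cell:C1", "slice:SL1")
link("cell:C2", "slice:SL1")

cells = {n for n in descendants("dataset:B") if n.startswith("cell:")}
print(sorted(cells))  # ['cell:C1', 'cell:C2']
```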

The questions for the archives are:

- Is there going to be an API to access information at any given level?
- Are the IDs going to be common across archives?
- Are the IDs going to be UUIDs for any of these entities?
- Are the IDs going to be generated by the archives?
- How is the link going to be maintained between lab IDs and archive IDs?
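One possible answer to the ID questions, sketched in Python and not something the archives have agreed on: derive a name-based (version 5) UUID from the lab, the level and the lab-assigned ID, and keep a crosswalk back to the lab ID. Because uuid5 is deterministic, every archive given the same lab ID would compute the same UUID without calling a shared ID service. The namespace URL, lab name and IDs are hypothetical placeholders.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it is agreed on once
# and never changes. example.org is a placeholder, not a real registry.
PATCHSEQ_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/patchseq")

# Crosswalk from the derived UUID back to (lab, level, lab-assigned ID).
crosswalk: dict[uuid.UUID, tuple[str, str, str]] = {}


def archive_id(lab: str, level: str, lab_id: str) -> uuid.UUID:
    """Derive a stable UUID; identical inputs always give the identical UUID."""
    return uuid.uuid5(PATCHSEQ_NAMESPACE, f"{lab}/{level}/{lab_id}")


def register(lab: str, level: str, lab_id: str) -> uuid.UUID:
    """Derive the UUID and record how to get back to the lab ID."""
    uid = archive_id(lab, level, lab_id)
    crosswalk[uid] = (lab, level, lab_id)
    return uid


if __name__ == "__main__":
    uid = register("lab-x", "cell", "cell-001")  # hypothetical lab and ID
    print(uid, crosswalk[uid])
```

Including the level in the hashed name keeps a subject and a cell that happen to share a lab ID string from colliding.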

Questions for BCDC:

- Will BCDC do the integration?
- Will the archives point to other information at all of these levels?
- Should such data be replicated across archives in some form?

lydiang · 2020-02-21T20:46:49Z

What are the use cases?

From Maryann:

- Datasets are usually slices of a larger dataset.
- People have correlated data.
- Do things get bundled together or separated?
- We do not want people to enter metadata individually.
- Handle local IDs so that all the pieces can be put back together.

From Satra:

- It would be great if there were a centralized ingest broker.
- It would be for the metadata, not for actually moving the data.

From Anita:

- It needs to be at the coordinating center.
- It would hold pointers to the other archives.
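The broker idea can be sketched as a metadata-only registry: it maps a lab's local ID at a given level to pointers into each archive and never touches the data files. This is a hypothetical Python sketch; the class names and example URLs are placeholders, not an existing BCDC service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityRecord:
    """Metadata-only record; the data files themselves stay in the archives."""
    local_id: str  # ID assigned by the submitting lab
    level: str     # dataset, subject, tissueSample, slice or cell
    pointers: dict[str, str] = field(default_factory=dict)  # archive -> URL


class IngestBroker:
    """Hypothetical central broker keyed by (level, local ID)."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], EntityRecord] = {}

    def submit(self, level: str, local_id: str, archive: str, url: str) -> EntityRecord:
        """Record that `archive` holds data for this entity; later submissions merge."""
        record = self._records.setdefault(
            (level, local_id), EntityRecord(local_id, level)
        )
        record.pointers[archive] = url
        return record

    def lookup(self, level: str, local_id: str) -> Optional[EntityRecord]:
        return self._records.get((level, local_id))


if __name__ == "__main__":
    broker = IngestBroker()
    broker.submit("cell", "cell-001", "DANDI", "https://example.org/dandi/cell-001")
    broker.submit("cell", "cell-001", "BIL", "https://example.org/bil/cell-001")
    print(broker.lookup("cell", "cell-001"))
```

Keying on the lab's local ID is what lets separately submitted pieces "be put back together", as Maryann put it; a shared archive-independent ID could replace that key later.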

satra · 2020-03-20T17:31:37Z

@hhuot and @ghood: we now have two of the Patch-seq datasets online, and it may be useful to use them as examples to coordinate accession across archives.

https://dandiarchive.org/dandiset/000008/draft
https://dandiarchive.org/dandiset/000012/draft

There are two areas where I would like to coordinate:

At the dataset metadata level (see dataset-metadata #15), where we have a metadata key to link to associated datasets:

```yaml
associatedData: # REQUIRED if it exists and mandated by your project
- name: REQUIRED
  identifier: REQUIRED
  repository: REQUIRED
  url: REQUIRED
```
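A small Python sketch of checking an associatedData block once the YAML has been loaded into plain Python objects (for example with a YAML parser; the loading step is not shown). The function name is hypothetical. It only checks that the four REQUIRED fields are present and non-empty, and treats a missing associatedData key as valid, since the key is only required when associated data exists.

```python
REQUIRED_KEYS = ("name", "identifier", "repository", "url")


def validate_associated_data(metadata: dict) -> list[str]:
    """Return a list of problems with the associatedData block (empty if valid)."""
    entries = metadata.get("associatedData")
    if entries is None:
        return []  # only required if associated data exists
    if not isinstance(entries, list):
        return ["associatedData must be a list of entries"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not a mapping")
            continue
        for key in REQUIRED_KEYS:
            if not entry.get(key):
                problems.append(f"entry {i} is missing required field '{key}'")
    return problems


if __name__ == "__main__":
    # Hypothetical entry linking a dandiset to its transcriptomics counterpart.
    metadata = {
        "associatedData": [
            {
                "name": "Patch-seq transcriptomics",
                "identifier": "example-id",
                "repository": "NEMO",
                "url": "https://example.org/nemo-dataset",
            }
        ]
    }
    print(validate_associated_data(metadata))  # []
```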

At the cell ID level: right now, if you navigate down the folders to each file, the file name (and the associated metadata) contains the subject, cell, and tissueSample IDs. These IDs were provided by the lab.
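Because the IDs are embedded in the file names, they can be recovered without opening the files. This Python sketch assumes a BIDS-like naming scheme in which `_` separates `key-value` entities (for example `sub-<id>_slice-<id>_cell-<id>_icephys.nwb`). That pattern, the `sub`/`cell` key names, and the example paths are assumptions, so check them against the actual files in the dandisets before relying on this.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_entities(path: str) -> dict[str, str]:
    """Split an assumed BIDS-like file name into its key-value entities.

    '_' separates entities and the first '-' separates key from value, so
    values may contain '-'. Parts without '-' (e.g. the 'icephys' suffix) are
    skipped. Assumes values contain no '.', since the extension is cut at the
    first '.'.
    """
    stem = PurePosixPath(path).name.split(".", 1)[0]
    entities = {}
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep and value:
            entities[key] = value
    return entities


def group_by_cell(paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group files by (subject, cell), the keys a cross-archive link would use."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for p in paths:
        e = parse_entities(p)
        if "sub" in e and "cell" in e:
            groups[(e["sub"], e["cell"])].append(p)
    return dict(groups)


if __name__ == "__main__":
    print(parse_entities("sub-mouse-001/sub-mouse-001_slice-2_cell-7_icephys.nwb"))
```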
