Patch-seq Data Linkage #17

Open
carolth opened this issue Feb 21, 2020 · 3 comments

Comments

@carolth
Collaborator

carolth commented Feb 21, 2020

This issue is for discussion around Data Linkage for Patch-seq. The goal is to determine the requirements for Data Linkage across archives.
Please feel free to comment and add to the discussion.

Patch-seq consists of data that will be housed at three archives:

| Archive | Data Type | File Type (Minimum) | Note |
|---|---|---|---|
| DANDI | Patch Ephys | NWB | |
| BIL | Morphology | SWC | Images optional |
| NEMO | Transcriptomics | Fastq | |

Here is a first use case to support:

  1. Use Case: User wants to find data associated with only those cells that have all 3 data types.
    Necessary information:
  • Cell specimen id for linking data across all three archives
  • Availability of each data type for that cell - some cells will have all 3, some will have 2, some cells may only have ephys.
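
A minimal sketch of this use case, assuming each archive can list the cell specimen IDs it holds and that the same ID is used at all three archives. The ID values and the helper names (`availability`, `cells_with_all`) are hypothetical and not part of any archive API.

```python
# Hypothetical listings of cell specimen IDs per archive (made-up values).
listings = {
    "DANDI": {"cell-001", "cell-002", "cell-003", "cell-004"},  # ephys (NWB)
    "BIL": {"cell-001", "cell-003"},                            # morphology (SWC)
    "NEMO": {"cell-001", "cell-002", "cell-003"},               # transcriptomics (Fastq)
}


def availability(cell_ids_by_archive):
    """Map each cell specimen ID to the set of archives that hold data for it."""
    table = {}
    for archive, cell_ids in cell_ids_by_archive.items():
        for cell_id in cell_ids:
            table.setdefault(cell_id, set()).add(archive)
    return table


def cells_with_all(table, archives):
    """Return the sorted IDs of cells present in every one of the given archives."""
    required = set(archives)
    return sorted(cell_id for cell_id, have in table.items() if required <= have)


table = availability(listings)
complete = cells_with_all(table, listings)
print(complete)           # ['cell-001', 'cell-003']
print(table["cell-004"])  # {'DANDI'}: ephys only
```

The availability table answers both pieces of necessary information at once: which archives hold a given cell, and which cells have all 3 data types.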
@satra
Contributor

satra commented Feb 21, 2020

We should (eventually) have multiple levels of connection. At the very least we should support level 1.

levels:

  1. dataset
    e.g. in DANDI dataset metadata: associatedDataset: some_permalink
    right now this would be:
    https://portal.nemoarchive.org/search/c?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.study_name%22,%22value%22:%5B%22U01%20Kriegstein%22%5D%7D%7D%5D%7D
    (i could not find a dataset level landing page in nemo).

  2. subjects

  3. tissueSamples

  4. slices

  5. cells

not all datasets will have all of these, but this is essentially using an isPartOf/hasPart relationship across the levels. It is a graph, since the entities at levels 2 - 4 can be in multiple datasets.
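
One way to picture the isPartOf/hasPart graph is the sketch below. It assumes identifiers are prefixed with their level (for example `dataset:` or `slice:`); the class name `PartOfGraph`, its methods, and all IDs are hypothetical and do not come from any archive.

```python
from collections import defaultdict


class PartOfGraph:
    """Directed graph of isPartOf edges; hasPart is the reverse direction."""

    def __init__(self):
        self._parents = defaultdict(set)   # part -> wholes (isPartOf)
        self._children = defaultdict(set)  # whole -> parts (hasPart)

    def add_part_of(self, part, whole):
        self._parents[part].add(whole)
        self._children[whole].add(part)

    def has_part(self, whole):
        """Direct parts of an entity."""
        return sorted(self._children.get(whole, ()))

    def ancestors(self, entity):
        """All entities reachable by following isPartOf edges upward."""
        seen = set()
        stack = [entity]
        while stack:
            for parent in self._parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def datasets_for(self, entity):
        """Datasets that an entity belongs to, directly or through its ancestors."""
        return sorted(e for e in self.ancestors(entity) if e.startswith("dataset:"))


graph = PartOfGraph()
# The same subject appears in two datasets, so this is a graph, not a tree.
graph.add_part_of("subject:S1", "dataset:A")
graph.add_part_of("subject:S1", "dataset:B")
graph.add_part_of("tissueSample:T1", "subject:S1")
graph.add_part_of("slice:SL1", "tissueSample:T1")
graph.add_part_of("cell:C1", "slice:SL1")

print(graph.datasets_for("slice:SL1"))  # ['dataset:A', 'dataset:B']
print(graph.has_part("slice:SL1"))      # ['cell:C1']
```

Levels that a dataset does not have can simply be skipped, for example by linking a cell directly to a subject.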

The questions for the archives are:

  1. Is there going to be an API to access information at any given level?
  2. Are the IDs going to be common across archives?
  3. Are the IDs going to be UUIDs for any of these entities?
  4. Are the IDs going to be generated by the archives?
  5. How is the link going to be maintained between lab IDs and archive IDs?

Questions for BCDC:

  1. Will BCDC do the integration?
  2. Will the archives point to other information at all of these levels?
  3. Should such data be replicated across archives in some form?

@lydiang

lydiang commented Feb 21, 2020

What are the use cases?

  • From Maryann
    • Datasets are usually slices of a larger dataset
    • People have correlated data
    • Do things get bundled together or separated?
    • We do not want people to enter metadata individually
    • Handle local IDs so that all the pieces can be put together again

  • From Satra
    • It would be great if there were a centralized ingest broker
    • It is for the metadata and not for actually moving the data

  • From Anita
    • It needs to be at the coordinating center
    • It would have a pointer to the other

@satra
Contributor

satra commented Mar 20, 2020

@hhuot and @ghood - we now have two of the patch-seq datasets online, and it may be useful to use these as examples to coordinate accession across archives.

https://dandiarchive.org/dandiset/000008/draft
https://dandiarchive.org/dandiset/000012/draft

there are two areas where i would like to coordinate:

  1. at a dataset metadata level (see dataset-metadata #15). here we have a metadata key to link to associated datasets.
        associatedData: # REQUIRED if it exists and mandated by your project
        - name: REQUIRED
          identifier: REQUIRED
          repository: REQUIRED
          url: REQUIRED
  2. at the cell id level. right now, if you navigate down through the folders to each file, the file name (and the associated metadata) contains the subject, cell, and tissueSample IDs. these were given to us by the lab.
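
As a rough sketch of how IDs could be pulled back out of a file name: the example path below is made up, and the `key-value` chunks joined by underscores are an assumed naming pattern, not one confirmed in this thread. `parse_entities` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def parse_entities(path):
    """Split a file name into key/value pairs such as {'sub': ..., 'cell': ...}.

    Assumes chunks of the form key-value separated by underscores; chunks
    without a hyphen (for example a trailing modality label) are ignored.
    """
    stem = PurePosixPath(path).name.split(".", 1)[0]
    entities = {}
    for chunk in stem.split("_"):
        key, sep, value = chunk.partition("-")
        if sep:
            entities[key] = value
    return entities


# Made-up path following the assumed pattern.
example = "sub-mouse01/sub-mouse01_sample-tissue02_cell-c0003_icephys.nwb"
ids = parse_entities(example)
print(ids)  # {'sub': 'mouse01', 'sample': 'tissue02', 'cell': 'c0003'}
```

Because only the first hyphen in each chunk is treated as the separator, lab-assigned IDs that themselves contain hyphens stay intact.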
