Merge pull request #417 from ncihtan/DAG-fix
Revisit CDS template
adamjtaylor authored Jun 6, 2024
2 parents ed266a3 + 5227f14 commit 1dbb1b6
Showing 2 changed files with 27,217 additions and 24,726 deletions.
28 changes: 19 additions & 9 deletions HTAN.model.csv
@@ -7,13 +7,22 @@ Patient,HTAN patient,,"Component, HTAN Participant ID",,FALSE,Individual Organis
File,A type of Information Content Entity specific to OS,,,,FALSE,Information Content Entity,,https://w3id.org/biolink/vocab/DataFile,
Filename,Name of a file,,,,TRUE,,,,regex search ^.+\/\S*$
File Format,"Format of a file (e.g. txt, csv, fastq, bam, etc.)","hdf5, bedgraph, idx, idat, bam, bai, excel, powerpoint, tif, tiff, OME-TIFF, png, doc, pdf, fasta, fastq, sam, vcf, bcf, maf, bed, chp, cel, sif, tsv, csv, txt, plink, bigwig, wiggle, gct, bgzip, zip, seg, html, mov, hyperlink, svs, md, flagstat, gtf, raw, msf, rmd, bed narrowPeak, bed broadPeak, bed gappedPeak, avi, pzfx, fig, xml, tar, R script, abf, bpm, dat, jpg, locs, Sentrix descriptor file, Python script, sav, gzip, sdf, RData, hic, ab1, 7z, gff3, json, sqlite, svg, sra, recal, tranches, mtx, tagAlign, dup, DICOM, czi, mex, cloupe, am, cell am, mpg, m, mzML,scn, dcc, rcc, pkc, sf, bedpe",,,TRUE,,,,
CDS Sequencing Template,"CDS compatible template file, includes attributes for Genomic Reference, Library Layout, Data Type, Sequencing Platform, Library Selection Method",,"Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, CDS Genomic Reference, CDS Library Layout, CDS Data Type, CDS Sequencing Platform, CDS Library Selection Method",,TRUE,,,,
CDS Genomic Reference,One or more characters used to identify the published NCBI genetic sequence that is used as a reference against which other sequences are compared.,,,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS Library Layout,The read strategy or method that was used for sequencing and analysis of a nucleotide library.,"Paired End, Single Read",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,
CDS Data Type,"Types of data associated with the content. Fill out Other Data Type Specified, if not on the list.","10x Visium Spatial Transcriptomics, Bulk Methylation-seq, Bulk RNA-seq, Bulk WES, Electron Microscopy, ExSeq, HI-C-seq, RPPA, Imaging, Mass Spectrometry, NanoString GeoMx DSP Spatial Transcriptomics, Other Assay, SRRS Imaging, Slide-seq, scATAC-seq, scDNA-seq, scRNA-seq, Accessory Manifest, Other Data Type Specified",,,TRUE,Publication,,https://dataservice.datacommons.cancer.gov/#/resources,list like
CDS Sequencing Platform,The words used to describe the instrument used to carry out a high-throughput sequencing experiment.,"Illumina Next Seq 500, Illumina Next Seq 550, Illumina Next Seq 2500, Illumina NovaSeq 6000, Illumina MiSeq, 454 GS FLX Titanium, AB SOLiD 4, AB SOLiD 2, AB SOLiD 3, Complete Genomics, Illumina HiSeq X Ten, Illumina HiSeq X Five, Illumina Genome Analyzer II, Illumina Genome Analyzer IIx, Illumina HiSeq 2000, Illumina HiSeq 2500, Illumina HiSeq 4000, Illumina MiSeq, Illumina NextSeq, Ion Torrent PGM, Ion Torrent Proton, Ion Torrent S5, PacBio RS, NovaSeq 6000, NovaSeqS4, Ultima Genomics UG100, Oxford Nanopore minION, GridION, PromethION, PacBio Sequel2, Revio, Illumina NextSeq 1000, Illumina NextSeq 2000, Other, unknown, Not Reported",,,TRUE,Device,,https://dataservice.datacommons.cancer.gov/#/resources,
CDS Library Selection Method,The type of systematic actions performed to select or enrich DNA fragments used in analysis by high-throughput sequencing.,"Random, rRNA Depletion, Other",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,
CDS Other Data Type Specified,Other types of data associated with the content.,,CDS Data Type,,FALSE,Sequencing,,,
CDS Sequencing Template,"CDS compatible template file, includes attributes for Genomic Reference, Library Layout, Data Type, Sequencing Platform, Library Selection Method",,"Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, CDS library_id, CDS library_strategy, CDS library_source, CDS library_selection, CDS library_layout, CDS platform, CDS instrument_model, CDS design_description, CDS reference_genome_assembly, CDS custom_assembly_fasta_file_for_alignment, CDS bases, CDS number_of_reads, CDS coverage, CDS avg_read_length, CDS sequence_alignment_software",,TRUE,,,,
CDS library_id,Short unique identifier for the sequencing library.,,,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS library_strategy,Library strategy,"AMPLICON, ATAC-seq, Bisulfite-Seq, ChIA-PET, ChIP-Seq, CLONE, CLONEEND, CTS, DNase-Hypersensitivity, EST, FAIRE-seq, FINISHING, FL-cDNA, Hi-C, MBD-Seq, MeDIP-Seq, miRNA-Seq, MNase-Seq, MRE-Seq, ncRNA-Seq, OTHER, POOLCLONE, RAD-Seq, RIP-Seq, RNA-Seq, SELEX, ssRNA-seq, Synthetic-Long-Read, Targeted-Capture, Tethered Chromatin Conformation Capture, Tn-Seq, WCS, WGA, WGS, WXS",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS library_source,The Library Source specifies the type of source material that is being sequenced,"GENOMIC, GENOMIC SINGLE CELL, METAGENOMIC, METATRANSCRIPTOMIC, OTHER, SYNTHETIC, TRANSCRIPTOMIC, TRANSCRIPTOMIC SINGLE CELL, VIRAL RNA",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS library_selection,Library Selection Method,"5-methylcytidine antibody, CAGE, cDNA, cDNA_oligo_dT, cDNA_randomPriming, CF-H, CF-M, CF-S, CF-T, ChIP, DNAse, HMPR, Hybrid Selection, Inverse rRNA, MBD2 protein methyl-CpG binding domain, MDA, MF, MNase, MSLL, Oligo-dT, other, Padlock probes capture method, PCR, PolyA, RACE, RANDOM, RANDOM PCR, Reduced Representation, repeat fractionation, Restriction Digest, RT-PCR, size fractionation, unspecified",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS library_layout,Paired-end or Single,"Paired-end, Single-end",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS platform,Sequencing Platform used for Sequencing,"LS454, ABI_SOLID, BGISEQ, CAPILLARY, COMPLETE_GENOMICS, HELICOS, ILLUMINA, ION_TORRENT, OXFORD_NANOPORE, PACBIO_SMRT",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS instrument_model,Instrument model used for sequencing,"454 GS, 454 GS 20, 454 GS FLX, 454 GS FLX+, 454 GS FLX Titanium, 454 GS Junior, HiSeq X Five, HiSeq X Ten, Illumina Genome Analyzer, Illumina Genome Analyzer II, Illumina Genome Analyzer IIx, Illumina HiScanSQ, Illumina HiSeq 1000, Illumina HiSeq 1500, Illumina HiSeq 2000, Illumina HiSeq 2500, Illumina HiSeq 3000, Illumina HiSeq 4000, Illumina iSeq 100, Illumina NovaSeq 6000, Illumina MiniSeq, Illumina MiSeq, NextSeq 500, NextSeq 550, Helicos HeliScope, AB 5500 Genetic Analyzer, AB 5500xl Genetic Analyzer, AB 5500x-Wl Genetic Analyzer, AB SOLiD 3 Plus System, AB SOLiD 4 System, AB SOLiD 4hq System, AB SOLiD PI System, AB SOLiD System, AB SOLiD System 2.0, AB SOLiD System 3.0, Complete Genomics, PacBio RS, PacBio RS II, PacBio Sequel, PacBio Sequel II, Ion Torrent PGM, Ion Torrent Proton, Ion Torrent S5 XL, Ion Torrent S5, AB 310 Genetic Analyzer, AB 3130 Genetic Analyzer, AB 3130xL Genetic Analyzer, AB 3500 Genetic Analyzer, AB 3500xL Genetic Analyzer, AB 3730 Genetic Analyzer, AB 3730xL Genetic Analyzer, GridION, MinION, PromethION, BGISEQ-500, DNBSEQ-G400, DNBSEQ-T7, DNBSEQ-G50, MGISEQ-2000RS",,,TRUE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS design_description,Free-form description of the methods used to create the sequencing library; a brief 'materials and methods' section.,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS reference_genome_assembly,This is only if you are submitting a bam file aligned against a NCBI assembly.,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS custom_assembly_fasta_file_for_alignment,Please provide the name of the custom assembly fasta file used during alignment,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
CDS bases,Count of unique basecalls present in the data. Please count each base only once if using secondary alignments.,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,int
CDS number_of_reads,Count of the number of reads in the data. Please count each read only once if using secondary alignments.,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,int
CDS coverage,Depth of coverage on assembly used. Found by (Unique Aligned Basecalls)/(Reference Length),,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,int
CDS avg_read_length,Found by (Bases)/(Reads),,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,int
CDS sequence_alignment_software,The name of the software program used to align nucleotide sequencing data.,,,,FALSE,Sequencing,,https://dataservice.datacommons.cancer.gov/#/resources,str
Checksum,MD5 checksum of the BAM file,,,,TRUE,Information Content Entity,,,
HTAN Data File ID,Self-identifier for this data file - HTAN ID of this file HTAN ID SOP (eg HTANx_yyy_zzz),,,,TRUE,File,,https://docs.google.com/document/d/1podtPP8L1UNvVxx9_c_szlDcU1f8n7bige6XA_GoRVM/edit?usp=sharing,regex match ^(HTA([1-9]|1[0-6]))_((EXT)?([0-9]\d*|0000))_([0-9]\d*|0000)$ warning
HTAN Participant ID,HTAN ID associated with a patient based on HTAN ID SOP (eg HTANx_yyy ),,,,TRUE,Patient,,https://docs.google.com/document/d/1podtPP8L1UNvVxx9_c_szlDcU1f8n7bige6XA_GoRVM/edit?usp=sharing,regex match ^(HTA([1-9]|1[0-6]))_((EXT)?([0-9]\d*|0000))$ warning
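
The two HTAN ID rows above carry `regex match` rules. As a minimal sketch of what those patterns accept, the Python below copies both expressions verbatim from the rows; the helper name `is_valid` and the sample IDs are hypothetical, and the sketch assumes the validator applies the pattern from the start of the string, which this page does not state.

```python
import re

# Patterns copied verbatim from the validation-rule column of the two rows above.
DATA_FILE_ID = re.compile(
    r"^(HTA([1-9]|1[0-6]))_((EXT)?([0-9]\d*|0000))_([0-9]\d*|0000)$"
)
PARTICIPANT_ID = re.compile(r"^(HTA([1-9]|1[0-6]))_((EXT)?([0-9]\d*|0000))$")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True when the value satisfies the pattern (hypothetical helper)."""
    return pattern.match(value) is not None


if __name__ == "__main__":
    # Sample IDs are made up for illustration only.
    for candidate in ("HTA1_123_456", "HTA16_EXT5_0000", "HTA17_1_1"):
        print(candidate, is_valid(DATA_FILE_ID, candidate))
```

Both rules end in `warning`, so a non-matching value is presumably reported rather than rejected outright.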
@@ -928,8 +937,9 @@ Other Data Type Specified,Other types of data associated with the content.,,,,FA
Supporting Link,Relevant external links associated with the content (e.g external datasets used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and/or any other applicable agreement(s). Validated as URL,,,,FALSE,Publication,,,url warning
Supporting Link Description,Description of relevant external links associated with the publication (e.g An external mouse dataset used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and-or any other applicable agreement(s).,,,,FALSE,Publication,,,
Tool,Were any software or computational tools generated for this content,"Yes, No",,,TRUE,Publication,,,
Accessory Data Type,Accessory specific data type,,,,FALSE,,,,
Accessory,An empty parent attribute for accessory ,,,,FALSE,,,,
Accessory Manifest,Accessory specific attributes,,"Component,Dataset Name,Accessory Synapse ID,Accessory Description,Data Type,HTAN Center ID,HTAN Parent Biospecimen ID,Accessory-associated HTAN Parent Data File ID",,FALSE,Accessory,,,
Accessory Manifest,Accessory specific attributes,,"Component,Dataset Name,Accessory Synapse ID,Accessory Description, Accessory Data Type,HTAN Center ID,HTAN Parent Biospecimen ID,Accessory-associated HTAN Parent Data File ID",,FALSE,Accessory,,,
Dataset Name,Name of a dataset (e.g. a Synapse folder),,,,TRUE,Accessory,,,
Accessory Synapse ID,Synapse ID of folder containing accessory files,,,,TRUE,Accessory,,,regex match syn\d+
Accessory Description,Free text field containing description of accessory file(s),,,,TRUE,Accessory,,,
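
The `Accessory Synapse ID` rule above, `regex match syn\d+`, has no `^` or `$` anchors. The sketch below illustrates what that could mean in practice, assuming the validator uses Python's `re.match` semantics for `regex match` rules (an assumption, not something this diff confirms). The function names and sample values are hypothetical.

```python
import re

# Pattern copied verbatim from the Accessory Synapse ID row above.
SYNAPSE_ID = re.compile(r"syn\d+")


def matches_prefix(value: str) -> bool:
    """re.match semantics: anchored at the start of the string only."""
    return SYNAPSE_ID.match(value) is not None


def matches_whole(value: str) -> bool:
    """re.fullmatch semantics: the entire string must be a Synapse ID."""
    return SYNAPSE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    for candidate in ("syn12345", "syn12345/extra", "my syn12345"):
        print(candidate, matches_prefix(candidate), matches_whole(candidate))
```

Under that assumption, a value with trailing text after the digits would still pass; a stricter rule would need an explicit `$` in the pattern.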
@@ -1051,4 +1061,4 @@ Days to Vital Status Reference,Number of days between the date used for index an
Precancer Case,Yes/No indicator to designate the participant for whom precancerous lesion(s) was identified (premalignancy only).,"Yes - Precancer Case, No, Not Reported, unknown",,,TRUE,Patient,,,
Yes - Precancer Case,Indicates that the participant is a precancer case,,"Precancerous Condition Type, Days to Precancer Case Designation, WHO Precursor Lesion Code",,FALSE,Patient,,,
Days to Precancer Case Designation,Number of days between the date used for index and the reference date for designation of precancer status.,,,,FALSE,Patient,,,int
WHO Precursor Lesion Code,"World Health Organization Classification of Tumour cytopathology-based coding system, includes 'precursor lesion' designations for precancers. ICD-O-3 morphology axis format eg 1234/1",,,,FALSE,Patient,,,regex match ^\d{4}\/[0-3]$
WHO Precursor Lesion Code,"World Health Organization Classification of Tumour cytopathology-based coding system, includes 'precursor lesion' designations for precancers. ICD-O-3 morphology axis format eg 1234/1",,,,FALSE,Patient,,,regex match ^\d{4}\/[0-3]$
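
The `WHO Precursor Lesion Code` rule above expects four digits, a slash, and a behaviour digit from 0 to 3. A minimal sketch of that check follows, with the pattern copied verbatim from the row; the function name and sample codes are hypothetical.

```python
import re

# Pattern copied verbatim from the WHO Precursor Lesion Code row above.
WHO_PRECURSOR_CODE = re.compile(r"^\d{4}\/[0-3]$")


def is_precursor_code(value: str) -> bool:
    """Return True for codes shaped like 1234/1 with a behaviour digit 0-3."""
    return WHO_PRECURSOR_CODE.match(value) is not None


if __name__ == "__main__":
    # Sample codes are made up to exercise the pattern, not taken from the model.
    for candidate in ("1234/1", "8140/2", "8140/6", "814/1"):
        print(candidate, is_precursor_code(candidate))
```

Because the character class stops at 3, codes ending in a higher behaviour digit such as `/6` would not satisfy the rule.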