Files, directories

never ever use space/tabs in your file/directory names
never use Unix special characters in file/directory names (!?"'%&^~*$|/{}<>:)

Samples/tissues

use few letter prefix for the disease

project	prefix
ca. breast	cbr
ca. colon	cco
ca. prostate	cpr
ca. bladder	cbl
Alzheimer	alz
control	ctr
fibroadenoma	fad

keep the patient/person ID unchanged in all the experimental data.
if not created by the in house database system, the patient_id should be: 2 letters researcher (responsible for the patient sample collection) initial followed by 4 numbers.

dk0001
dk9999

pm0543

merge prefix (project) with patient_id using one underscore "_"

alz_dk0123

provide a code for the tissue

tissue	code
blood	BL
primary tumor	PT
unaffected margin	UM
unaffected margin 99	UM99

merge with project code & patient_id using underscore

cbr_ak0125_PT

if not provided with a sample code id from the in house database:

provide the sample number: 0 for the first sample of that type (i.e. the tube #1 of blood used for DNA extraction) 1, 2 etc for the subsequent material preparations from the given patient-sampletype combo

cbr_ak0125_PT_0

cbr_ak0125_PT_1

in case of repeating the experiment, i.e. re-sequencing the same sample, lets add another number. To be discussed

Hello,

currently I started to use the following system:

cbrako_fix034_PT_0_9
cbr  = > ca breast (major disease prefix)
ako =>  specific project / sample source if applicable. Can be xyz for all samples stored here
fix034 => internal to that project patient id
PT => tissue type
_0_9 => biological-replicate_technical-replicate

I also used PT_13_0 for the tumor in the same patient after remission.

Above system is a quick fix for tissue samples stored in tubes labeled with non-consistent patient id's followed by generating ~10k sequencing files with groups of 4-8 with ~identical patient/tissue identifiers (fix034_PT.result,  fix034PT.result, fix034-PT.result,  fix034PT.result, etc).

Since we will get the unique ids for all the tissue samples, we can do:
a) cbrako_fix034_PT_xd2f_9 (where xd2f is some unique string to identify tumor samples from fix034 patient
b) enumerate tumor/blood samples from a given patient 
xd2f 0 
jk5p 1 

keep the  cbrako_fix034_PT_0_1
c) go for an overkill
cbrako_fix034_PT_0_1.xd2f.

where xd2f will be id for DNA/RNA prep.

I am concatenating labels for a various processing steps for each of the data files:
cbrako_fix039_PT_0_2.rna.mrgd_4.clump.r12.star_hg38p13.md.bam

to minimize the chance of screwing things and be able to figure out things in a week or in a year from now. But it is a non-trivial problem.

This is just to keep you posted before the tomorrows meeting/save some time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

naming_rules.md

naming_rules.md

Files, directories

Samples/tissues

Files

naming_rules.md

Latest commit

History

naming_rules.md

File metadata and controls

Files, directories

Samples/tissues