Data Format
Upload CSV (comma-separated value) or tab-delimited text files containing gene expression data. For RNA-seq data, gene-level read count data is recommended, as differentially expressed genes (DEGs) can then be identified by statistical modeling of the counts using DESeq2. FPKM, RPKM, and other types of normalized expression data are also accepted. The file size limit is 50 MB.
Each row holds the expression data for one gene, and the first row is treated as sample names. The first column contains gene IDs; each remaining column contains the data for one sample. You do not need to log-transform your data before uploading to iDEP. The first few lines of a typical read-count file look like this:
WT_Rep1 WT_Rep2 hoxa1_Rep1 hoxa1_Rep2
ENSG00000198888 17528 23007 24418 29152
ENSG00000198763 21264 26720 28878 32416
ENSG00000198804 130975 151207 178130 196727
ENSG00000198712 49769 61906 66478 69758
ENSG00000228253 9304 11160 12608 13041
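As an illustration of the layout described above, here is a minimal Python sketch that reads such a table before upload. It is not iDEP's own code; the function name `read_expression_table` is hypothetical, and the sketch assumes a tab-delimited file whose header row may omit a label for the gene ID column, as in the example above.

```python
import csv
import io


def read_expression_table(text, delimiter="\t"):
    """Return (sample_names, {gene_id: [values]}) from delimited text.

    Hypothetical helper for illustration; not part of iDEP.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    # If the header is as long as the data rows, its first field labels the
    # gene ID column and is dropped. Otherwise the header lists samples only.
    if body and len(header) == len(body[0]):
        header = header[1:]
    samples = header
    data = {}
    for row in body:
        gene_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene_id}: expected {len(samples)} values, got {len(values)}"
            )
        data[gene_id] = [float(v) for v in values]
    return samples, data


example = (
    "WT_Rep1\tWT_Rep2\thoxa1_Rep1\thoxa1_Rep2\n"
    "ENSG00000198888\t17528\t23007\t24418\t29152\n"
    "ENSG00000198763\t21264\t26720\t28878\t32416\n"
)
samples, data = read_expression_table(example)
```

Running a check like this locally can catch rows with a missing or extra value before the file is uploaded.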
iDEP can convert most common types of gene IDs to Ensembl gene IDs, which are used internally for enrichment analyses. It also uses gene IDs to guess which organism your data is derived from. See a list of plant and animal species covered by the database here. If your gene IDs do not match anything in the database, you can still use the website to do clustering analysis and identify DEGs.
If you are working on a species covered by iDEP, but your ID is not recognized, you can send us a file that maps your IDs to Ensembl gene IDs. We will incorporate it into the system, and you should be able to use the website after a few days.
Name columns carefully as iDEP parses column names to define sample groups.
Replicates should be denoted by “_Rep1”, “_Rep2”, “_Rep3”, and so on, at the end of the column name. For example, Control_Rep1, Control_Rep2, TreatmentA_Rep1, TreatmentA_Rep2, TreatmentB_Rep1, TreatmentB_Rep2. The part before the replicate suffix defines the sample groups that form the basis for differential expression analysis. You can have more than two groups. All pair-wise comparisons are listed and analyzed. Also, avoid using a hyphen “-” or a dot “.” in sample names, as these characters interfere with the parsing of sample names.
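The naming convention can be checked with a short Python sketch. This is an assumption about how such parsing could work, not iDEP's actual implementation; the helper names `sample_group`, `invalid_names`, and `pairwise_comparisons` are hypothetical.

```python
import re
from itertools import combinations


def sample_group(name):
    """Strip a trailing _Rep<N> suffix to obtain the group name."""
    return re.sub(r"_Rep\d+$", "", name)


def invalid_names(sample_names):
    """Return names containing a hyphen or a dot, which should be avoided."""
    return [n for n in sample_names if "-" in n or "." in n]


def pairwise_comparisons(sample_names):
    """Return (groups in first-seen order, all pairs of groups)."""
    groups = []
    for name in sample_names:
        group = sample_group(name)
        if group not in groups:
            groups.append(group)
    return groups, list(combinations(groups, 2))


names = [
    "Control_Rep1", "Control_Rep2",
    "TreatmentA_Rep1", "TreatmentA_Rep2",
    "TreatmentB_Rep1", "TreatmentB_Rep2",
]
groups, comparisons = pairwise_comparisons(names)
```

With the six example columns, this yields three groups and three pair-wise comparisons. Because the grouping is an exact string match in this sketch, names that differ only in capitalization would land in separate groups.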
For a factorial design, use an underscore “_” to separate factors such as genetic background and treatment, and keep the spelling and capitalization of each factor level consistent across samples. For example, WT_control_Rep1, WT_control_Rep2, WT_treatment_Rep1, WT_treatment_Rep2, Mu_control_Rep1, Mu_control_Rep2, Mu_treatment_Rep1, Mu_treatment_Rep2. This will define a 2×2 factorial design: