
Data Format

Kevin S edited this page Aug 31, 2017 · 1 revision

File format

Upload a CSV (comma-separated values) or tab-delimited text file containing gene expression data. For RNA-seq data, gene-level read counts are recommended, because differentially expressed genes (DEGs) can then be identified by statistical modeling of the counts with DESeq2. FPKM, RPKM, or other types of normalized expression data are also accepted. The file size limit is 50 MB.

Each row holds the expression data for one gene, and the first row is treated as sample names. The first column contains gene IDs; each remaining column contains the data for one sample. You do not need to log-transform your data before uploading to iDEP. The first few lines of a typical read-count file look like this:

	WT_Rep1	WT_Rep2	hoxa1_Rep1	hoxa1_Rep2
ENSG00000198888	17528	23007	24418	29152
ENSG00000198763	21264	26720	28878	32416
ENSG00000198804	130975	151207	178130	196727
ENSG00000198712	49769	61906	66478	69758
ENSG00000228253	9304	11160	12608	13041
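
The layout described above can be checked locally before uploading. The following is a minimal Python sketch and is not part of iDEP: the function name `check_expression_table` is hypothetical, and it tests only what this section states (a header row of sample names, gene IDs in the first column, numeric values elsewhere). iDEP may apply other checks that are not covered here.

```python
import csv
import io


def check_expression_table(text, delimiter="\t"):
    """Return (sample_names, gene_ids), or raise ValueError on a layout problem.

    Hypothetical helper for illustration; not iDEP's own validation.
    """
    # Skip completely blank lines, which csv.reader yields as empty lists.
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if len(rows) < 2:
        raise ValueError("need a header row and at least one gene row")

    header, body = rows[0], rows[1:]
    # The first header cell sits above the gene ID column and may be empty.
    samples = [name.strip() for name in header[1:]]
    if not samples or any(not s for s in samples):
        raise ValueError("missing or empty sample name in the header row")

    gene_ids = []
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        for value in row[1:]:
            float(value)  # raises ValueError if a cell is not numeric
        gene_ids.append(row[0].strip())
    return samples, gene_ids


example = (
    "\tWT_Rep1\tWT_Rep2\thoxa1_Rep1\thoxa1_Rep2\n"
    "ENSG00000198888\t17528\t23007\t24418\t29152\n"
    "ENSG00000198763\t21264\t26720\t28878\t32416\n"
)
samples, genes = check_expression_table(example)
```

For a CSV file, pass `delimiter=","` instead of the default tab.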

Gene IDs

iDEP can convert most common types of gene IDs to Ensembl gene IDs, which are used internally for enrichment analyses. It also uses the gene IDs to guess which organism your data is derived from. See a list of plant and animal species covered by the database here. If your gene IDs do not match anything in the database, you can still use the website to do clustering analysis and identify DEGs.

If you are working on a species covered by iDEP, but your ID is not recognized, you can send us a file that maps your IDs to Ensembl gene IDs. We will incorporate it into the system, and you should be able to use the website after a few days.

Sample Names

Name columns carefully, because iDEP parses column names to define sample groups.

Replicates should be denoted by “_Rep1”, “_Rep2”, “_Rep3”, and so on, at the end of the sample name. For example, Control_Rep1, Control_Rep2, TreatmentA_Rep1, TreatmentA_Rep2, TreatmentB_Rep1, TreatmentB_Rep2. The part before the replicate suffix defines the sample groups that form the basis for differential expression analysis. You can have more than two groups; all pair-wise comparisons are listed and analyzed. Avoid using a hyphen “-” or a dot “.” in sample names, because these characters interfere with the parsing of sample names.
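
The naming rule can be illustrated with a short Python sketch. It is an approximation of the behavior described in this section, not iDEP's actual parsing code: the helper names (`sample_group`, `group_samples`, `pairwise_comparisons`) are hypothetical, and the sketch assumes the replicate suffix is matched case-sensitively as `_Rep` followed by digits.

```python
import re
from itertools import combinations

# Assumption: a replicate suffix is "_Rep" plus one or more digits at the end.
REP_SUFFIX = re.compile(r"_Rep\d+$")


def sample_group(name):
    """Strip a trailing _RepN suffix; a name without one is its own group."""
    return REP_SUFFIX.sub("", name)


def group_samples(names):
    """Map each group name to its samples, in order of first appearance."""
    groups = {}
    for name in names:
        if "-" in name or "." in name:
            raise ValueError(f"avoid '-' and '.' in sample names: {name!r}")
        groups.setdefault(sample_group(name), []).append(name)
    return groups


def pairwise_comparisons(groups):
    """List every unordered pair of groups."""
    return list(combinations(groups, 2))


names = [
    "Control_Rep1", "Control_Rep2",
    "TreatmentA_Rep1", "TreatmentA_Rep2",
    "TreatmentB_Rep1", "TreatmentB_Rep2",
]
groups = group_samples(names)
comparisons = pairwise_comparisons(groups)
```

With these six samples the sketch finds three groups (Control, TreatmentA, TreatmentB) and therefore three pair-wise comparisons.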

For a factorial design, use an underscore “_” to separate factors such as genetic background and treatment. For example, WT_control_Rep1, WT_control_Rep2, WT_treatment_Rep1, WT_treatment_Rep2, Mu_control_Rep1, Mu_control_Rep2, Mu_treatment_Rep1, Mu_treatment_Rep2. This defines a 2×2 factorial design with two factors: genetic background (WT or Mu) and treatment (control or treatment).
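
As a rough illustration, the factorial naming scheme can be split into factors as follows. This Python sketch is an assumption about how such names could be parsed, not iDEP's implementation: the function names are hypothetical, and factor levels are compared case-sensitively here, so spelling each level consistently matters.

```python
import re


def parse_factors(name):
    """Drop the trailing _RepN suffix and split the rest on underscores."""
    base = re.sub(r"_Rep\d+$", "", name)
    return tuple(base.split("_"))


def design_cells(names):
    """Map each combination of factor levels to the samples in that cell."""
    cells = {}
    for name in names:
        cells.setdefault(parse_factors(name), []).append(name)
    return cells


def factor_levels(names):
    """Return the sorted distinct levels observed at each factor position."""
    parsed = [parse_factors(n) for n in names]
    return [sorted(set(column)) for column in zip(*parsed)]


names = [
    "WT_control_Rep1", "WT_control_Rep2",
    "WT_treatment_Rep1", "WT_treatment_Rep2",
    "Mu_control_Rep1", "Mu_control_Rep2",
    "Mu_treatment_Rep1", "Mu_treatment_Rep2",
]
cells = design_cells(names)
levels = factor_levels(names)
```

Here `levels` holds two levels for each of the two factors, and `cells` holds four cells with two replicates each, which is the 2×2 layout.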
