A robust workflow to benchmark deconvolution of multi-omic data

This repository contains all the files needed to run deconvolution on example datasets and to reproduce the ranking and the figures of the paper. The pipeline can easily be extended to include and evaluate new methods and datasets.

---
title: Pipeline summary
config:
  look: handDrawn
---
flowchart LR
    a1("Reference profiles"):::red --"simulations"--> b1("In silico data"):::red
    subgraph Deconvolution["Deconvolution methods"]
       direction TB
       c1("Proportions"):::blue ~~~ c2("Time"):::blue
    end
    b1 --> Deconvolution
    b2("In vitro data"):::red --> Deconvolution
    b3("In vivo data"):::red --> Deconvolution
    subgraph Scoring["Scoring metrics"]
       direction TB
       d1("Ranks"):::green ~~~ d2("Figures"):::green
    end
    Deconvolution --"scoring"--> Scoring
    classDef red stroke:#f00
    classDef blue stroke:#00f
    classDef green stroke:#0f0

How to make in silico data: folder data

Briefly, we generated in silico data by mixing reference profiles of pure cell types from different tissues according to cell-type proportions drawn from a Dirichlet distribution. The scripts are in simulation_scripts, and the reference profiles can be downloaded from Zenodo (DOI 10.5281/zenodo.14024479).
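A minimal sketch of this generative model, assuming linear mixing without added noise. The symbols T, α, M, K and N are our own notation (A and D mirror the DATA_A.rds and DATA_D_OMIC.rds file names), and the simulation scripts may add noise or choose α differently:

```latex
\begin{aligned}
\mathbf{a}_j &\sim \operatorname{Dir}(\boldsymbol{\alpha}), \quad \boldsymbol{\alpha} \in \mathbb{R}_{>0}^{K}, \quad j = 1,\dots,N,\\
A &= [\mathbf{a}_1,\dots,\mathbf{a}_N] \in [0,1]^{K \times N}, \quad \sum_{k=1}^{K} A_{kj} = 1 \ \text{for all } j,\\
D &= T A, \quad T \in \mathbb{R}^{M \times K}, \quad D \in \mathbb{R}^{M \times N},
\end{aligned}
```

Here T holds the reference profiles of the K pure cell types over M features (CpG sites or genes), and each column of D is one simulated sample. For instance, with K = 2, proportions a_j = (0.3, 0.7) and reference values 0.9 and 0.1 for a given feature, the simulated value of that feature is 0.3 × 0.9 + 0.7 × 0.1 = 0.34.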

For example, to generate the simulations for the BlCL dataset, simply run:

cd data/simulation_scripts
Rscript generate_simu_BlCL.R
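Since there is one generate_simu_*.R script per dataset, all simulations can be regenerated in one go. This is a minimal POSIX shell sketch: the helper name run_all_simulations and the RUNNER override are hypothetical, and it assumes each script is meant to be run from inside data/simulation_scripts, as in the command above.

```shell
# Hypothetical helper: run every generate_simu_*.R script found in a folder.
# The body runs in a subshell, so the "cd" does not change the caller's directory.
# RUNNER defaults to Rscript; RUNNER=echo only prints the scripts (dry run).
run_all_simulations() (
  cd "${1:-.}" || exit 1
  for script in generate_simu_*.R; do
    # Skip the literal pattern when no script matches
    [ -e "$script" ] || continue
    # Stop at the first script that fails
    "${RUNNER:-Rscript}" "$script" || exit 1
  done
)
```

For example, run_all_simulations data/simulation_scripts runs every simulation script, while RUNNER=echo run_all_simulations data/simulation_scripts only lists them.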

The data folder is organized as follows:

.
├── simulation_scripts
│   ├── generate_simu_DATA1.R
│   └── generate_simu_DATA2.R
├── references
│   ├── DATA1.rds
│   ├── DATA2_dnam.rds
│   └── DATA2_rna.rds
├── simulations
│   ├── dnam
│   │   ├── 240101_DATA1_sim01.rds
│   │   ├── 240101_DATA1_sim02.rds
│   │   ├── 240101_DATA2_sim01.rds
│   │   └── 240101_DATA2_sim02.rds
│   ├── rna
│   │   ├── 240101_DATA2_sim01.rds
│   │   └── 240101_DATA2_sim02.rds
...

The data folder also stores the in vitro and in vivo datasets, e.g. data/invitro/DATA_A.rds for the proportion matrix and data/invitro/DATA_D_OMIC.rds for the methylation or expression matrix. Please refer to the table here for instructions on where to download the in vitro and in vivo data.
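To make the DATA_A.rds / DATA_D_OMIC.rds naming concrete, the sketch below checks that both files exist for one dataset before running the pipeline. The helper name check_dataset_files is hypothetical, and the exact spelling of OMIC in the file names (for example rna or dnam) is an assumption; pass it as it appears on disk.

```shell
# Hypothetical helper: check that the proportion matrix (DATA_A.rds) and the
# omic matrix (DATA_D_OMIC.rds) of one dataset are present in a folder.
# Usage: check_dataset_files DATA OMIC [DIR]   (DIR defaults to data/invitro)
# Returns 0 if both files exist, 1 otherwise.
check_dataset_files() {
  dataset=$1
  omic=$2
  dir=${3:-data/invitro}
  rc=0
  for f in "$dir/${dataset}_A.rds" "$dir/${dataset}_D_${omic}.rds"; do
    if [ -f "$f" ]; then
      echo "found   $f"
    else
      echo "missing $f"
      rc=1
    fi
  done
  return "$rc"
}
```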

How to run the deconvolution methods: folder deconvolution

This pipeline uses an Apptainer container and Snakemake. Instructions on how to use these tools can be found here (Apptainer) and here (Snakemake).

This folder contains the scripts that run the deconvolution pipeline, with one script per setting, i.e. per combination of method class (supervised or unsupervised) and omic type (DNA methylation or RNA). These scripts can be modified to include new methods and/or new datasets (see the README).

After creating a conda environment YOUR_VENV with Snakemake installed, build the container and run the pipeline as follows:

cd deconvolution
sudo apptainer build container2.sif container2.def
mkdir results
mkdir results/prediction
mkdir results/prediction/dnam
mkdir results/prediction/dnam/sup
mkdir results/prediction/dnam/unsup
mkdir results/prediction/rna
mkdir results/prediction/rna/sup
mkdir results/prediction/rna/unsup
mkdir results/timing
mkdir results/timing/dnam
mkdir results/timing/dnam/sup
mkdir results/timing/dnam/unsup
mkdir results/timing/rna
mkdir results/timing/rna/sup
mkdir results/timing/rna/unsup
conda activate YOUR_VENV
snakemake --latency-wait 60 --cores 1 --jobs 50
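The series of mkdir calls above can be replaced by a small loop. This POSIX shell sketch (the function name make_result_dirs is hypothetical) creates the same results tree, one folder per output type (prediction, timing), omic (dnam, rna) and method class (sup, unsup). mkdir -p also creates missing parent folders and does not fail if a folder already exists.

```shell
# Hypothetical helper: create results/{prediction,timing}/{dnam,rna}/{sup,unsup}
# Usage: make_result_dirs [BASE]   (BASE defaults to "results")
make_result_dirs() {
  base=${1:-results}
  for kind in prediction timing; do
    for omic in dnam rna; do
      for class in sup unsup; do
        mkdir -p "$base/$kind/$omic/$class" || return 1
      done
    done
  done
}
```

Run it from the deconvolution folder before calling snakemake. Before launching the full run, snakemake -n (dry run) lists the jobs Snakemake would execute without running them.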

Snakemake runs all methods on all omics. The Snakefile is self-explanatory and can be modified to include new methods or datasets. More generally, see the README for how to test new methods or datasets.

The results of the deconvolution, i.e. the estimated proportion matrices and the elapsed times, are stored in deconvolution/results/prediction/OMIC/CLASS/ and deconvolution/results/timing/OMIC/CLASS/ respectively, where OMIC is dnam or rna and CLASS is sup or unsup. File names follow the pattern 240101_DATA1_Apred_FS_METHOD_sim01.rds (proportions) and 240101_DATA1_timing_FS_METHOD_sim01.rds (timing), where FS is the feature selection strategy and METHOD the deconvolution algorithm.
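When collecting results, the file-name pattern can be split back into its fields. This is a minimal POSIX shell sketch; the function name parse_result_name is hypothetical, and it assumes that no field (dataset, FS, METHOD, ...) itself contains an underscore, otherwise the split is wrong.

```shell
# Hypothetical helper: split a result file name into its fields.
# Expected pattern: DATE_DATASET_TYPE_FS_METHOD_SIM.rds, with TYPE = Apred or timing.
# Assumes no field contains "_".
parse_result_name() {
  # Drop any leading directory and the .rds extension
  name=$(basename "$1" .rds)
  # Split on "_" and print one key=value pair per field
  printf '%s\n' "$name" | {
    IFS=_ read -r date dataset kind fs method sim
    printf 'date=%s dataset=%s type=%s fs=%s method=%s sim=%s\n' \
      "$date" "$dataset" "$kind" "$fs" "$method" "$sim"
  }
}
```

For example, parse_result_name deconvolution/results/prediction/dnam/sup/240101_DATA1_Apred_FS_METHOD_sim01.rds prints date=240101 dataset=DATA1 type=Apred fs=FS method=METHOD sim=sim01.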

How to do the ranking and reproduce the figures of the paper: folder ranking_figures

(a) First, compute the evaluation metrics (in our case RMSE, MAE and the Pearson correlation coefficient) with the script compute_scores.R. The scores are stored in compute_metrics/scores/, in one file for the computation time (..._time.rds) and one file for the other metrics (..._scores.rds).

To compute those metrics, simply run:

cd ranking_figures/compute_metrics
Rscript compute_scores.R
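For reference, one common way to define these scores for a true proportion matrix A and its estimate Â (both K cell types × N samples) pools all entries. This pooled form is an assumption: compute_scores.R may instead compute the scores per cell type or per sample and then aggregate them.

```latex
\begin{aligned}
\mathrm{RMSE}(A,\hat{A}) &= \sqrt{\frac{1}{KN}\sum_{k=1}^{K}\sum_{j=1}^{N}\bigl(A_{kj}-\hat{A}_{kj}\bigr)^{2}},\\
\mathrm{MAE}(A,\hat{A}) &= \frac{1}{KN}\sum_{k=1}^{K}\sum_{j=1}^{N}\bigl|A_{kj}-\hat{A}_{kj}\bigr|,\\
r(A,\hat{A}) &= \frac{\sum_{k,j}\bigl(A_{kj}-\bar{A}\bigr)\bigl(\hat{A}_{kj}-\bar{\hat{A}}\bigr)}
{\sqrt{\sum_{k,j}\bigl(A_{kj}-\bar{A}\bigr)^{2}}\,\sqrt{\sum_{k,j}\bigl(\hat{A}_{kj}-\bar{\hat{A}}\bigr)^{2}}},
\end{aligned}
```

where \bar{A} and \bar{\hat{A}} are the means over all KN entries. Lower RMSE and MAE, and higher r, indicate better estimates.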

(b) The figures of the paper can then be reproduced. For example, to produce Figure 6, set the folder variable to folder='figure6' in the script figure6.R, then run:

cd ranking_figures/figures/main_figs
mkdir figure6
Rscript figure6.R

Session info

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] fmsb_0.7.6                             
 [2] tibble_3.2.1                           
 [3] funkyheatmap_0.5.0                     
 [4] ggtext_0.1.2                           
 [5] ComplexHeatmap_2.12.1                  
 [6] circlize_0.4.15                        
 [7] ggpubr_0.6.0                           
 [8] RnBeads_2.14.0                         
 [9] plyr_1.8.9                             
[10] methylumi_2.42.0                       
[11] minfi_1.42.0                           
[12] bumphunter_1.38.0                      
[13] locfit_1.5-9.8                         
[14] iterators_1.0.14                       
[15] Biostrings_2.66.0                      
[16] XVector_0.38.0                         
[17] SummarizedExperiment_1.28.0            
[18] MatrixGenerics_1.10.0                  
[19] FDb.InfiniumMethylation.hg19_2.2.0     
[20] org.Hs.eg.db_3.15.0                    
[21] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[22] GenomicFeatures_1.48.4                 
[23] AnnotationDbi_1.58.0                   
[24] reshape2_1.4.4                         
[25] scales_1.3.0                           
[26] illuminaio_0.38.0                      
[27] limma_3.54.2                           
[28] gridExtra_2.3                          
[29] fields_15.2                            
[30] viridisLite_0.4.2                      
[31] spam_2.10-0                            
[32] ff_4.0.9                               
[33] bit_4.0.5                              
[34] MASS_7.3-60                            
[35] GenomicRanges_1.50.2                   
[36] GenomeInfoDb_1.34.9                    
[37] IRanges_2.32.0                         
[38] S4Vectors_0.36.2                       
[39] RUnit_0.4.32                           
[40] gplots_3.1.3                           
[41] gtools_3.9.5                           
[42] pracma_2.4.4                           
[43] Rcpp_1.0.12                            
[44] cluster_2.1.4                          
[45] rngtools_1.5.2                         
[46] registry_0.5-1                         
[47] Biobase_2.58.0                         
[48] BiocGenerics_0.44.0                    
[49] foreach_1.5.2                          
[50] see_0.8.1                              
[51] ggplot2_3.5.0                          
[52] tidyr_1.3.1                            
[53] matrixStats_1.2.0                      
[54] dplyr_1.1.4                            

loaded via a namespace (and not attached):
  [1] pander_0.6.5              pbapply_1.7-2             lattice_0.21-9           
  [4] rJava_1.0-6               fontquiver_0.2.1          vctrs_0.6.5              
  [7] fastICA_1.2-3             beanplot_1.3.1            blob_1.2.4               
 [10] survival_3.5-7            dynutils_1.0.11           spatstat.data_3.0-3      
 [13] later_1.3.2               DBI_1.1.3                 gfonts_0.2.0             
 [16] rappdirs_0.3.3            uwot_0.1.16               zlibbioc_1.44.0          
 [19] MatrixModels_0.5-3        htmlwidgets_1.6.4         GlobalOptions_0.1.2      
 [22] future_1.33.2             FARDEEP_1.0.1             leiden_0.4.3.1           
 [25] irlba_2.3.5.1             readr_2.1.5               KernSmooth_2.23-22       
 [28] promises_1.3.0            DelayedArray_0.24.0       locfdr_1.1-8             
 [31] RcppParallel_5.1.7        RSpectra_0.16-1           fs_1.6.3                 
 [34] textshaping_0.3.7         digest_0.6.35             png_0.1-8                
 [37] nor1mix_1.3-2             sctransform_0.4.1         cowplot_1.1.3            
 [40] glmnet_4.1-8              crul_1.4.0                pkgconfig_2.0.3          
 [43] gridBase_0.4-7            spatstat.random_3.2-2     DelayedMatrixStats_1.20.0
 [46] nnls_1.5                  reticulate_1.34.0         GetoptLong_1.0.5         
 [49] xfun_0.43                 zoo_1.8-12                tidyselect_1.2.1         
 [52] purrr_1.0.2               granulator_1.4.0          ica_1.0-3                
 [55] rtracklayer_1.56.1        rlang_1.1.3               RefFreeEWAS_2.2          
 [58] glue_1.7.0                gdtools_0.3.4             RColorBrewer_1.1-3       
 [61] deconica_0.1.1            stringr_1.5.1             ggsignif_0.6.4           
 [64] GGally_2.2.0              SparseM_1.81              fontLiberation_0.1.0     
 [67] httpuv_1.6.15             harmony_1.2.0             class_7.3-22             
 [70] preprocessCore_1.58.0     corpcor_1.6.10            annotate_1.74.0          
 [73] jsonlite_1.8.8            fontBitstreamVera_0.1.1   mime_0.12                
 [76] systemfonts_1.0.6         Rsamtools_2.12.0          stringi_1.8.3            
 [79] spatstat.sparse_3.0-3     epiR_2.0.66               scattermore_1.2          
 [82] spatstat.explore_3.2-5    rbibutils_2.2.15          yulab.utils_0.1.0        
 [85] quadprog_1.5-8            bitops_1.0-7              cli_3.6.2                
 [88] Rdpack_2.6                rhdf5filters_1.10.1       maps_3.4.1               
 [91] RSQLite_2.3.1             pheatmap_1.0.12           data.table_1.14.10       
 [94] timechange_0.2.0          officer_0.6.3             rstudioapi_0.16.0        
 [97] units_0.8-4               GenomicAlignments_1.32.1  nlme_3.1-163             
[100] listenv_0.9.1             lpSolve_5.6.19            miniUI_0.1.1.1           
[103] gridGraphics_0.5-1        httpcode_0.3.0            dbplyr_2.4.0             
[106] lifecycle_1.0.4           munsell_0.5.1             proxyC_0.3.4             
[109] caTools_1.18.2            codetools_0.2-19          EpiDISH_2.12.0           
[112] lmtest_0.9-40             ggpp_0.5.5                xtable_1.8-4             
[115] ROCR_1.0-11               ggpmisc_0.5.5             BiocManager_1.30.22      
[118] classInt_0.4-10           abind_1.4-5               farver_2.1.1             
[121] parallelly_1.37.1         RANN_2.6.1                askpass_1.2.0            
[124] SeuratObject_5.0.1        BiocIO_1.6.0              GEOquery_2.64.2          
[127] RcppAnnoy_0.0.21          goftest_1.2-3             patchwork_1.2.0          
[130] future.apply_1.11.0       Seurat_5.0.1              Matrix_1.6-4             
[133] prettyunits_1.2.0         lubridate_1.9.3           ggridges_0.5.6           
[136] mclust_6.0.0              flextable_0.9.4           TOAST_1.10.1             
[139] igraph_1.6.0              multtest_2.54.0           PREDE_1.2.1              
[142] remotes_2.4.2.1           DeconRNASeq_1.38.0        limSolve_1.5.7           
[145] CDSeq_1.0.9               spatstat.utils_3.0-4      htmltools_0.5.8.1        
[148] BiocFileCache_2.6.1       yaml_2.3.8                NMF_0.26                 
[151] utf8_1.2.4                plotly_4.10.3             XML_3.99-0.14            
[154] e1071_1.7-13              withr_3.0.0               fitdistrplus_1.1-11      
[157] BiocParallel_1.32.6       bit64_4.0.5               BiasedUrn_2.0.11         
[160] doRNG_1.8.6               progressr_0.14.0          ggstats_0.5.1            
[163] ragg_1.2.6                memoise_2.0.1             evaluate_0.23            
[166] RcppThread_2.1.6          tzdb_0.4.0                curl_5.2.1               
[169] fansi_1.0.6               fastDummies_1.7.3         tensor_1.5               
[172] polynom_1.4-1             cachem_1.0.8              desc_1.4.3               
[175] dtangle_2.0.9             deldir_2.0-2              rjson_0.2.21             
[178] rstatix_0.7.2             ggrepel_0.9.4             clue_0.3-65              
[181] tools_4.2.0               EDec_0.9                  magrittr_2.0.3           
[184] RCurl_1.98-1.12           proxy_0.4-27              car_3.1-2                
[187] ggplotify_0.1.2           xml2_1.3.6                httr_1.4.7               
[190] dirmult_0.1.3-5           assertthat_0.2.1          rmarkdown_2.26           
[193] globals_0.16.3            R6_2.5.1                  Rhdf5lib_1.20.0          
[196] RcppHNSW_0.5.0            progress_1.2.3            genefilter_1.78.0        
[199] KEGGREST_1.36.3           shape_1.4.6               HDF5Array_1.26.0         
[202] sf_1.0-14                 rhdf5_2.42.1              splines_4.2.0            
[205] carData_3.0-5             colorspace_2.1-0          generics_0.1.3           
[208] gridtext_0.1.5            pillar_1.9.0              tweenr_2.0.2             
[211] sp_2.1-2                  uuid_1.1-1                GenomeInfoDbData_1.2.9   
[214] dotCall64_1.1-1           gtable_0.3.4              zip_2.3.1                
[217] restfulr_0.0.15           knitr_1.46                biomaRt_2.52.0           
[220] fastmap_1.1.1             doParallel_1.0.17         quantreg_5.97            
[223] broom_1.0.5               openssl_2.1.1             filelock_1.0.2           
[226] backports_1.4.1           base64_2.0.1              hms_1.1.3                
[229] ggforce_0.4.1             scrime_1.3.5              Rtsne_0.17               
[232] shiny_1.8.1.1             polyclip_1.10-6           siggenes_1.70.0          
[235] lazyeval_0.2.2            crayon_1.5.2              sparseMatrixStats_1.10.0 
[238] viridis_0.6.4             reshape_0.8.9             debCAM_1.14.0            
[241] compiler_4.2.0            spatstat.geom_3.2-7

Repository: bcm-uga/DeconvBenchmark