Skip to content

Latest commit

 

History

History

02-synthetic

Synthetic datasets

Each synthetic dataset is based on some characteristics of some real datasets. These characteristics include:

  • The number of cells and features
  • The number of features which are differentially expressed in the trajectory
  • Estimates of the distribution of the library sizes, average expression, dropout probabilities, … estimated by Splatter.

Here we estimate the parameters of these “platforms” and use them to simulate datasets using different simulators. Each simulation script first creates a design dataframe, which links particular platforms, different topologies, seeds and other parameters specific for a simulator.

The data is then simulated using wrappers around the simulators (see /package/R/simulators.R), so that they all return datasets in a format consistent with dynwrap.

# script/folder description
1 📄estimate_platform.R Estimation of the platforms from real data done by dynbenchmark::estimate_platform
2a 📄simulate_dyngen_datasets.R dyngen, simulations of regulatory networks which will produce a particular trajectory
2b 📄simulate_prosstt_datasets.R PROSSTT, expression is sampled from a linear model which depends on pseudotime
2c 📄simulate_splatter_datasets.R Splatter, simulations of non-linear paths between different states
2d 📄simulate_dyntoy_datasets.R dyntoy, simulations of toy data using random expression gradients in a reduced space
3 📄gather_metadata.R Gathers some metadata about all the synthetic datasets
4 📄dyngen_samplers_table.R