SimDNA

Yes, we can generate a synthetic DNA motif dataset the usual way: sample each background sequence i.i.d., represent the motif as a profile given by a product of independent multinomials (i.e., a position weight matrix, or PWM), and then plant one realization of that profile at a randomly chosen position in each background sequence. But a motif problem posed this way is far too easy to solve. How about we simulate the motif as a mixture of profiles, where different profiles may share identical patterns (i.e., overlaps)? Moreover, what if a motif has a block structure, with variable spacings between each pair of adjacent blocks (i.e., gaps)? Why not simulate a mixture of block-structured profiles as the ground-truth motif? This package generates datasets with exactly such patterns.
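To make the generative process concrete, here is a minimal Python sketch of the idea, written under our own assumptions. It is not SimDNA's API or implementation; the names `sample_background`, `sample_pwm`, `sample_blocked_profile`, `simulate` and `one_hot` are hypothetical. In this sketch a profile is a list of PWM blocks, and the spacing between adjacent blocks is drawn uniformly from `gap_range`. Gap positions simply keep the background letters. Each sequence first picks one profile from the mixture and then plants it at a random start. Two profiles overlap when they share an identical block.

```python
import random

ALPHABET = "ACGT"

def sample_background(length, rng, probs=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical helper: draw an i.i.d. background sequence as a list of letters."""
    return rng.choices(ALPHABET, weights=probs, k=length)

def sample_pwm(pwm, rng):
    """Sample one realization of a PWM; each column is an independent multinomial."""
    return [rng.choices(ALPHABET, weights=col, k=1)[0] for col in pwm]

def sample_blocked_profile(blocks, gap_range, rng):
    """Realize each block, inserting a random gap between adjacent blocks.

    Returns (segments, span): segments are (offset, letters) pairs relative to
    the motif start, and span is the total length including the gaps.
    """
    segments, offset = [], 0
    for i, block in enumerate(blocks):
        if i > 0:
            offset += rng.randint(*gap_range)  # assumed: uniform gap length
        segments.append((offset, sample_pwm(block, rng)))
        offset += len(block)
    return segments, offset

def simulate(n_seqs, seq_len, profiles, mix_weights, gap_range, seed=0):
    """Plant one realization of a mixture of blocked profiles per background sequence.

    Assumes seq_len is at least the longest possible motif span.
    Returns (sequence, profile_index, start) triples as the ground truth.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_seqs):
        seq = sample_background(seq_len, rng)
        k = rng.choices(range(len(profiles)), weights=mix_weights, k=1)[0]
        segments, span = sample_blocked_profile(profiles[k], gap_range, rng)
        start = rng.randint(0, seq_len - span)
        for off, letters in segments:
            # Overwrite only the block positions; the gaps stay as background.
            seq[start + off:start + off + len(letters)] = letters
        data.append(("".join(seq), k, start))
    return data

def one_hot(motif):
    """Deterministic PWM (one letter per column), handy for inspecting the output."""
    return [[1.0 if a == c else 0.0 for a in ALPHABET] for c in motif]

# Two profiles that overlap: both start with the same "TATA" block.
shared = one_hot("TATA")
profiles = [[shared, one_hot("GGCC")], [shared, one_hot("CAT")]]
data = simulate(5, 50, profiles, [0.5, 0.5], gap_range=(2, 5), seed=1)
for seq, k, start in data:
    print(k, start, seq)
```

Because the sketch returns the profile index and start position for every sequence, you can use the ground truth directly to score a motif finder. With soft (non-one-hot) PWM columns, each planted realization varies from sequence to sequence, which makes the task harder.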

Basic examples

Coming soon

Undetectable patterns

Coming soon