Inverting the simulated genomes #25

Open
SimiliSerpent opened this issue Aug 9, 2024 · 2 comments

Comments

@SimiliSerpent

Hi Rory,

I am simulating SARS-CoV-2 diluted in a bacterial environment. My configuration file looks as follows:

output_path = "$SIM_DIR/pod5_files"
target_yield = $TARGET_YIELD
pore_type = "R10"
nucleotide_type = "DNA"

[parameters]
sample_name = "test"
experiment_name = "sim_$SIMULATION_ID"
flowcell_name = "FAQ1234"
experiment_duration_set = 10240000
device_id = "Bantersaurus"
position = "FenceSitter"
sample_rate = 5000

[[sample]]
name = "NC_045512"
input_genome = "$SIM_DIR/ref/${SIM_VIRUS_REF}.fasta"
mean_read_length = $SIM_VIRUS_LEN
weight = $SIM_VIRUS_W
amplicon = false

[[sample]]
name = "U00096_3"
input_genome = "$SIM_DIR/ref/${SIM_NOISE_REF}.fasta"
mean_read_length = $SIM_NOISE_LEN
weight = $SIM_NOISE_W
amplicon = false

For instance, say the weight is 1 for the virus and 150 for the bacteria. Sometimes, however, Icarust appears to invert the two weights. I notice it because I use Readfish to selectively filter out all DNA other than the SARS-CoV-2 DNA; in the affected runs, almost no reads are filtered. When I check after the run, I indeed find that Icarust generated only 1/151 bacterial reads and 150/151 viral reads.

If I restart the simulation without changing anything, everything works fine! So it is not a big deal (I just have to monitor the start of each simulation and restart if necessary). But it is a bit worrying and definitely unexpected behavior. It happens randomly, and I have seen it in different simulation environments (different lab clusters).
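
For anyone who wants to automate that start-of-run check, here is a minimal Python sketch. It assumes the per-sample read counts are already available (for example from aligning the first batch of reads); the function names, the tolerance, and the counts in the example are hypothetical and are not part of Icarust or Readfish.

```python
def expected_fractions(weights):
    """Map each sample name to its expected share of reads: weight / sum(weights)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def looks_inverted(weights, observed_counts, tolerance=0.1):
    """Return True if the observed shares match the expected shares with the two samples swapped.

    Only meaningful for exactly two samples with clearly different weights.
    """
    a, b = weights  # the two sample names, in insertion order
    expected = expected_fractions(weights)
    n = sum(observed_counts.values())
    observed_a = observed_counts[a] / n
    # Inverted: sample a appears with the share expected for sample b,
    # and is far from its own expected share.
    near_swapped = abs(observed_a - expected[b]) < tolerance
    far_from_own = abs(observed_a - expected[a]) >= tolerance
    return near_swapped and far_from_own


# Weights from the example above; the counts are made up for illustration.
weights = {"NC_045512": 1, "U00096_3": 150}
print(looks_inverted(weights, {"NC_045512": 7, "U00096_3": 993}))   # proportions as configured
print(looks_inverted(weights, {"NC_045512": 990, "U00096_3": 10}))  # proportions swapped
```

A run flagged by `looks_inverted` early on could then be restarted without waiting for it to finish.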

Do you have any idea why this happens? Is there any randomness involved in how the weights are assigned to the samples?
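
As a rough sanity check on the "chance" hypothesis, the sketch below computes how likely it would be to see a majority of viral reads if each read were drawn independently, with the viral probability equal to 1/151. That sampling model is an assumption made for illustration, not a description of how Icarust actually picks reads, and `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(n, k_min, p):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumed model: each read is viral with probability weight / sum(weights).
p_virus = 1 / (1 + 150)

# Probability that at least half of 100 reads are viral under that model.
tail = prob_at_least(100, 50, p_virus)
print(f"P(at least 50 viral reads out of 100) = {tail:.3e}")
```

Under this model the probability is vanishingly small, so a run with 150/151 viral reads would point to the weights being swapped rather than to sampling noise.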

I hope you are doing well and thank you for your help.
Sincerely
Ben

@Adoni5
Contributor

Adoni5 commented Aug 20, 2024

Hey @SimiliSerpent, sorry for the slow reply, I've just been on holiday for two weeks!
This shouldn't be happening, no doubt. It definitely seems like they're being switched in the code. I assume they are hardcoded in the actual TOML file?

Rory

@SimiliSerpent
Author

No worries, I was busy watching the Olympics and Paralympics! Well, it is my turn to say sorry for the delay.
Yes, the simulated genomes are hardcoded in the TOML file. I will dig deeper into the source code if I have a chance, and post back here if I find anything. If any other Icarust user encounters the same issue, please let me know!

Best
Ben
