Question regarding weight distribution #27
Hi @Nirmal2310, you have understood correctly! The ratio in this case is between species, so the split is 6264404 / (6264404 + 5227293) for NC_002516.2 to 5227293 / (6264404 + 5227293) for NC_003997.3. If you are producing R9 data, it would absolutely work to just alter the weights in distributions.json; you could even just use 1, 2, 3, 4, 5, etc. If you are producing R10 data, you could instead list this in the Simulation Profile TOML, where each bacterium is a sample and the weight is given inside each sample's table.
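The R10 route can be sketched as a small generator for the `[[sample]]` tables. This is a hypothetical helper and is not part of Icarust. The field names (`name`, `input_genome`, `mean_read_length`, `weight`, `uneven`) are copied from the profile posted in this thread. The paths and weights are placeholders. The helper does no TOML escaping, so it assumes names and paths contain no double quotes or backslashes.

```python
# Hypothetical helper (not part of Icarust): render one [[sample]] table per
# organism for a simulation profile. Field names mirror the profile in this
# thread; no TOML escaping is done, so avoid quotes/backslashes in values.
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    input_genome: str
    weight: int
    mean_read_length: float = 5000.0
    uneven: bool = False


def render_samples(samples: list[Sample]) -> str:
    """Return the [[sample]] array-of-tables section as TOML text."""
    blocks = []
    for s in samples:
        blocks.append(
            "[[sample]]\n"
            f'name = "{s.name}"\n'
            f'input_genome = "{s.input_genome}"\n'
            f"mean_read_length = {s.mean_read_length}\n"
            f"weight = {s.weight}\n"
            f"uneven = {str(s.uneven).lower()}\n"  # TOML booleans are lowercase
        )
    return "\n".join(blocks)  # blank line between tables


if __name__ == "__main__":
    # Placeholder community: species A is three times as likely as species B.
    community = [
        Sample("Species A", "/path/to/species_a.fasta", weight=3),
        Sample("Species B", "/path/to/species_b.fasta", weight=1),
    ]
    print(render_samples(community))
```

The printed text can be pasted below the `[parameters]` table of a profile. Only the relative weights should matter, so 3 and 1 express the same ratio as 12 and 4.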
Hey @Adoni5, thank you so much for the reply. I will try this approach and get back to you if any problem occurs.
Dear @Adoni5, I am using the following simulation profile:

```toml
output_path = "/DATA2/zymo_enrichment"
global_mean_read_length = 5000 #optional
random_seed = 10
target_yield = 100000000
working_pore_percent = 90 # optional (default 85)
pore_type = "R10" #Optional, one of R10 or R9, default R9
[parameters]
sample_name = "Zymobiomics"
experiment_name = "Zymobiomics_Normal"
flowcell_name = "AGE401"
experiment_duration = 4800 # unused currently
device_id = "MN37483"
position = "Bentasaurus"
break_read_ms = 400 # optional, default 400
[[sample]]
name = "Pseudomonas aeruginosa"
input_genome = "/DATA2/zymo_enrichment/Genomes/Pseudomonas_aeruginosa_complete_genome.fasta" # Path to (directory of) FASTA file(s)
mean_read_length = 5000.0
weight = 12
uneven = false # Optional
[[sample]]
name = "Escherichia coli"
input_genome = "/DATA2/zymo_enrichment/Genomes/Escherichia_coli_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Salmonella enterica"
input_genome = "/DATA2/zymo_enrichment/Genomes/Salmonella_enterica_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Lactobacillus fermentum"
input_genome = "/DATA2/zymo_enrichment/Genomes/Lactobacillus_fermentum_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Enterococcus faecalis"
input_genome = "/DATA2/zymo_enrichment/Genomes/Enterococcus_faecalis_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Staphylococcus aureus"
input_genome = "/DATA2/zymo_enrichment/Genomes/Staphylococcus_aureus_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Listeria monocytogenes"
input_genome = "/DATA2/zymo_enrichment/Genomes/Listeria_monocytogenes_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Bacillus subtilis"
input_genome = "/DATA2/zymo_enrichment/Genomes/Bacillus_subtilis_complete_genome.fasta"
mean_read_length = 5000
weight = 12
uneven = false # Optional
[[sample]]
name = "Saccharomyces cerevisiae"
input_genome = "/DATA2/zymo_enrichment/Genomes/Saccharomyces_cerevisiae_complete_genome.fasta"
mean_read_length = 5000
weight = 2
uneven = false # Optional
[[sample]]
name = "Cryptococcus neoformans"
input_genome = "/DATA2/zymo_enrichment/Genomes/Cryptococcus_neoformans_complete_genome.fasta"
mean_read_length = 5000
weight = 2
uneven = false # Optional
```

When I run Icarust for 6 hours with this TOML file, it produces only 5 FAST5 files with 4,000 reads each. Compared with the normal run, the throughput is much lower. Thank you so much for helping me out.
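The reported numbers can be checked with some quick arithmetic. This check rests on two assumptions, neither confirmed in this thread. First, it assumes `target_yield` is a total base count after which the run stops. Second, it assumes reads average about `global_mean_read_length` bases.

```python
# Back-of-the-envelope check of the reported output against the profile.
# Assumption (not confirmed here): target_yield is a total number of
# simulated bases after which the run stops.
files = 5
reads_per_file = 4_000
mean_read_length = 5_000      # global_mean_read_length in the profile
target_yield = 100_000_000    # target_yield in the profile

total_reads = files * reads_per_file
approx_bases = total_reads * mean_read_length

print(f"reads: {total_reads:,}")                                   # reads: 20,000
print(f"approx. bases: {approx_bases:,}")                          # approx. bases: 100,000,000
print(f"fraction of target_yield: {approx_bases / target_yield:.2f}")  # 1.00
```

If that assumption holds, 20,000 reads of roughly 5,000 bases land right on the 100,000,000-base `target_yield`. That would suggest the yield target, rather than the 6-hour run time, limited the output. This is an inference from the numbers, not something the thread confirms.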
Hey @Adoni5, I hope you are doing well. I have a basic question about the weight distribution, so please forgive me if I have misunderstood something.
As you mention in the README, the weight basically gives the likelihood that a read is taken from a given target genome.
The distribution.json file that you added to the repo looks like this:
```json
{"weights": [6264404, 5227293], "names": ["NC_002516.2", "NC_003997.3"]}
```
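"Weight as likelihood" can be illustrated with a weighted random draw. This is a minimal sketch that assumes each read independently picks its source genome with probability proportional to its weight. It illustrates the idea only and is not Icarust's own sampling code. The seed simply mirrors `random_seed = 10` from the profile in this thread.

```python
# Minimal sketch: each simulated read picks a source genome with probability
# proportional to its weight. Illustrative only; not Icarust's own code.
import random
from collections import Counter

distribution = {"weights": [6264404, 5227293], "names": ["NC_002516.2", "NC_003997.3"]}


def sample_read_sources(dist: dict, n_reads: int, seed: int = 10) -> Counter:
    """Pick a source genome for each of n_reads reads, weighted by dist."""
    rng = random.Random(seed)
    picks = rng.choices(dist["names"], weights=dist["weights"], k=n_reads)
    return Counter(picks)


n = 100_000
counts = sample_read_sources(distribution, n)
for name in distribution["names"]:
    # Observed fraction should sit close to weight / sum(weights).
    print(name, counts[name], f"{counts[name] / n:.3f}")
```

With 100,000 draws, the observed fractions should sit within a fraction of a percent of roughly 0.545 and 0.455.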
Since the ratio is ~1.2, I am assuming that if I generate 1,000,000 bp, about 454,546 bp will come from NC_003997.3 and 545,454 bp from NC_002516.2. Please let me know if I have understood correctly.
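The expected split can also be computed directly from the exact weights. This sketch treats the weights as relative probabilities and divides the requested total in proportion to them. If the weight is applied per read and read lengths are similar across genomes, base counts should land close to these values. With the exact weights the result is about 545,124 bp and 454,876 bp. That is slightly different from the estimate obtained by rounding the ratio to 1.2.

```python
# Expected share of a requested total per genome, treating the weights from
# distribution.json as relative probabilities.
weights = {"NC_002516.2": 6264404, "NC_003997.3": 5227293}
total_bases = 1_000_000


def expected_bases(weights: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` in proportion to `weights`, rounded to whole bases."""
    denom = sum(weights.values())
    return {name: round(total * w / denom) for name, w in weights.items()}


for name, bases in expected_bases(weights, total_bases).items():
    print(name, bases)
# NC_002516.2 545124
# NC_003997.3 454876
```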
Secondly, suppose I want to create a mock community like the ZymoBIOMICS gut community, for which I know the concentration of each species. How should I go about simulating this community with Icarust? One idea I have is to create a distribution.json file containing the ratios of the different genomes.
Is this the right approach? If not, could you please tell me how to go about it?
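Here is a minimal sketch of the distribution.json idea. It assumes the file keeps the `{"weights": [...], "names": [...]}` layout shown above. It also assumes each name must match a sequence ID in the reference FASTA, as the NC_ accessions in the example suggest. The abundances and contig names are hypothetical placeholders. Weights are scaled to integers because the example file uses integers; whether fractional weights are accepted is not stated in this thread.

```python
# Hypothetical sketch: turn known relative abundances into a
# distribution.json-style object. Layout copied from the example file above;
# names are placeholders assumed to match sequence IDs in the FASTA.
import json

abundances = {
    "contig_A": 12.0,
    "contig_B": 12.0,
    "contig_C": 2.0,
}


def to_distribution(abundances: dict[str, float], scale: int = 1000) -> dict:
    """Scale abundances to integer weights (minimum 1) and pair them with names."""
    names = list(abundances)
    total = sum(abundances.values())
    weights = [max(1, round(scale * abundances[n] / total)) for n in names]
    return {"weights": weights, "names": names}


dist = to_distribution(abundances)
print(json.dumps(dist))  # redirect to distribution.json to use it
```

Because only the ratios should matter, `scale` just controls rounding precision. The `max(1, ...)` guard keeps very rare members from being rounded down to a weight of zero.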