Using the UMI as the key for the banned dictionary means it can only track one duplicate per UMI. There will be millions of lines, so each UMI will appear many times. Also, make sure you have enough RAM for a dictionary as large as you will need.
How will you find the 5' start position of each read? How will you know if there is soft clipping? What about reverse-strand reads?
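To make these questions concrete, here is a minimal sketch of one way the 5' position could be computed from the SAM `POS`, `CIGAR`, and `FLAG` fields. The function name `five_prime_position` is hypothetical, and the reverse-strand convention used here (last reference base covered, plus any trailing soft clip) is an assumption; any convention works as long as it is applied consistently.

```python
import re

# Matches every (length, operation) pair in a SAM CIGAR string.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_position(pos: int, cigar: str, flag: int) -> int:
    """Return the soft-clip-adjusted 5' position of a read.

    pos   -- 1-based leftmost mapped position (SAM column 4)
    cigar -- CIGAR string (SAM column 6)
    flag  -- bitwise FLAG (SAM column 2); bit 16 means reverse strand
    """
    # Hard clips (H) are not part of the stored sequence, so drop them
    # before looking at the ends of the alignment.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    reverse = bool(flag & 16)

    if not reverse:
        # Forward strand: the 5' end is on the left, so subtract any
        # leading soft clip from POS.
        leading = ops[0][0] if ops and ops[0][1] == "S" else 0
        return pos - leading

    # Reverse strand: the 5' end is on the right. Add every operation
    # that consumes reference (M, D, N, =, X) plus any trailing soft clip.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trailing = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return pos + ref_len + trailing - 1
```

Note that insertions (`I`) and a leading soft clip do not move the 5' end of a reverse-strand read, which is why they are ignored in that branch.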
What is the purpose of the start position dictionary? This will have an entry for almost every line, which will be very memory intensive.
It seems like you are only comparing UMIs after filtering out all reverse strands. You also need to compare start locations. You are using a dictionary again for the UMIs; remember that we are not supposed to be holding records in memory.
Overall, I think you need to rethink your data structure and make sure you are accounting for every variable that determines uniqueness. Be sure to also account for reverse strands rather than keeping only the forward ones. Organising into functions may help. I would also add some more lines to your test file. Consider every way the program could fail and put an example of each in the file.
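As one possible direction, here is a rough sketch of a structure that keeps only small keys in memory rather than whole records. It assumes the input is sorted by chromosome and that the UMI, strand, and adjusted 5' position have already been extracted for each line; the `Read` tuple and the `dedupe` function are hypothetical names for illustration, not a prescribed design.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    chrom: str       # reference name (SAM column 3)
    umi: str         # UMI parsed from the read
    reverse: bool    # True if FLAG bit 16 is set
    five_prime: int  # soft-clip-adjusted 5' position
    line: str        # the original SAM line


def dedupe(reads: Iterable[Read]) -> Iterator[str]:
    """Yield the first SAM line seen for each (UMI, strand, position) key.

    Assumes reads arrive grouped by chromosome, so the set of seen keys
    can be emptied whenever the chromosome changes.
    """
    seen = set()
    current_chrom = None
    for read in reads:
        if read.chrom != current_chrom:
            # New chromosome: earlier keys can never match again.
            seen.clear()
            current_chrom = read.chrom
        key = (read.umi, read.reverse, read.five_prime)
        if key not in seen:
            seen.add(key)
            yield read.line
```

Because the key combines UMI, strand, and position, the same UMI can appear at many positions without being mistaken for a duplicate, and the set never holds more than one chromosome's worth of keys.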