Feature request: sequencing masking #49

Open
a-h-b opened this issue Jan 9, 2020 · 1 comment
Comments

a-h-b commented Jan 9, 2020

I'd like VizBin to recognize masked sequences, i.e., to ignore lower-case letters. This would be useful for ignoring, e.g., 16S regions or other regions that obscure k-mer profiles.

Usually, the user would supply the already masked sequence, but if you're mega cool, you could include a module that recognizes highly conserved/structural regions and does the masking internally.

claczny (Owner) commented Jan 9, 2020

Thx for the suggestion.

A fictitious example (real sequences would of course have to be longer):

>seq1
AATTCGATTAGaaaaaaaaaaaaaTGCCAGtctctctc
>seq2
tttttttttACGCGATAGATAGCAATTCCGGTTT

In this example, for seq1, the two lower-case runs (the stretch of a's and tctctctc) would have to be ignored, and k-mers would only be computed for AATTCGATTAGTGCCAG.
For seq2, ttttttttt would have to be ignored, and k-mers would only be computed for ACGCGATAGATAGCAATTCCGGTTT.
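
To make the example concrete, here is a minimal Python sketch of the masking step followed by k-mer counting. It is not VizBin code: the function names `strip_masked` and `kmer_counts` are hypothetical, the length of the a-run is illustrative, and k = 4 is an arbitrary choice for the demonstration.

```python
from collections import Counter


def strip_masked(seq: str) -> str:
    """Drop lower-case (soft-masked) characters, keeping the rest in order."""
    return "".join(c for c in seq if not c.islower())


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# seq1 from the example above; the exact a-run length does not matter here.
seq1 = "AATTCGATTAG" + "a" * 13 + "TGCCAG" + "tctctctc"
unmasked = strip_masked(seq1)   # upper-case part only
counts = kmer_counts(unmasked, k=4)
```

Note that concatenating the unmasked parts, as described above, produces k-mers that span the junction of two originally separate regions (e.g. TAGT in seq1). Whether that is acceptable, or whether k-mers should instead be computed per unmasked segment, would be a design decision for the implementation.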

  • Implement a switch (GUI and command-line) to enable this function
  • The parser module should ignore lower-case letters (i.e., masked subsequences) in sequences if the switch is enabled. N.B. This will also affect the size-selection step, as sequences might become (potentially much) shorter once masked regions are ignored.
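
A rough sketch of how the two points above could fit together, written in Python for brevity rather than as actual VizBin code. The names `read_fasta`, `select_sequences`, `ignore_masked`, and `min_length` are assumptions made for illustration; the point is only that the length filter is applied after masked characters are dropped.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_sequences(
    lines: Iterable[str], min_length: int, ignore_masked: bool = False
) -> Iterator[Tuple[str, str]]:
    """Apply the optional masking switch, then the size selection."""
    for header, seq in read_fasta(lines):
        if ignore_masked:
            # Hypothetical switch: drop lower-case (masked) characters.
            seq = "".join(c for c in seq if not c.islower())
        # Size selection uses the effective (unmasked) length.
        if len(seq) >= min_length:
            yield header, seq
```

With the switch off, the sequences pass through unchanged; with it on, a sequence that is long enough in its raw form can fall below `min_length` and be filtered out, which is the side effect noted in the second bullet.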
