Handling N's in the input files #69

Open
jakebiesinger opened this issue Nov 21, 2013 · 2 comments
@jakebiesinger
Contributor

There are going to be N characters in the input file. I think the proper way to handle these would be NOT to include any kmers containing N. As it is now, we throw away the entire read if any of the characters are N.

When we store the read, we could store the entire sequence, N and all, but that would mess up our 4-letter, 2-bit representation. For simplicity, I guess we could throw those reads away from the ReadHead. But I still think the other non-N kmers should be included in the graph.
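
To illustrate why N does not fit a 4-letter, 2-bit representation, here is a minimal Python sketch. The `BASE_TO_BITS` mapping and the `pack_2bit` helper are hypothetical names chosen for illustration; the project's actual encoding and bit order may differ.

```python
# Hypothetical 2-bit code: four bases use up all four 2-bit values,
# so there is no spare code left over for N.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_2bit(seq):
    """Pack a DNA string into an integer, two bits per base.

    Raises ValueError on any character outside A, C, G, T (such as N).
    """
    value = 0
    for base in seq:
        if base not in BASE_TO_BITS:
            raise ValueError("cannot encode %r in 2 bits" % base)
        # Shift the previous bases left and append the new base's code.
        value = (value << 2) | BASE_TO_BITS[base]
    return value
```

With this assumed mapping, `pack_2bit("ACGT")` gives `0b00011011` (27), while `pack_2bit("ACGN")` raises `ValueError`. Supporting N directly would need a wider code per base or a separate mask, which is the cost the comment above is trying to avoid.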

@anbangx
Collaborator

anbangx commented Nov 21, 2013

What do you mean by N characters? Could you give us an example?


Best Regards,

Anbang Xu

@jakebiesinger
Contributor Author

Sure. ATAGCTGACTGNNNACTGATCG could be a valid input. We should include all kmers from this sequence that don't include the N's.
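
A rough Python sketch of that behaviour, assuming a plain-string read and a hypothetical helper named `kmers_without_n` (not the project's actual code): it tracks the length of the current run of valid bases and emits a kmer only when the last `k` characters are all A, C, G, or T.

```python
def kmers_without_n(read, k):
    """Yield every kmer of `read` made up only of A, C, G, T.

    Kmers overlapping an N (or any other non-ACGT character) are
    skipped, but the rest of the read is still used.
    """
    read = read.upper()
    run_length = 0  # number of consecutive valid bases ending at position i
    for i, base in enumerate(read):
        if base in ("A", "C", "G", "T"):
            run_length += 1
        else:
            run_length = 0  # an N breaks the run; start counting again
        if run_length >= k:
            yield read[i - k + 1 : i + 1]


kmers = list(kmers_without_n("ATAGCTGACTGNNNACTGATCG", 5))
```

For the example read with `k = 5`, this yields 11 kmers: 7 from `ATAGCTGACTG` and 4 from `ACTGATCG`. The 7 windows that overlap the N's are dropped instead of the whole read.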
