Handling `N`'s in the input files #69

jakebiesinger · 2013-11-21T19:18:38Z

There are going to be N characters in the input file. I think the proper way to handle these would be NOT to include any kmers containing N. As it is now, we throw away the entire read if any of the characters are N.

When we store the read, we could store the entire sequence, N's and all, but that would break our 4-letter, 2-bit representation. For simplicity, I guess we could drop those reads from the ReadHead. But I still think the other, non-N kmers should be included in the graph.
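To illustrate why an N does not fit a 2-bit representation, here is a minimal Python sketch. The `CODES` mapping and the `pack_kmer` helper are hypothetical names chosen for illustration; they are assumptions, not the project's actual encoding or API.

```python
# Hypothetical 2-bit codes; the real bit assignment in the project may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_kmer(kmer: str) -> int:
    """Pack a kmer into an integer, two bits per base.

    Raises ValueError for any letter outside A/C/G/T, since two bits
    can only distinguish four symbols and leave no room for N.
    """
    value = 0
    for base in kmer.upper():
        if base not in CODES:
            raise ValueError(f"cannot encode {base!r} in 2 bits")
        # Shift the bases packed so far left and append the new 2-bit code.
        value = (value << 2) | CODES[base]
    return value
```

All four 2-bit patterns are already taken by A, C, G and T, so storing an N would need either a wider code per base or a separate mask of N positions.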


anbangx · 2013-11-21T19:47:11Z

What do you mean by N characters? Could you give us an example?


Best Regards,

Anbang Xu

jakebiesinger · 2013-11-21T19:57:34Z

Sure. ATAGCTGACTGNNNACTGATCG could be a valid input. We should include all kmers from this sequence that don't include the N's.
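As a rough sketch of that behavior (not the project's implementation), the Python function below yields only the kmers that contain no N. The name `kmers_without_n` and its parameters are hypothetical, chosen for illustration.

```python
from typing import Iterator


def kmers_without_n(read: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of `read` that contains no 'N'."""
    read = read.upper()
    run_start = 0  # index where the current N-free run begins
    for i, base in enumerate(read):
        if base == "N":
            # An N ends the current run; the next run starts after it.
            run_start = i + 1
            continue
        # Emit the kmer ending at position i once the run is long enough.
        if i - run_start + 1 >= k:
            yield read[i - k + 1 : i + 1]


if __name__ == "__main__":
    for kmer in kmers_without_n("ATAGCTGACTGNNNACTGATCG", 5):
        print(kmer)
```

With k = 5, the example read splits into two N-free runs, ATAGCTGACTG and ACTGATCG, and only kmers that lie entirely inside one of those runs are produced. The read is kept instead of being thrown away whole.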

jakebiesinger mentioned this issue Nov 21, 2013

change the partial members of genomix-data, genomix-hyracks for adapting ray-style #60

Closed

jakebiesinger mentioned this issue Nov 22, 2013

optimizations from profiling build_hyracks #68

Merged
