"Total Reads" misleading, how to get the input read count? #85

Open
gpertea opened this issue Jan 30, 2024 · 2 comments

@gpertea
gpertea commented Jan 30, 2024

Using release version 2.4.2
I've been using rnaseqc for a while on alignment files in which I kept secondary alignments as well as the unmapped reads. I naively assumed that the "Total Reads" metric means what the words say: the total number of reads found in the input data (each read counted once, of course). If the user did not keep the unaligned reads in the alignment files, as I do, I would expect the number to reflect the total number of aligned reads found in the input alignments.

Only today did I notice the large discrepancy between that value and the read count reported in the HISAT2 alignment summary, so I suppose "Total Reads" in fact means total read alignments for rnaseqc. Since I kept the secondary alignments in my case, reads can have multiple mappings, which is why the number I see there is inflated.

OK, my mistake for not checking such a basic metric earlier. So now I am looking through the metrics.tsv file for any way to get the total number of reads, but I am not able to find one. I see "End 1 Bases", "End 2 Bases", "End 1 Mapped Reads", etc., but not "End 1 Reads". (I cannot divide End 1/2 Bases by the read length to infer the number of reads, because the reads have various lengths due to trimming.)

The only number that seems to match the total number of reads is surprisingly found in this metric:
Unique Mapping, Vendor QC Passed Reads
But that label is then incorrect, because many of the reads are not "uniquely mapping", and I certainly did not expect that metric to be the only place to get the number of input reads.

Is there any way to just get the number of reads (not read alignments) in the input alignment data?
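
For illustration, the difference between reads and read alignments can be sketched with the SAM FLAG bits alone. This is a minimal sketch, not how rnaseqc computes its metrics: it assumes SAM text records (for example from `samtools view -h`) and assumes the aligner writes exactly one record per read end that is neither secondary (0x100) nor supplementary (0x800). The function name `count_reads` and the counter keys are hypothetical.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads(sam_lines):
    """Count records, mapped alignments, and reads in SAM text lines.

    Assumption: each read end has exactly one record that is neither
    secondary nor supplementary, so counting those records counts reads.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["records"] += 1
        if not flag & UNMAPPED:
            counts["alignments"] += 1  # every mapped record, incl. secondary
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignment of a read that is counted elsewhere
        counts["reads"] += 1  # one per read end, mapped or unmapped
        if flag & FIRST_IN_PAIR:
            counts["end1_reads"] += 1
        elif flag & SECOND_IN_PAIR:
            counts["end2_reads"] += 1
    return counts


if __name__ == "__main__":
    # Made-up records: one mapped pair with an extra secondary alignment
    # for end 1, plus one unmapped pair.
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
        "r1\t355\tchr2\t500\t1\t4M\t=\t600\t104\tACGT\tIIII",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    for key, value in sorted(count_reads(demo).items()):
        print(key, value)
```

On real data the same function can be fed a stream, e.g. `count_reads(sys.stdin)` with the output of `samtools view -h` piped in. `samtools view -c -F 0x900` on the same file should give the same number as the "reads" counter under the same assumption.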

@francois-a
Collaborator

Hi, thanks for reporting this. The latest commit (a6b85ef) fixes this, returning both total alignments and total reads.

@clariB
clariB commented Apr 24, 2024

Can these changes be included in a downloadable static executable? I cloned the repository, but the usage doesn't match what's in the README.
