-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p-site offsets and metagene profiles #122
Comments
The output indicates relative offsets between different read lengths with 33nt reads treated as the base. Ribotricer tries to choose relative offsets for merging profiles of different read lengths so to maximize the cross-correlation between them, You can override this behavior by providing custom offsets using the Line 149 in af3bc3c
I am not quite sure why there is bleedthrough over the stop codon - but can likely be arising from annotation issues in the GTF. Can you provide what annotation is this based on (Genome build and GTF version). In our benchmarking, we find our strategy to have both higher sensitivity and specificity, over multiple species. |
Hi Saket, This was generated from human data. Genome used: Here's how I ran
Thanks! |
Thanks, looks correct to me. Actually, it is hard to compare against ribORFs profile without extending ours a few bases beyond the stop codon which is currently 0 ( ribotricer/ribotricer/metagene.py Line 123 in af3bc3c
|
Hello @saketkc, thanks for this very nice tool ! I add a comment in this thread because I do find some strange results in the metagene profile on my side too. I typically observe this kind of plot on my data: For the start it looks nice, I have a peak ~12 nucleotides before the start codon. But it seems the Frame 3 (in blue) is majoritarian. I would have expected the Frame 1 (orange) instead ? For the stop codon plot, the Frame 1 (orange) seems nice. But the Frame 2 (green) is way higher and constant on the all plot. I installed I prepare the ORFS with: ribotricer prepare-orfs --gtf gencode.v19.annotation.gtf \
--fasta GRCh37.p13.genome.fa \
--prefix out \
--min_orf_length 30 \
--start_codons ATG,CTG,GTG ribotricer detect-orfs \
--bam input.bam\
--prefix 'out_' \
--ribotricer_index candidate_orfs.tsv \
--phase_score_cutoff 0.440 \
--report_all Thanks, |
I have also tried with the test data you're providing: https://www.dropbox.com/s/xr0xsdluuni2b95/ribotricer_test_data_tair10.zip I detected the translating ORF with the command: ribotricer detect-orfs --bam bams_unique/SRX219170.bam \
--ribotricer_index index/ribotricer_v44_annotation_longest_candidate_orfs.tsv \
--prefix SRX219170_generated If I understand correctly the code, the read length with the highest number of reads is taken as the reference (p-site shifts will be computed relatively to this reference). In this test example, it is the 29nt long that is majoritarian. The metagene profile looks like this: The triplet periodicity seems good, but there is a bias towards the Frame 2 (in green). I would have expected Frame 1 (orange to be majoritarian) ? When looking at the
Perhaps that is my understanding that is not good enough . Shouldn't we expect that only read length with a good periodicity + majority of reads on Frame 1 to be kept / considered as the reference for p-site alignment ? |
Hi Saket,
Thanks for developing this great tool! I am surprised to get these results for the
psite_offsets.txt
output file:How should I interpret that?
And I find my metagene profiles a bit strange. Especially around the stop codons. See this example:
While using RibORF with the same ribo-seq bam file I get this type of profiles showing a p-site offset of 13 nt for both start and stop codon:
Why do you think this is the case and should I then trust the results of my
translating_ORFs.tsv
output?Thanks!
Pierre
The text was updated successfully, but these errors were encountered: