First/Last sample in each grouping always return nan #110

Open
EspressoKris opened this issue Oct 30, 2023 · 4 comments

@EspressoKris

Hi,

Thanks for the fantastic tool.
I am writing because I noticed that, despite not having any NA in my inclusion levels, the first and last samples within each group compared always return 'nan'. If I compare fewer or different samples, I keep seeing this phenomenon.
Changing the script code shows that the first and last sample's inclusion levels are not returned, suggesting an indexing issue perhaps?

I am running in base from the local server:
apps/samtools/1.17

and the following from my conda environment:
rmats2sashimiplot 3.0.0 pypi
python 2.7.15 h5a48372_1011_cpython conda-forge

My script is pretty standard:
rmats2sashimiplot --b1 comparison_group1.txt --b2 comparison_group2.txt --event-type SE -e filtered_rmats_SE_output.jcec --l1 name_group1 --l2 name_group2 --exon_s 1 --intron_s 1 -c gencode.gff3 --group-info grouping.gf -o output_folder

@EricKutschera
Contributor

The nan inclusion levels would happen if there's an error reading a value from the -e file: https://github.com/Xinglab/rmats2sashimiplot/blob/v3.0.0/src/rmats2sashimiplot/rmats2sashimiplot.py#L265

The code gets the inclusion levels from certain columns of the -e file based on the --event-type. For SE it uses items[20] and items[21] (IncLevel1 and IncLevel2): https://github.com/Xinglab/rmats2sashimiplot/blob/v3.0.0/src/rmats2sashimiplot/rmats2sashimiplot.py#L502

It expects the number of values in those two columns to match with the number of bam files in --b1 and --b2. The values that end up getting used depend on --group-info

Can you post your --b1, --b2, -e, and --group-info files?
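
The check described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the actual rmats2sashimiplot code: the helper names `parse_inc_levels` and `check_event_line` are hypothetical, and the assumption that each unreadable value becomes nan on its own (rather than the whole column) is inferred from the behaviour reported in this thread. Only the column positions (items[20] and items[21] for SE) and the requirement that the value counts match the --b1 and --b2 BAM counts come from the comment above.

```python
import math


def parse_inc_levels(field):
    """Split a comma separated IncLevel field into floats.

    Assumption: a value that cannot be read as a float becomes nan.
    """
    values = []
    for token in field.split(","):
        try:
            values.append(float(token))
        except ValueError:
            values.append(float("nan"))
    return values


def check_event_line(line, num_b1, num_b2):
    """Return (counts_match, inc1, inc2) for one tab separated SE event line.

    num_b1 and num_b2 are the number of BAM files listed in --b1 and --b2.
    """
    items = line.rstrip("\n").split("\t")
    inc1 = parse_inc_levels(items[20])  # IncLevel1 column for SE
    inc2 = parse_inc_levels(items[21])  # IncLevel2 column for SE
    counts_match = len(inc1) == num_b1 and len(inc2) == num_b2
    return counts_match, inc1, inc2
```

Running `check_event_line` on a line from the -e file with the BAM counts from --b1 and --b2 is a quick way to see whether the columns hold the expected number of readable values.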

@EspressoKris
Author

Hi @EricKutschera,

Thanks for your reply. Below is the data you requested:

Data.zip

So IncLevel1/2 don't seem to have any missing float values, except that the first and last numbers in each are not read when I run the script for some reason. Perhaps I made a mistake in the grouping file?

Thanks in advance for the help

@EricKutschera
Contributor

The command you posted before had -e filtered_rmats_SE_output.jcec, but the files you posted had name_group1_vs_name_group2_SE.MATS.JCEC.csv. The -e file is expected to be tab separated, so I wouldn't expect the comma separated file to work. I think the issue is that (in the csv file at least) the IncLevel1 and IncLevel2 values are quoted like "0.124,0.076,0.11". rmats2sashimiplot would then parse the values as float('"0.124'), float('0.076'), and float('0.11"'), and the stray " on the first and last values would cause an error. That would explain why only the first and last samples come back as nan.
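
The quoting problem can be reproduced in a few lines. This is a hedged Python 3 sketch rather than code from rmats2sashimiplot: `parse_levels` and `csv_to_tsv` are hypothetical helpers, and the nan-on-error behaviour is an assumption based on the description above. The second helper shows one possible way to turn a spreadsheet-exported CSV back into tab separated text with the standard `csv` module, which removes the quotes around the IncLevel fields.

```python
import csv
import io
import math


def parse_levels(field):
    """Parse comma separated values; unreadable tokens become nan (assumed)."""
    values = []
    for token in field.split(","):
        try:
            values.append(float(token))
        except ValueError:
            values.append(float("nan"))
    return values


# A quoted field as it appears in the CSV: the quote characters stay
# attached to the first and last tokens, so those two fail to parse.
quoted_levels = parse_levels('"0.124,0.076,0.11"')
assert math.isnan(quoted_levels[0]) and math.isnan(quoted_levels[2])


def csv_to_tsv(csv_text):
    """Convert CSV text to tab separated text, dropping the CSV quoting."""
    reader = csv.reader(io.StringIO(csv_text))
    return "\n".join("\t".join(row) for row in reader)
```

After conversion, a field such as `0.124,0.076,0.11` no longer carries quote characters, so every value in it can be read as a float. Passing the original tab separated rMATS output to -e avoids the conversion step entirely.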

@vandoorslaer

Ha! Excel strikes again.
Thank you
