Sleuth-Error in check_target_mapping #257

Open
nmorf opened this issue May 31, 2021 · 3 comments
nmorf commented May 31, 2021

Hello,

I'm trying to use biomaRt to retrieve the gene names for Apis mellifera from Ensembl, so that I can analyze the data generated by kallisto with sleuth.

I encounter the error posted in 2017 (link below). I haven't been able to fix it myself. I was wondering if someone could direct me to a possible solution without editing the fasta files?

#111

Here is the error message that I get.

mart <- useMart('metazoa_mart', host = 'metazoa.ensembl.org')
mart <- useDataset('amellifera_eg_gene', mart)

t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                                     "external_gene_name"), mart = mart)
t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
                     ens_gene = ensembl_gene_id, ext_gene = external_gene_name)

so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g)
reading in kallisto results
dropping unused factor levels
........................
Error in check_target_mapping(tmp_names, target_mapping, !is.null(aggregation_column)) :
couldn't solve nonzero intersection
In addition: There were 25 warnings (use warnings() to see them)
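
The message says that the target IDs read from the kallisto output and the `target_id` column of the mapping table have nothing in common. A quick way to see this is to compare the two sets of IDs directly. The sketch below is only an illustration in base R: the ID values are made-up stand-ins, and in a real session `kallisto_ids` would more likely come from the `target_id` column of one sample's abundance.tsv.

```r
# Toy stand-ins (hypothetical values): IDs as kallisto might report them,
# and a mapping table shaped like the t2g built with biomaRt.
kallisto_ids <- c("ENST00000456328.2", "ENST00000450305.2")
t2g_toy <- data.frame(
  target_id = c("ENST00000456328", "ENST00000450305"),
  ens_gene  = c("ENSG00000223972", "ENSG00000223972"),
  stringsAsFactors = FALSE
)

# Number of kallisto targets with an exact match in the mapping table.
n_shared <- length(intersect(kallisto_ids, t2g_toy$target_id))
n_shared

# Looking at the first few IDs from each source usually shows
# how the two naming schemes differ.
head(kallisto_ids)
head(t2g_toy$target_id)
```

If the count is zero, the two sources name the transcripts differently, and one of them has to be adjusted before calling sleuth_prep.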

Thank you,
nm

@gcamprecios

Good morning,

This is my first time using sleuth after pseudoalignment with kallisto, so I am quite new to this.
Everything runs well except when I try to collapse transcripts to genes with target_mapping. I get exactly the same error as nmorf above, and I was wondering whether it has been solved somewhere else. I can't seem to find an answer, and I've tried generating all kinds of files to use with this function. Here is the code I am using, which is basically what I see in the walkthroughs and in everybody else's posts.
To generate the t2g file:

mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                         dataset = "hsapiens_gene_ensembl",
                         host = 'ensembl.org')
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_transcript_id_version",
                                     "ensembl_gene_id", "ensembl_gene_id_version",
                                     "external_gene_name", "description",
                                     "chromosome_name", "start_position",
                                     "end_position", "strand",
                                     "entrezgene_id"), mart = mart)
t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
                     ens_gene = ensembl_gene_id, ext_gene = external_gene_name)

t2g <- dplyr::select(t2g, c('target_id', 'ens_gene', 'ext_gene'))

To run the sleuth_prep function:

so122 <- sleuth_prep(metadata122,
                     target_mapping = t2g,
                     aggregation_column = 'ens_gene',
                     read_bootstrap_tpm = TRUE,
                     extra_bootstrap_summary = TRUE,
                     transformation_function = function(x) log2(x + 0.5),
                     num_cores = 2)

The error I get all the time (no matter how I construct the t2g data.frame):

Warning: It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
reading in kallisto results
dropping unused factor levels
Error in check_target_mapping(tmp_names, target_mapping, !is.null(aggregation_column)) :
couldn't solve nonzero intersection

And here are the first rows of our .tsv abundance file from kallisto (I use the .h5 file for sleuth_prep):

target_id               length eff_length est_counts tpm
ENST00000456328.2 ENSG00000223972.5 OTTHUMG00000000961.2 OTTHUMT00000362751.1 DDX11L1-202 DDX11L1 1657 processed_transcript 1657 1453.07 0 0
ENST00000450305.2 ENSG00000223972.5 OTTHUMG00000000961.2 OTTHUMT00000002844.2 DDX11L1-201 DDX11L1 632 transcribed_unprocessed_pseudogene 632 428.3 0 0
ENST00000488147.1 ENSG00000227232.5 OTTHUMG00000000958.1 OTTHUMT00000002839.1 WASH7P-201 WASH7P 1351 unprocessed_pseudogene 1351 1147.07 0 0
ENST00000619216.1 ENSG00000278267.1 - - MIR6859-1-201 MIR6859-1 68 miRNA 68 34.625 0 0
ENST00000473358.1 ENSG00000243485.5 OTTHUMG00000000959.2 OTTHUMT00000002840.1 MIR1302-2HG-202 MIR1302-2HG 712 lncRNA 712 508.07 0 0

sigusn commented Aug 4, 2022

Hi, I think there could be some issue with the abundance file. I usually only have one column with "target_id" but you have more columns without headings.
Example of my abundance.tsv:
target_id length eff_length est_counts tpm
ENST00000631435.1 12 6.64286 0 0
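
Note that the ID in this example carries a version suffix (`.1`), while a mapping table whose `target_id` comes from `ensembl_transcript_id` holds unversioned IDs. Assuming that is part of the mismatch, one option is to use the `ensembl_transcript_id_version` attribute (already requested in the getBM call earlier in the thread) as `target_id`. Another is to strip the suffix from the kallisto side, as in the base R sketch below. The helper name `strip_version` is made up for this illustration.

```r
# IDs with Ensembl version suffixes, as they appear in the abundance files above.
versioned_ids <- c("ENST00000631435.1", "ENST00000456328.2")

# Hypothetical helper: drop a trailing ".<digits>" version suffix, if present.
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

unversioned_ids <- strip_version(versioned_ids)
unversioned_ids
```

Whichever side is changed, both tables need to use the same convention, either versioned or unversioned.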

@gcamprecios

Hi @sigusn, thanks very much for the response. Indeed, I found another page where this whole issue was discussed and solved back in 2019. My problem was that I generated my kallisto results with GENCODE v40, so all my abundance files had very long "target_id" names, which made them impossible to match. I used their code to change the names in all the abundance files at once, leaving only the ENS name.

I leave here the page with the discussion and solution.
Thanks!

https://groups.google.com/g/kallisto-and-applications/c/KQ8782UD35E/m/hbqqMOgGBwAJ
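
For readers who cannot open the link, the idea of keeping only the ENS name can be sketched in base R. This is an illustration and not necessarily the code from the linked discussion. It assumes the long names are GENCODE-style headers with fields separated by `|`; the example strings are assembled from the fields shown in the abundance excerpt above.

```r
# Long GENCODE-style target IDs (assumed pipe-delimited).
long_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA|"
)

# Keep only the text before the first "|", i.e. the versioned transcript ID.
short_ids <- sub("\\|.*$", "", long_ids)
short_ids
```

The trimmed IDs still carry a version suffix, so the mapping table would need versioned transcript IDs as well for the two to intersect.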
