Great tool that integrates lots of functions!
I was wondering if there is a way to get isoform counts. I was trying to get them by following your Nature paper (specifically https://github.com/pachterlab/BYVSTZP_2020).
You mentioned that for the 10xv3 data, "gene-count matrices were made by using the -genecounts flag and TCC matrices were made by omitting it". It works great for the gene-count part with the following command:
$ kb count --h5ad -i index.idx -g t2g.txt -x 10xv3 -o XXX -m 64G --workflow standard --filter bustools -t 32
I got the cells x genes matrix both in the mtx and h5ad format.
My question is, how to get a cells x transcripts matrix? It does not seem to work by simply adding the "--tcc" to the above command. I can get a cells x tcc mtx, but not the cells x transcripts mtx. Moreover, I don't know how to apply or omit the "--genecounts" flag.
Thank you so much!
P.
Currently, kb count only does transcript quantification for bulk/Smart-seq data (where each sample or cell is in a separate FASTQ file).
For 10x-type data, kb count stops at the cells x tcc mtx. However, you can run "kallisto quant-tcc" on the cells x tcc mtx to try to get transcript quantification.
I was testing this on the forebrain glutamatergic neuronal lineage data in the KBtools tutorial. The kb count TCC matrix (394,494 x 6,238,208) is huge for the kallisto quant-tcc step. It runs forever even on an HPC cluster node (64 cores, ~ TB memory; 12 hours now, still running). I should probably keep only the cells used in other studies, such as the RNA velocity study (only about 1800 cells are kept). Could you please comment on this?
Oh, with such a large matrix, it's computationally intractable. You will definitely need to filter cells.
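For what it's worth, here is a rough sketch of what "filter cells" could look like on a Matrix Market file before handing it to quant-tcc. It assumes the matrix rows are cells (as in "cells x tcc") and that you already have a list of barcodes to keep from some other analysis. The function name `filter_mtx_rows` and the toy barcodes are made up for illustration and are not part of kb or kallisto.

```python
def filter_mtx_rows(mtx_lines, barcodes, keep):
    """Keep only rows (cells) whose barcode is in `keep`; renumber rows from 1.

    Assumption: rows of the matrix are cells, in the same order as `barcodes`.
    """
    keep = set(keep)
    new_index = {}        # old 1-based row index -> new 1-based row index
    kept_barcodes = []
    for old, barcode in enumerate(barcodes, start=1):
        if barcode in keep:
            kept_barcodes.append(barcode)
            new_index[old] = len(kept_barcodes)

    header, entries, n_cols = [], [], None
    for line in mtx_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("%"):
            header.append(line)                 # banner and comment lines
        elif n_cols is None:
            n_cols = int(line.split()[1])       # size line: rows cols nnz
        else:
            row, col, value = line.split()
            if int(row) in new_index:
                entries.append(f"{new_index[int(row)]} {col} {value}")

    size_line = f"{len(kept_barcodes)} {n_cols} {len(entries)}"
    return header + [size_line] + entries, kept_barcodes


# Toy example: 3 cells x 4 TCCs, keeping 2 of the 3 cells.
mtx = [
    "%%MatrixMarket matrix coordinate real general",
    "3 4 4",
    "1 1 2",
    "2 3 5",
    "3 1 1",
    "3 4 7",
]
barcodes = ["AAAC", "AAAG", "AAAT"]
filtered, kept = filter_mtx_rows(mtx, barcodes, keep={"AAAC", "AAAT"})
print("\n".join(filtered))
```

The filtered matrix and the matching barcode list need to be written out together, since the row order is the only link between them.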
The EM algorithm (which gives you transcript counts) in quant-tcc only takes a few seconds to run, but if you multiply a few seconds by hundreds of thousands of cells, well, you do the math of how long it'll take to run.
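To put rough numbers on that, here is a back-of-the-envelope sketch. The 3 seconds per cell is only an assumed stand-in for "a few seconds", and the estimate assumes cells are processed one after another; the cell counts are the ones mentioned in this thread.

```python
def serial_hours(n_cells, seconds_per_cell):
    """Wall-clock hours if cells are quantified one after another."""
    return n_cells * seconds_per_cell / 3600


SECONDS_PER_CELL = 3.0  # assumption: illustrative value for "a few seconds"

all_barcodes = serial_hours(394_494, SECONDS_PER_CELL)   # unfiltered matrix
filtered_cells = serial_hours(1_800, SECONDS_PER_CELL)   # ~1800 cells kept
print(f"{all_barcodes:.0f} h unfiltered, {filtered_cells:.1f} h filtered")
```

Under those assumptions the unfiltered matrix comes to roughly 329 hours (about two weeks), while about 1800 cells would take around an hour and a half.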