taxonomy_mapping() does not find taxonomy column names when using latest docker image #30

meghanaturner · 2023-06-26T17:27:01Z

Using the latest scrattch-mapping docker image release leads to an error on line 22 of R/taxonomy_mapping() because colnames(AIT.anndata$uns$clusterInfo) returns NULL.

This issue can be fixed by switching back to the 0.16 version of the docker image. Using 0.16 and the exact same taxonomy h5ad file (//allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/taxonomies/AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_dense_for_mapping.h5ad), colnames(AIT.anndata$uns$clusterInfo) returns the expected list of column names and the code runs as it should.

@berl @egelfan2

UCDNJJ · 2023-06-26T22:30:52Z

Thanks for reporting this error!

Starting with the latest scrattch-mapping docker (bicore/scrattch_mapping:latest), I wasn't able to recreate this issue from our test cases, so you've found some fun edge case. To be complete, I also tried loading AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_dense_for_mapping.h5ad directly with anndata::read_h5ad() and found the column names available for AIT.anndata$uns$clusterInfo.

I'll need some additional info to figure out what's going on. A few questions:

1. Are you using anndata::read_h5ad() or scrattch_mapping::loadTaxonomy() to load the taxonomy into R?
2. Can you share the directory containing all the taxonomy files that should have been created with scrattch_mapping::buildTaxonomy()?
3. Do you mind sharing your script that works only under scrattch_mapping version 0.16?
4. Can you share the error report?

berl · 2023-06-27T16:31:31Z

FYI @scseeman

meghanaturner · 2023-06-27T16:37:29Z

1. I'm using anndata::read_h5ad() to load the taxonomy.
2. The taxonomy file wasn't built directly with scrattch_mapping::buildTaxonomy().
3. Script: /allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/scrattch-mapping_batch.R
4. Error: Error in taxonomy_mapping(AIT.anndata = taxonomy_anndata, query.data = query_data, : Not all label.cols exists in AIT.anndata$uns$clusterInfo

That's interesting that you can find the column names when you load with anndata::read_h5ad() in the latest. For me, there's different behavior in how the column names are accessible between the two versions.

In the latest version:

colnames(anndata$uns$clusterInfo), which is called in line 22 of taxonomy_mapping, returns NULL
whereas, anndata$uns$clusterInfo$columns returns the expected column names:

[1] "sample_id" "cl" "cluster_label"
[4] "Level2_id_label" "Level1_id_label" "supertype_id_label"
[7] "class_id_label" "nt_type_label" "cluster_id.AIT16"
[10] "library_prep" "gene.counts.0" "doublet_score"
[13] "roi" "umi.counts" "qc.score"
[16] "method" "region_label" "region_id"
[19] "sex" "external_donor_name" "age"
[22] "platform" "knn.dist" "knn.dist.z"
[25] "medical_conditions" "broad_region" "cluster_id"
[28] "neighborhood" "batch"

class(taxonomy_anndata$uns$clusterInfo) returns:

[1] "pandas.core.frame.DataFrame" "pandas.core.generic.NDFrame"
[3] "pandas.core.base.PandasObject" "pandas.core.accessor.DirNamesMixin"
[5] "pandas.core.indexing.IndexingMixin" "pandas.core.arraylike.OpsMixin"
[7] "python.builtin.object"

In 0.16 the opposite is true:

anndata$uns$clusterInfo$columns returns NULL
colnames(anndata$uns$clusterInfo), which is called in line 22 of taxonomy_mapping, returns the expected column names:

[1] "sample_id" "cl" "cluster_label"
[4] "Level2_id_label" "Level1_id_label" "supertype_id_label"
[7] "class_id_label" "nt_type_label" "cluster_id.AIT16"
[10] "library_prep" "gene.counts.0" "doublet_score"
[13] "roi" "umi.counts" "qc.score"
[16] "method" "region_label" "region_id"
[19] "sex" "external_donor_name" "age"
[22] "platform" "knn.dist" "knn.dist.z"
[25] "medical_conditions" "broad_region" "cluster_id"
[28] "neighborhood" "batch"

class(taxonomy_anndata$uns$clusterInfo) returns:

[1] "data.frame"
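
For anyone hitting the same symptom, here is a minimal sketch of a defensive way to read the column names whichever class comes back. The helper name cluster_info_colnames() is hypothetical and not part of scrattch.mapping, and the pandas branch assumes the reticulate package is installed and able to convert the object.

```r
# Hypothetical helper (not part of scrattch.mapping): return the column names of
# uns$clusterInfo whether it is an R data.frame or an unconverted pandas DataFrame.
cluster_info_colnames <- function(cluster_info) {
  if (is.data.frame(cluster_info)) {
    return(colnames(cluster_info))
  }
  if (inherits(cluster_info, "pandas.core.frame.DataFrame")) {
    # Assumption: reticulate is installed and can convert the pandas object.
    converted <- reticulate::py_to_r(cluster_info)
    return(colnames(converted))
  }
  stop("Unsupported clusterInfo class: ", paste(class(cluster_info), collapse = ", "))
}

# Usage with a plain data.frame standing in for taxonomy_anndata$uns$clusterInfo
example_info <- data.frame(cl = 1:2, cluster_label = c("a", "b"))
example_colnames <- cluster_info_colnames(example_info)
print(example_colnames)
```

This only works around the symptom; it does not explain why the two images return different classes.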

UCDNJJ · 2023-06-27T17:55:47Z

I was able to consistently retrieve anndata$uns$clusterInfo as a data.frame under both scrattch_mapping versions. This has to be an environment issue that's leading to you seeing pandas.core.frame.DataFrame under the latest docker. One last ask: can you return the sessionInfo() for each scrattch_mapping docker when you are running it?

Somehow the version of anndata (R library) was downgraded in the latest scrattch mapping docker. I suspect this is the culprit:

bicore/scrattch_mapping:latest -- anndata_0.7.5.3
bicore/scrattch_mapping:0.16 -- anndata_0.7.5.6
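
A quick way to confirm the ordering of the two version strings above, and to print whatever anndata version a given container actually has, is sketched below. The variable names are illustrative only, and the installed-version check is skipped when the anndata R package is not available.

```r
# Version strings copied from the two images listed above
latest_version <- package_version("0.7.5.3")
v016_version   <- package_version("0.7.5.6")

# TRUE when the ":latest" image carries the older anndata R package
latest_is_older <- latest_version < v016_version
print(latest_is_older)

# Report the version installed in the current session, if the package is present
if (requireNamespace("anndata", quietly = TRUE)) {
  print(utils::packageVersion("anndata"))
}
```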

scseeman · 2023-06-27T18:13:12Z

@meghanaturner @UCDNJJ a couple of weeks ago I was having issues with the :latest docker loading at all. I talked with Anish about it and realized that I never actually heard if it got fixed. I've been using the singularity file /allen/programs/celltypes/workgroups/rnaseqanalysis/bicore/singularity/scrattch_mapping_0.2.sif directly instead of the docker and that has worked fine.

meghanaturner · 2023-06-27T19:52:59Z

Indeed, 0.16 has anndata="0.7.5.6" and latest has anndata="0.7.5.3"

It looks like R was also downgraded:

0.16 sessionInfo():

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.2.2

latest sessionInfo():

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.2.0

UCDNJJ · 2023-07-27T18:08:11Z

Hi @meghanaturner, when you have time, can you check that this issue is resolved when using this docker image: docker://njjai/scrattch_mapping:0.4? This docker image has the most up-to-date versions of the anndata and R packages that the previous latest image was supposed to contain.

singularity shell --cleanenv docker://njjai/scrattch_mapping:0.4

Forewarning, quite a few changes exist in this new update. So if you hit an error let us know.

meghanaturner · 2023-08-01T16:38:50Z

Hi @UCDNJJ, this docker image seems to have fixed the original issue I reported where the column names weren't found

meghanaturner · 2023-08-01T16:38:59Z

@UCDNJJ However, read_h5ad() no longer reads in sparse matrices as the data type that scrattch-mapping is expecting to find. This problem spontaneously showed up in docker://bicore/scrattch_mapping:0.16 and docker://bicore/scrattch_mapping:latest a couple weeks ago, and does not appear to be fixed by docker://njjai/scrattch_mapping:0.4.

"Error caught for Correlation mapping."
<simpleError in validObject(.Object): invalid class “dgCMatrix” object: 'Dim' slot does not have length 2>
Error in `rownames<-`(`*tmp*`, value = colnames(query.data)) :
attempt to set 'rownames' on an object with no dimensions
Calls: taxonomy_mapping -> rownames<-

The same error is thrown for a dgCMatrix. The workaround is to only use taxonomy and spatial anndata objects where X is a dense matrix.
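
As a rough sketch of how one might check what kind of object X is before mapping, and fall back to a dense matrix when it is small enough: the helper is_two_dim_matrix() is hypothetical (not a scrattch.mapping function), and the toy matrix merely stands in for anndata$X.

```r
library(Matrix)  # recommended package that ships with standard R installations

# Hypothetical pre-flight check (not a scrattch.mapping function): TRUE when x is a
# base matrix or a Matrix-package object with exactly two dimensions.
is_two_dim_matrix <- function(x) {
  (is.matrix(x) || methods::is(x, "Matrix")) && length(dim(x)) == 2
}

# Toy 3 x 3 sparse matrix standing in for anndata$X
sparse_x <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 3, 2), x = c(1.5, 2, 4),
  dims = c(3, 3),
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:3))
)
print(class(sparse_x))
print(is_two_dim_matrix(sparse_x))

# Dense fallback; only sensible when the full matrix fits in memory
dense_x <- as.matrix(sparse_x)
print(is_two_dim_matrix(dense_x))
```

A check like this would not repair an object that was already read in with a malformed Dim slot; it only makes the failure visible before taxonomy_mapping() is called.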

As an alternative to read_h5ad, I tried using

loadTaxonomy(taxonomyDir = "//allen/programs/celltypes/workgroups/rnaseqanalysis/mFISH/meghanturner/brain3_mapping/AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad",
anndata_file = "AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad")

but despite the documentation for the taxonomyDir argument suggesting that it supports direct h5ad files that aren't part of a shiny taxonomy folder, it errors out with: Required files to load Allen Institute taxonomy are missing.

I saw that you split off scrattch-taxonomy, including loadTaxonomy(), from scrattch-mapping into its own repo. Should I raise this issue over there?

UCDNJJ · 2023-08-01T17:07:30Z

Interesting, we definitely don't want to be using dense matrices all the time! Let's leave this issue here for now.

We need to do a better job with documentation but you should always use loadTaxonomy() since we do some work in that function to make sure the anndata object is initialized for mapping. The anndata_file argument assumes an .h5ad file that was generated with buildTaxonomy() which is why you are seeing that error about missing required files.

I took a quick look and AIT17.0.logCPM.sampled100_MERSCOPE_BRAIN3_GENES_cscSparseX.h5ad doesn't appear to have been set up with buildTaxonomy(), so this .h5ad will not work with scrattch.mapping. I would suggest running buildTaxonomy() using the count matrix and metadata from that object. You can follow the steps in this tutorial: build_taxonomy

Also, can you see if you can run the tutorial without error: mapping

meghanaturner · 2023-08-01T19:26:39Z

In attempting to follow the build_taxonomy tutorial, I am unable to load the counts matrix from the taxonomy I'm using into R.

I am not familiar with R, so I'm not sure what R's anndata package is expecting to find in an ad.X stored as a CSR sparse matrix. And the tutorial does not provide any suggestions of how to read in counts matrices from other h5ad files (it just does library(tasic2016data); taxonomy.counts = tasic_2016_counts)

# import libraries
library(scrattch.mapping)
library(umap)

# taxonomy I want to use for mapping my spatial data
taxonomy_h5ad_path = "//allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/Taxonomies/AIT17.0_mouse/Prepare/AIT17.0.logCPM.sampled100.h5ad"

# Load taxonomy anndata file
taxonomy_anndata = read_h5ad(taxonomy_h5ad_path )

# Load the count data
taxonomy.counts = taxonomy_anndata$X   # ***this line fails***

Error:

Error in py_ref_to_r(x) : negative length vectors are not allowed
Calls: <Anonymous> ... py_to_r.numpy.ndarray -> NextMethod -> py_to_r.default -> py_ref_to_r

UCDNJJ · 2023-08-01T23:18:06Z

So I also tried running through your code both in a separate R environment and within the scrattch.mapping docker. Both produced the same error.

Could this error be arising due to the dataset size or some change in the .h5ad file that happened a few weeks ago?

I can use the same approach you shared with a dataset of ~340k cells and ~22k genes: /allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/10x_seq/NHP_BG_AIT_115/NHP_BG_AIT115_complete.h5ad. R successfully reads in the anndata$X as a dgRMatrix sparse matrix.
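
On the dataset-size question, one unconfirmed possibility is worth a quick arithmetic check. In R, "negative length vectors are not allowed" usually means a requested length overflowed a 32-bit integer, and the traceback above goes through py_to_r.numpy.ndarray, which suggests X is being converted as a dense array. For a dense array the element count is cells times genes, whereas for a sparse matrix only the non-zero entries are stored. The helper name and the dimensions below are hypothetical, since the dimensions of the failing file are not given in this thread.

```r
# Hypothetical helper: TRUE when a dense n_obs x n_vars array would hold more
# elements than a 32-bit signed integer can index (2^31 - 1).
exceeds_int_max <- function(n_obs, n_vars) {
  # Multiply as doubles so the product itself cannot overflow
  as.numeric(n_obs) * as.numeric(n_vars) > .Machine$integer.max
}

# Illustrative dimensions only, not taken from the failing file
small_case <- exceeds_int_max(1000, 1000)    # 1e6 elements
large_case <- exceeds_int_max(50000, 50000)  # 2.5e9 elements
print(c(small = small_case, large = large_case))
```

If the check comes back TRUE for the real dimensions (taxonomy_anndata$n_obs and taxonomy_anndata$n_vars), storing X as a sparse matrix in the .h5ad would be the first thing to try.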
