question about batch #5

Flu09 · 2024-08-21T03:51:52Z

Hello, I have 3 studies that I want to annotate using a built reference, and I wonder whether what I am doing is correct. I transfer labels from the built reference to each dataset separately. I integrated the 3 studies with Seurat and Harmony in R using Seurat v5, but here in symphonypy I started from counts and followed the tutorial. Should I transfer labels to the whole object rather than one dataset at a time? Would the batch-corrected object help at all?

serjisa · 2024-08-21T09:22:56Z

Hi, @Flu09!

First of all, if you're more familiar with R, it's better to use the original Symphony: https://github.com/immunogenomics/symphony

Secondly, you can pass batch information explicitly during label transfer using the key argument. This is the preferable approach, and the results should be similar to label transfer on individual batches:
sp.tl.map_embedding(adata_query=adata_query, adata_ref=adata_ref, key=batch_key)

Overall, Symphony's performance on Seurat-corrected expression values hasn't been benchmarked, so we can't say whether it will give meaningful results.

Flu09 · 2024-08-21T13:01:19Z

I see, thank you so much. I get this error. Do you have any suggestions?

sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
Traceback (most recent call last):
File "", line 1, in
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 336, in map_embedding
_map_query_to_ref(
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 278, in _map_query_to_ref
t = _adjust_for_missing_genes(
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 240, in _adjust_for_missing_genes
X = adata[:, use_genes_list[use_genes_list_present]].X
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/anndata.py", line 591, in X
_subset(self._adata_ref.X, (self._oidx, self._vidx)),
File "/usr/lib64/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/index.py", line 165, in _subset_spmatrix
return a[subset_idx]
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_index.py", line 68, in __getitem__
return self._get_sliceXarray(row, col)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_csr.py", line 326, in _get_sliceXarray
return self._major_slice(row)._minor_index_fancy(col)
File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_compressed.py", line 768, in _minor_index_fancy
csr_column_index1(k, idx, M, N, self.indptr, self.indices,
ValueError: Output dtype not compatible with inputs.

potulabe · 2024-08-21T16:07:22Z

Hi @Flu09! I'm so sorry that you are encountering this bug! What's the datatype of your sparse matrix adata_query.X in the example above?

Flu09 · 2024-08-21T17:15:53Z

float64 for both the reference and the samples. I think they need to be converted to float32, and the cell type column to category?

print(adata.obs['cell_type_high_resolution'].dtype)
object
adata.X
<1353075x33538 sparse matrix of type '<class 'numpy.float64'>'
with 4457926739 stored elements in Compressed Sparse Row format>
sample.X
<3057x38152 sparse matrix of type '<class 'numpy.float64'>'
with 4187950 stored elements in Compressed Sparse Row format>

potulabe · 2024-08-21T17:18:18Z

Eh, float64 seems to be OK. I was just hoping this was connected to this bug with np.float16:
https://stackoverflow.com/questions/40046118/why-cant-i-assign-data-to-part-of-sparse-matrix-in-the-first-try

potulabe · 2024-08-21T23:59:03Z

@Flu09 Would you mind sharing a minimal subsample of the data that reproduces the error? A couple of cells per dataset would probably be enough.

potulabe · 2024-08-22T02:50:30Z

Probably related to scverse/anndata#1349?

Flu09 · 2024-08-22T12:22:23Z

I can try preparing some data to share. Changing both the reference and the sample to float32 solved the previous issue.
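
For anyone hitting the same ValueError, the float32 cast mentioned above can be sketched roughly as follows. This is a minimal illustration on a toy SciPy matrix, not the exact code that was run; the helper name `as_float32_csr` is hypothetical, and the AnnData assignment in the comment assumes the matrix lives in `adata.X` as in the output shown earlier.

```python
import numpy as np
from scipy import sparse


def as_float32_csr(X):
    """Return X as a CSR matrix with float32 data (hypothetical helper)."""
    return sparse.csr_matrix(X).astype(np.float32)


# Toy stand-in for a float64 count matrix such as adata.X or sample.X.
X64 = sparse.csr_matrix(np.array([[0.0, 2.0, 0.0],
                                  [1.0, 0.0, 3.0]], dtype=np.float64))
X32 = as_float32_csr(X64)
print(X64.dtype, "->", X32.dtype)

# Assumption: on the real objects the same cast would be applied as
#   adata.X = adata.X.astype(np.float32)
#   sample.X = sample.X.astype(np.float32)
```

The cast changes only the stored dtype; the sparsity pattern and shape stay the same.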

New error message below

sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
>>> 
>>> # Mapping UMAP coordinates
>>> sp.tl.ingest(adata_query=sample, adata_ref=adata)
/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
TypeError: float() argument must be a string or a number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 238, in ingest
    ing.map_embedding(method)
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 499, in map_embedding
    self._obsm['X_umap'] = self._umap_transform()
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 488, in _umap_transform
    return self._umap.transform(self._obsm['rep'])
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py", line 3028, in transform
    indices, dists = self._knn_search_index.query(
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/pynndescent/pynndescent_.py", line 1696, in query
    query_data = np.asarray(query_data).astype(np.float32, order="C")
ValueError: setting an array element with a sequence.
>>> 
>>> # Labels prediction
>>> sp.tl.transfer_labels_kNN(
...     adata_query=sample,
...     adata_ref=adata,
...     ref_labels=["leiden", "cell_type_high_resolution"],
... )
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 411, in transfer_labels_kNN
    knn.fit(adata_ref.obsm[ref_basis], adata_ref.obs[ref_labels])
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 196, in __getitem__
    return self._data[key]
KeyError: 'X_pca_harmony'
>>>
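
One possible reading of the ingest traceback above, not confirmed in this thread: the pair "float() argument must be ... not 'csr_matrix'" followed by "setting an array element with a sequence" is what NumPy raises when `np.asarray(...).astype(np.float32)` is applied to a SciPy sparse matrix, so the representation reaching pynndescent was probably sparse. The sketch below reproduces that on toy data and shows a densifying helper; `to_dense_float32` is a hypothetical name, and which `obsm` entry would need this treatment is an assumption.

```python
import numpy as np
from scipy import sparse


def to_dense_float32(rep):
    """Return rep as a dense, C-contiguous float32 array (hypothetical helper)."""
    if sparse.issparse(rep):
        rep = rep.toarray()
    return np.ascontiguousarray(rep, dtype=np.float32)


# Toy stand-in for an embedding that ended up stored as a sparse matrix.
rep = sparse.csr_matrix(np.array([[0.5, 0.0, 1.5],
                                  [0.0, 2.0, 0.0]]))

try:
    # Same conversion as the pynndescent line in the traceback.
    np.asarray(rep).astype(np.float32, order="C")
except (TypeError, ValueError) as err:
    print(f"sparse input fails with {type(err).__name__}")

dense = to_dense_float32(rep)
print(dense.shape, dense.dtype)
```

The final KeyError is a separate matter: it says that `adata.obsm` has no 'X_pca_harmony' entry, which can be checked by listing `adata.obsm.keys()` on the reference.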

potulabe · 2024-08-24T16:09:16Z

@Flu09 I'm so sorry, could you please share a small subset of your data :(

potulabe · 2024-08-24T16:14:34Z

And the versions of the anndata and scanpy packages you are using.
