question about batch #5

Open
Flu09 opened this issue Aug 21, 2024 · 10 comments

Comments

@Flu09

Flu09 commented Aug 21, 2024

Hello, I have 3 studies that I want to annotate using a pre-built reference, and I wonder if what I am doing is correct. I transferred labels from the built reference to each dataset separately. I had integrated the 3 studies with Seurat and Harmony in R (Seurat v5), but here in symphonypy I started from the counts and followed the tutorial. Should I transfer labels for the whole object rather than one dataset at a time? Would the batch-corrected object help at all?

@serjisa
Collaborator

serjisa commented Aug 21, 2024

Hi, @Flu09!

First of all, if you're more familiar with R, it's better to use the original Symphony: https://github.com/immunogenomics/symphony

Secondly, you can explicitly pass batch information during label transfer via the key argument. This is the better way to do it, and the results should be similar to running label transfer on each batch individually:
sp.tl.map_embedding(adata_query=adata_query, adata_ref=adata_ref, key=batch_key)
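For the three-study case in the question, a minimal sketch of this batch-aware approach, assuming each study is a separate AnnData object already preprocessed the way the tutorial prepares query data. The helper name map_studies and the "study" column name are hypothetical. anndata.concat, when given a dict, writes each dict key into obs[label], and that column is what gets passed as key:

```python
from typing import Mapping


def map_studies(queries: Mapping, adata_ref, batch_key: str = "study"):
    """Hypothetical helper: concatenate query studies and map them in one call.

    `queries` maps a study name to its AnnData object.
    """
    # Imported inside the function so the sketch can be loaded without
    # anndata/symphonypy installed; normally these would be top-level imports.
    import anndata as ad
    import symphonypy as sp

    # With a mapping, concat uses the dict keys as batch labels stored in
    # obs[batch_key]; index_unique appends the key to avoid duplicate cell names.
    query = ad.concat(queries, label=batch_key, index_unique="-")
    # Per the reply above, `key` names the obs column holding batch labels.
    sp.tl.map_embedding(adata_query=query, adata_ref=adata_ref, key=batch_key)
    return query
```

For example, map_studies({"study1": s1, "study2": s2, "study3": s3}, adata). Note that concat's default inner join keeps only genes shared by all studies, which may increase the number of reference genes reported as missing.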

Overall, Symphony's performance on Seurat-corrected expression values hasn't been benchmarked, so we can't say whether it would give meaningful results.

@Flu09
Author

Flu09 commented Aug 21, 2024

I see, thank you so much. I'm getting the error below. Do you have any suggestions?

sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 336, in map_embedding
    _map_query_to_ref(
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 278, in _map_query_to_ref
    t = _adjust_for_missing_genes(
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/_utils.py", line 240, in _adjust_for_missing_genes
    X = adata[:, use_genes_list[use_genes_list_present]].X
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/anndata.py", line 591, in X
    _subset(self._adata_ref.X, (self._oidx, self._vidx)),
  File "/usr/lib64/python3.9/functools.py", line 888, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/index.py", line 165, in _subset_spmatrix
    return a[subset_idx]
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_index.py", line 68, in __getitem__
    return self._get_sliceXarray(row, col)
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_csr.py", line 326, in _get_sliceXarray
    return self._major_slice(row)._minor_index_fancy(col)
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scipy/sparse/_compressed.py", line 768, in _minor_index_fancy
    csr_column_index1(k, idx, M, N, self.indptr, self.indices,
ValueError: Output dtype not compatible with inputs.
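The warning printed before the traceback ("538 out of 3000 genes...") describes a gene-alignment step, and the traceback shows the failure happening while the query is subset by gene. As a rough illustration only, not symphonypy's actual implementation, aligning a query matrix to the reference gene order while leaving genes that are absent from the query (or have zero std in the reference) at zero could look like this. align_to_reference and its arguments are hypothetical:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp_sparse


def align_to_reference(X_query, query_genes, ref_genes, ref_std):
    """Return a dense (cells x len(ref_genes)) float32 matrix in reference gene order.

    Genes missing from the query, or with zero std in the reference, stay zero.
    Assumes query gene names are unique (get_indexer requires a unique index).
    """
    # Column position of each reference gene in the query; -1 where it is absent.
    pos = pd.Index(query_genes).get_indexer(ref_genes)
    usable = (pos >= 0) & (np.asarray(ref_std) > 0)

    out = np.zeros((X_query.shape[0], len(ref_genes)), dtype=np.float32)
    # Fancy column indexing works for both dense arrays and CSR matrices.
    cols = X_query[:, pos[usable]]
    if sp_sparse.issparse(cols):
        cols = cols.toarray()
    out[:, usable] = cols
    n_dropped = int((~usable).sum())
    return out, n_dropped
```

Here n_dropped plays the role of the "538 out of 3000" count in the warning.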

@potulabe
Owner

Hi @Flu09! I'm so sorry that you are encountering this bug! What's the datatype of your sparse matrix adata_query.X in the example above?

@Flu09
Author

Flu09 commented Aug 21, 2024

float64 for both the reference and the samples. I think they need to be converted to float32, and the cell type column to category?

print(adata.obs['cell_type_high_resolution'].dtype)
object
adata.X
<1353075x33538 sparse matrix of type '<class 'numpy.float64'>'
with 4457926739 stored elements in Compressed Sparse Row format>
sample.X
<3057x38152 sparse matrix of type '<class 'numpy.float64'>'
with 4187950 stored elements in Compressed Sparse Row format>

@potulabe
Owner

Eh, float64 seems to be OK; I was just hoping it was related to this bug with np.float16:
https://stackoverflow.com/questions/40046118/why-cant-i-assign-data-to-part-of-sparse-matrix-in-the-first-try

@potulabe
Owner

@Flu09 Would you mind sharing a minimal subsample of the data that reproduces the error? A couple of cells per dataset would probably be enough.

@potulabe
Owner

Probably related to scverse/anndata#1349?
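A hypothesis worth checking, not a diagnosis: the reference matrix shown above has 4,457,926,739 stored elements, more than the int32 maximum of 2,147,483,647, so at least its indptr array must be int64. If the indices and indptr arrays of a single CSR matrix end up with different integer dtypes, SciPy's compiled routines such as csr_column_index1 may reject the combination. A small hypothetical helper, describe_sparse_indices, to inspect this:

```python
import numpy as np
import scipy.sparse as sp_sparse

INT32_MAX = np.iinfo(np.int32).max  # 2,147,483,647


def describe_sparse_indices(X):
    """Summarise the index arrays of a CSR/CSC matrix (diagnostic sketch)."""
    if not sp_sparse.issparse(X) or X.format not in ("csr", "csc"):
        raise TypeError("expected a CSR or CSC matrix")
    return {
        "nnz": int(X.nnz),
        "indptr_dtype": X.indptr.dtype.name,
        "indices_dtype": X.indices.dtype.name,
        # True when the two index arrays disagree, e.g. int64 indptr with int32 indices.
        "mixed_index_dtypes": X.indptr.dtype != X.indices.dtype,
        # True when nnz does not fit in int32, so indptr has to be int64.
        "needs_int64": X.nnz > INT32_MAX,
    }
```

Calling it on adata.X and sample.X would show whether either matrix mixes int32 and int64 index arrays; if so, casting both to int64 may be worth trying.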

@Flu09
Author

Flu09 commented Aug 22, 2024

I can try preparing some data to share. Changing both the reference and the sample to float32 solved the previous issue.
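A self-contained sketch of that conversion on plain SciPy and pandas objects; with AnnData the same calls apply directly, e.g. adata.X = adata.X.astype(np.float32). Converting the label column to category was only suggested earlier in the thread, not confirmed to be necessary. The helper names are hypothetical:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp_sparse


def to_float32(X):
    """Return X with float32 values; sparse matrices stay sparse."""
    # astype on a SciPy sparse matrix keeps the sparsity structure and casts the stored values.
    return X.astype(np.float32)


def labels_to_category(labels: pd.Series) -> pd.Series:
    """Convert an object-dtype label column (e.g. cell types) to pandas 'category'."""
    return labels.astype("category")
```

Casting also roughly halves the memory taken by the stored values: 4,457,926,739 values occupy about 35.7 GB as float64 and about 17.8 GB as float32 (index arrays not included).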

New error messages below:

sp.tl.map_embedding(adata_query=sample, adata_ref=adata)
538 out of 3000 genes from the reference are missing in the query dataset or have zero std in the reference, their expressions in the query will be set to zero
>>> 
>>> # Mapping UMAP coordinates
>>> sp.tl.ingest(adata_query=sample, adata_ref=adata)
/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
TypeError: float() argument must be a string or a number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 238, in ingest
    ing.map_embedding(method)
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 499, in map_embedding
    self._obsm['X_umap'] = self._umap_transform()
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/scanpy/tools/_ingest.py", line 488, in _umap_transform
    return self._umap.transform(self._obsm['rep'])
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/umap/umap_.py", line 3028, in transform
    indices, dists = self._knn_search_index.query(
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/pynndescent/pynndescent_.py", line 1696, in query
    query_data = np.asarray(query_data).astype(np.float32, order="C")
ValueError: setting an array element with a sequence.
>>> 
>>> # Labels prediction
>>> sp.tl.transfer_labels_kNN(
...     adata_query=sample,
...     adata_ref=adata,
...     ref_labels=["leiden", "cell_type_high_resolution"],
... )
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/symphonypy/tools.py", line 411, in transfer_labels_kNN
    knn.fit(adata_ref.obsm[ref_basis], adata_ref.obs[ref_labels])
  File "/home/x/.virtualenvs/r-reticulate/lib64/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 196, in __getitem__
    return self._data[key]
KeyError: 'X_pca_harmony'
>>> 
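Both errors above involve obsm embeddings. The ingest traceback ends in np.asarray(query_data).astype(np.float32) raising on a 'csr_matrix', which suggests (but does not prove) that the representation handed to UMAP was a sparse matrix; transfer_labels_kNN failed because adata_ref.obsm has no "X_pca_harmony" entry. A hypothetical pre-flight check to run on adata.obsm and sample.obsm before those calls; the exact keys required depend on the arguments used and the symphonypy version:

```python
from typing import Iterable, Mapping

import numpy as np
import scipy.sparse as sp_sparse


def check_obsm(obsm: Mapping, required: Iterable[str]):
    """Report required embeddings that are missing or stored as sparse matrices."""
    required = list(required)  # allow generators, which would otherwise be consumed once
    missing = [key for key in required if key not in obsm]
    sparse = [key for key in required if key in obsm and sp_sparse.issparse(obsm[key])]
    return missing, sparse


def densify(matrix) -> np.ndarray:
    """Return a dense float32 array.

    np.asarray alone does not densify SciPy sparse input, which is why
    .toarray() is called first.
    """
    if sp_sparse.issparse(matrix):
        matrix = matrix.toarray()
    return np.asarray(matrix, dtype=np.float32)
```

For example, check_obsm(adata.obsm, ["X_pca_harmony"]) would flag the missing reference embedding, and any key reported as sparse could be replaced with adata.obsm[key] = densify(adata.obsm[key]).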

@potulabe
Owner

@Flu09 I'm so sorry, could you please share a small subset of your data :(

@potulabe
Owner

And please also share the versions of the anndata and scanpy packages you are using.


3 participants