No module named 'anndata' error #8

Open
gianfilippo opened this issue May 7, 2024 · 12 comments

@gianfilippo

Hi,

I tried both the conda and Docker (via Singularity) versions.
I ran the following:
MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat
and
singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

and I get the error below. Can you please help?

Thanks

Error in py_module_import(module, convert = convert) :
ModuleNotFoundError: No module named 'anndata'
Run reticulate::py_last_error() for details.
Calls: -> -> py_module_import
Execution halted

@aurelieGabriel
Collaborator

aurelieGabriel commented May 7, 2024

Dear Gianfilippo,

Thank you for your interest in MetacellAnalysisToolkit.

Using the following command lines, we obtained no error:

singularity pull docker://agabriel/matk:v1.0
singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

We suspect that we are not using the same input file. Unfortunately, the link to download cd34_multiome_rna.h5ad initially provided in our README seems to be corrupted at the moment. We apologize for the inconvenience and have updated the link. Could you please download the data again and check that the file has the following md5: 4cd8d82adfe267f54e13d8a383918fd0, by running: md5sum data/cd34_multiome_rna.h5ad.
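
The checksum comparison described above can be scripted. The following is a minimal sketch, assuming a Linux system with md5sum available; `verify_md5` is a hypothetical helper name and is not part of MATK.

```shell
# Expected checksum quoted in the comment above.
EXPECTED_MD5="4cd8d82adfe267f54e13d8a383918fd0"

# verify_md5 FILE EXPECTED: hypothetical helper, not part of MATK.
# Returns 0 when the MD5 of FILE equals EXPECTED, 1 otherwise.
verify_md5() {
  local file="$1"
  local expected="$2"
  local actual
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  # md5sum prints "<hash>  <filename>"; keep only the hash.
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Usage (path taken from the thread):
# verify_md5 data/cd34_multiome_rna.h5ad "$EXPECTED_MD5"
```

A non-zero return value would suggest that the download is incomplete or that a different file is being used.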

We can also run a test on another dataset. After cloning/pulling the current MetacellAnalysisToolkit repository you can try the following command lines:

  • pull container:
    singularity pull docker://agabriel/matk:v1.0
  • retrieve pbmc dataset (h5ad file):
    singularity run --bind $(pwd) matk_v1.0.sif python get_data/get_PBMC_dataset.py
  • retrieve pbmc dataset (rds file):
    singularity run --bind $(pwd) matk_v1.0.sif Rscript get_data/get_PBMC_rds.R
  • run MATK on the h5ad file:
    singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i get_data/pbmc.h5ad -o MATK_output/SuperCell/pbmc/ -n 50 -f 2000 -k 30 -g 75 -s seurat
  • run MATK on the rds file:
    singularity run --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i get_data/pbmc.rds -o MATK_output/SuperCell/pbmc/ -n 50 -f 2000 -k 30 -g 75 -s seurat

Finally, for future usage, please note that matk:v1.0 is based on Seurat V4 and matk:v1.1 on Seurat V5. The command lines described above should run with both docker environments.

Best wishes.

@gianfilippo
Author

Hi,

thanks, but I still get the same error. I think both the conda version and the Docker version look into my local Python path.

What do you suggest ?

Thanks

@aurelieGabriel
Collaborator

Hi,

Thanks for the feedback, I agree that it could be the case considering the error message. I am surprised though that this happens in the docker container.

Could you try to identify which python is used, running:
singularity run --bind $(pwd) matk_v1.0.sif which python
In my case, I obtain the following: /opt/conda/envs/MetacellAnalysisToolkit/bin/python

I think that Singularity behaves unexpectedly here and also mounts the HOME directory. Can you provide the output of the following command:
singularity run --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

and then do the same adding the --no-home option:
singularity run --no-home --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"

I suspect that adding --no-home could solve your issue.
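
One way to read the py_config() output is to check whether the reported interpreter lives under the container's conda environment or somewhere else (for example under the user's home directory). The sketch below assumes the environment prefix shown in the `which python` output above and the `python:` line that reticulate prints; both helper names are hypothetical.

```shell
# Prefix of the conda environment inside the container
# (taken from the `which python` output quoted above).
CONTAINER_ENV="/opt/conda/envs/MetacellAnalysisToolkit"

# python_origin PATH: hypothetical helper.
# Prints "container" if PATH is under the container env, "host" otherwise.
python_origin() {
  case "$1" in
    "$CONTAINER_ENV"/*) echo "container" ;;
    *) echo "host" ;;
  esac
}

# python_from_py_config: hypothetical helper.
# Reads saved py_config() output on stdin and prints the interpreter path.
# Assumes a line of the form "python: /some/path".
python_from_py_config() {
  awk '$1 == "python:" {print $2; exit}'
}

# Usage, assuming the output was saved to py_config.txt:
# python_origin "$(python_from_py_config < py_config.txt)"
```

If the result is "host", reticulate is picking up an interpreter from outside the container, which would be consistent with the missing anndata module.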

Note that I had to update the Docker containers. Please pull the containers again using Singularity before running your tests.

Let me know if this helps and I will update the README accordingly.

Best wishes,
Aurélie

@gianfilippo
Author

Hi,

thanks. I am puzzled as well.

Anyway,

  1. singularity run --bind $(pwd) matk_v1.0.sif which python
    /opt/conda/envs/MetacellAnalysisToolkit/bin/python

  2. singularity run --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"
    python: /home/XXX/.conda/envs/r-reticulate/bin/python
    libpython: /home/XXX/.conda/envs/r-reticulate/lib/libpython3.10.so
    pythonhome: /home/XXX/.conda/envs/r-reticulate:/home/XXX/.conda/envs/r-reticulate
    version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
    numpy: [NOT FOUND]

  3. singularity run --no-home --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config()"
    python: /home/XXX/.conda/envs/r-reticulate/bin/python
    libpython: /home/XXX/.conda/envs/r-reticulate/lib/libpython3.10.so
    pythonhome: /home/XXX/.conda/envs/r-reticulate:/home/XXX/.conda/envs/r-reticulate
    version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
    numpy: [NOT FOUND]

I should also mention that the singularity run command looks into my R_LIBS_USER even if I add the "--no-home" flag, and exits early in the process with a different error:
Error: package or namespace load failed for ‘Seurat’ in dyn.load(file, DLLpath = DLLpath, ...):

If I unset R_LIBS_USER, then I am back to the error I reported, also using the "--no-home" flag.
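
Since Singularity and Apptainer pass the host environment into the container by default, it may help to list the exported host variables that R, Python, or reticulate consult when locating libraries. This is a minimal sketch; the list of variable names is an assumption chosen for illustration, and `report_leak_candidates` is a hypothetical helper.

```shell
# Variables that R, Python, or reticulate consult when locating libraries.
# This list is an assumption for illustration, not taken from MATK.
LEAK_CANDIDATES="R_LIBS R_LIBS_USER PYTHONPATH PYTHONHOME RETICULATE_PYTHON"

# report_leak_candidates: hypothetical helper.
# Prints NAME=value for each candidate that is exported and non-empty.
# printenv only sees exported variables, i.e. those a child process inherits.
report_leak_candidates() {
  for name in $LEAK_CANDIDATES; do
    if value="$(printenv "$name")" && [ -n "$value" ]; then
      printf '%s=%s\n' "$name" "$value"
    fi
  done
}

# Usage on the host, before launching the container:
# report_leak_candidates
```

Any variable printed here is a candidate for unsetting before the run.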

The Python path seems to be correct, so I do not understand why I am getting the error.

What do you think ?

Best

@aurelieGabriel
Collaborator

Hi,

Sorry for the delay, I was unavailable during the past week.

Could you please provide the output of the following command:

singularity run --no-home --cleanenv --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config(); .libPaths()"

And then:

singularity run --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif Rscript -e "reticulate::py_config(); .libPaths()"

Additionally, could you let me know the version of Singularity you are using? I would like to try to reproduce the error.

Finally, something that could help us debug would be to check the environment variables:

singularity exec --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif env

Best,

Aurélie

@gianfilippo
Author

Hi,

thanks for looking into this!
The output of the first command:
python: /opt/conda/envs/MetacellAnalysisToolkit/bin/python3
libpython: /opt/conda/envs/MetacellAnalysisToolkit/lib/libpython3.9.so
pythonhome: /opt/conda/envs/MetacellAnalysisToolkit:/opt/conda/envs/MetacellAnalysisToolkit
version: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
numpy: /opt/conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/numpy
numpy_version: 1.24.4

NOTE: Python version was forced by PATH

python versions found:
/opt/conda/envs/MetacellAnalysisToolkit/bin/python3
/opt/conda/envs/MetacellAnalysisToolkit/bin/python
[1] "/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library"

I am really using apptainer 1.2.5-1.el8

The output from the second command:
APPTAINER_APPNAME=
APPTAINER_BIND=/home/$USERID/scripts/MetacellAnalysisToolkit
APPTAINER_COMMAND=exec
APPTAINER_CONTAINER=/home/$USERID/scripts/MetacellAnalysisToolkit/matk_v1.0.sif
APPTAINER_ENVIRONMENT=/.singularity.d/env/91-environment.sh
APPTAINER_NAME=matk_v1.0.sif
HOME=/home/$USERID
LANG=C.UTF-8
LC_ALL=C.UTF-8
LD_LIBRARY_PATH=/.singularity.d/libs
PATH=/opt/conda/envs/MetacellAnalysisToolkit/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/MetacellAnalysisToolkit/cli/
PROMPT_COMMAND=PS1="Apptainer> "; unset PROMPT_COMMAND
PS1=Apptainer>
PWD=/gpfs/ycga/pi/coppola/SamKatz/scripts/MetacellAnalysisToolkit
R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library
SINGULARITY_BIND=/home/$USERID/scripts/MetacellAnalysisToolkit
SINGULARITY_CONTAINER=/home/$USERID/scripts/MetacellAnalysisToolkit/matk_v1.0.sif
SINGULARITY_ENVIRONMENT=/.singularity.d/env/91-environment.sh
SINGULARITY_NAME=matk_v1.0.sif
TERM=xterm-256color

Best
Gianfilippo

@aurelieGabriel
Collaborator

Hello,

Based on these outputs, the paths inside the container seem to be correct with --cleanenv. What error do you get when running the MATK command with the --cleanenv option?

singularity run --no-home --cleanenv --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat

or

singularity run --no-home --cleanenv --env R_LIBS_USER=/opt/conda/envs/MetacellAnalysisToolkit/lib/R/library --bind $(pwd) matk_v1.0.sif MATK -t SuperCell -i data/cd34_multiome_rna.h5ad -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat
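
To avoid retyping the isolation flags suggested above, the command can be wrapped in a small shell function. This is a sketch only: `matk_isolated`, `CONTAINER_RUNTIME`, `MATK_SIF`, and `MATK_DRY_RUN` are hypothetical names introduced here, not part of MATK or Singularity.

```shell
# Hypothetical wrapper, not part of MATK: always adds --no-home and --cleanenv.
CONTAINER_RUNTIME="${CONTAINER_RUNTIME:-singularity}"   # could also be "apptainer"
MATK_SIF="${MATK_SIF:-matk_v1.0.sif}"                   # image pulled earlier in the thread

matk_isolated() {
  # With MATK_DRY_RUN=1 the command is printed instead of executed.
  runner=""
  if [ "${MATK_DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  # $runner is intentionally unquoted so that an empty value disappears.
  $runner "$CONTAINER_RUNTIME" run --no-home --cleanenv --bind "$PWD" \
    "$MATK_SIF" MATK "$@"
}

# Usage (arguments taken from the thread):
# matk_isolated -t SuperCell -i data/cd34_multiome_rna.h5ad \
#   -o MATK_output/SuperCell/cd34/ -n 50 -f 2000 -k 30 -g 75 -s seurat
```

Setting MATK_DRY_RUN=1 first makes it possible to inspect the full command line before anything is executed.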

I had some issues installing your version of apptainer, I will come back to you when it is solved.
Best,
Aurélie

@gianfilippo
Author

Hi,

sorry about the delay.

I just tried the last two commands and the runs completed without errors.

I did not change anything on the cluster or my account settings. So I do not know why things are working now.
I will try with my data and see.

Thanks

@ksoleary

ksoleary commented Jun 4, 2024

Just want to add for people (not sure if this could be causing any of the problems in this thread) that Seurat files should be v3/4 and file name should end in .rds not .RDS (case sensitive, sometimes people use all caps for file suffix). Great tool! It's working really well for me.
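
The file-suffix point above can be checked before launching a run. The sketch below only inspects the file name, following the case-sensitivity reported in this comment; both helper names are hypothetical, and nothing is renamed automatically.

```shell
# has_lowercase_rds FILE: hypothetical helper.
# Succeeds only if the name ends in ".rds" with exactly this casing.
has_lowercase_rds() {
  case "$1" in
    *.rds) return 0 ;;
    *) return 1 ;;
  esac
}

# suggest_rds_name FILE: hypothetical helper.
# Prints the name with any case variant of the suffix (".RDS", ".Rds", ...)
# normalized to ".rds"; other names are printed unchanged.
suggest_rds_name() {
  case "$1" in
    *.[rR][dD][sS]) printf '%s.rds\n' "${1%.*}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Usage:
# has_lowercase_rds my_data.RDS || mv my_data.RDS "$(suggest_rds_name my_data.RDS)"
```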

@gianfilippo
Author

Hi,
I tried running on my data and on the example data.
SuperCell seems to work, but if I try SEACells or MetaCell, I get the following error
  File "$HOME/bin/MetacellAnalysisToolkit/cli/MetaCell2CL.py", line 312, in <module>
    main(sys.argv[1:])
  File "$HOME/bin/MetacellAnalysisToolkit/cli/MetaCell2CL.py", line 175, in main
    ro.r(f'sobj <- readRDS("{input_file}")')
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/__init__.py", line 459, in __call__
    res = self.eval(p)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 208, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 131, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "$HOME/.conda/envs/MetacellAnalysisToolkit/lib/python3.9/site-packages/rpy2/rinterface.py", line 817, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in gzfile(file, "rb") : cannot open the connection

I also get a warning before:
WARNING: The R package "reticulate" only fixed recently
an issue that caused a segfault when used with rpy2:
rstudio/reticulate#1188
Make sure that you use a version of that package that includes
the fix.

I did install the latest reticulate, but the error persists

What do you think ?

@aurelieGabriel
Collaborator

Hi @gianfilippo,

Thank you for your feedback. If I understand correctly, you also get this error on the example data when using SEACells and MetaCell. Could you please provide the command line that led to this error?

Also, it seems that the error occurs when running readRDS(input_file), could you give us more information on how you built the input file?
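
R's "cannot open the connection" message from gzfile often means that the path given to readRDS could not be opened at all (missing file, wrong path, or no read permission). A pre-flight check on the input path can rule this out. The sketch below uses a hypothetical helper, `check_input`, which is not part of MATK.

```shell
# check_input FILE: hypothetical pre-flight check, not part of MATK.
# Returns 0 and prints "ok: FILE" if FILE exists, is readable, and is non-empty.
check_input() {
  if [ ! -e "$1" ]; then
    echo "missing: $1" >&2
    return 1
  elif [ ! -r "$1" ]; then
    echo "not readable: $1" >&2
    return 1
  elif [ ! -s "$1" ]; then
    echo "empty: $1" >&2
    return 1
  fi
  echo "ok: $1"
}

# Usage (path taken from the thread):
# check_input get_data/pbmc.rds
```

When running through the container, the path also has to be visible inside it, for example under the directory passed to --bind.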

Note that for SEACells, we recently fixed an issue that arose when a Seurat object without a PCA embedding was provided as input. If this applies to you, please make sure to pull the latest changes from the GitHub repo and, if needed, pull the Docker containers again (container with Seurat V5: agabriel/matk:SeuratV5; container with Seurat V4: agabriel/matk:SeuratV4).

Best,

@gianfilippo
Author

Hi,

I tried it again, without making any changes and it works now. I do get some warnings with MetaCell, but it seems ok.
The problem with the test data was the wrong input file.
The problem with my own data is unclear, as I did not change anything. I should probably just take a break :)

Thanks again for your input.

Best
