Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cytotable convert fails to reload config file outside for loop #90

Closed
jenna-tomkinson opened this issue Aug 21, 2023 · 8 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@jenna-tomkinson
Copy link
Member

Problem

When creating a parsl config variable outside of a for loop running convert on multiple SQLite files, it is not able to reload the config when running on the next file producing this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File ~/mambaforge/envs/durbin_lab_env/lib/python3.8/site-packages/cytotable/convert.py:1338, in convert(source_path, dest_path, dest_datatype, source_datatype, metadata, compartments, identifying_columns, concat, join, joins, chunk_columns, chunk_size, infer_common_schema, drop_null, data_type_cast_map, preset, parsl_config, **kwargs)
   1336     else:
   1337         # else we attempt to load the given parsl configuration
-> 1338         parsl.load(parsl_config)
   1339 except RuntimeError as runtime_exc:
   1340     # catch cases where parsl has already been loaded and defer to
   1341     # previously loaded configuration with a warning

File ~/mambaforge/envs/durbin_lab_env/lib/python3.8/site-packages/typeguard/__init__.py:1033, in typechecked..wrapper(*args, **kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:

File ~/mambaforge/envs/durbin_lab_env/lib/python3.8/site-packages/parsl/dataflow/dflow.py:1423, in DataFlowKernelLoader.load(cls, config)
   1422 if cls._dfk is not None:
-> 1423     raise RuntimeError('Config has already been loaded')
   1425 if config is None:

RuntimeError: Config has already been loaded

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
...
    339                                start_method=self.start_method)
    340 self.launch_cmd = l_cmd
    341 logger.debug("Launch command: {}".format(self.launch_cmd))

KeyError: 'block_id'

I am using this code as an example for getting this error:

# parsl workflow executer to run as multi-processing instead of multi-threaded (found in cytotable)
local_htex = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_Local",
            worker_debug=True,
            cores_per_worker=1,
            provider=LocalProvider(
                channel=LocalChannel(),
                init_blocks=1,
                max_blocks=1,
            ),
        )
    ],
    strategy=None,
    )

for plate_folder in sqlite_dir.iterdir():
    output_path = pathlib.Path(f"{output_dir}/{plate_folder.stem}_converted.parquet")
    # merge single cells and output as parquet file
    convert(
        source_path=str(plate_folder),
        dest_path=str(output_path),
        dest_datatype=dest_datatype,
        metadata=["image"],
        compartments=["nuclei"],
        identifying_columns=["ImageNumber"],
        joins=preset_join,
        parsl_config=local_htex,
        chunk_size=10000
    )
@d33bs d33bs added the bug Something isn't working label Aug 22, 2023
@d33bs d33bs self-assigned this Aug 22, 2023
@d33bs d33bs moved this to In Progress in SET Projects Aug 22, 2023
d33bs added a commit to d33bs/CytoTable that referenced this issue Aug 22, 2023
@d33bs
Copy link
Member

d33bs commented Aug 22, 2023

Thanks @jenna-tomkinson for raising this issue! I was able to reproduce your findings. My analysis found there may be a bug with Parsl (or perhaps how Parsl is implemented by CytoTable). As a result I raised the following issue with the Parsl project to resolve the bug or gather more information on how best to use the tool in this case: Parsl/parsl#2871 .

As a workaround for the time being, I recommend creating new Parsl configurations when implementing CytoTable in a loop. For example:

import pathlib

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor

from cytotable import convert

# other work here ...

for plate_folder in sqlite_dir.iterdir():
    output_path = pathlib.Path(f"{output_dir}/{plate_folder.stem}_converted.parquet")
    # merge single cells and output as parquet file
    convert(
        source_path=str(plate_folder),
        dest_path=str(output_path),
        dest_datatype=dest_datatype,
        metadata=["image"],
        compartments=["nuclei"],
        identifying_columns=["ImageNumber"],
        joins=preset_join,
        parsl_config=Config(
            executors=[HighThroughputExecutor()],
        ),
        chunk_size=10000,
    )

d33bs added a commit to d33bs/CytoTable that referenced this issue Aug 24, 2023
d33bs added a commit that referenced this issue Aug 30, 2023
* detect parsl load and reuse existing config

seeking to address #90

* attempt gc

* release unused for arrow test

* modify comment about warning

* add note to docstring

* isort linting

* raise other runtimeerrors

* Apply suggestions from code review

Co-authored-by: Gregory Way <[email protected]>

---------

Co-authored-by: Gregory Way <[email protected]>
@d33bs
Copy link
Member

d33bs commented Sep 8, 2023

Thanks again @jenna-tomkinson for raising this issue. I'm marking this as closed with the recent additions from #93 to address this issue. Moving forward, when using CytoTable in a loop it will use the first configuration passed to it, emitting a warning message when reusing an existing configuration to make sure this is observable. I recommend forming any custom parsl_config outside of for-loops so as to not recreate objects that won't be used past the first loop iteration. Please don't hesitate to let me know if this is still causing issues and we can re-open or create new issues as needed.

@d33bs d33bs closed this as completed Sep 8, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in SET Projects Sep 8, 2023
@sbamin
Copy link

sbamin commented Sep 19, 2023

Hi @d33bs , following a fix for #102 , I have updated CytoTable and trying to run a tutorial

from cytotable import convert

# using a local path with cellprofiler csv presets
convert(
    source_path="./tests/data/cellprofiler/ExampleHuman",
    source_datatype="csv",
    dest_path="ExampleHuman.parquet",
    dest_datatype="parquet",
    preset="cellprofiler_csv",
)

and getting this error, likely related to current issue #90.

Traceback (most recent call last):
  File "/vast/palmer/scratch/verhaak/foo/tmp/CytoTable/cytotable_test.py", line 5, in <module>
    convert(
  File "/usr/local/lib/python3.10/site-packages/cytotable/convert.py", line 1374, in convert
    if not _parsl_loaded():
  File "/usr/local/lib/python3.10/site-packages/cytotable/utils.py", line 109, in _parsl_loaded
    parsl.dfk()
  File "/usr/local/lib/python3.10/site-packages/parsl/dataflow/dflow.py", line 1441, in dfk
    raise ConfigurationError('Must first load config')
parsl.errors.ConfigurationError: Must first load config

Is there a specific ? parsl-compliant config I needed to pass for plate level csv files I have on my local disk?

@d33bs
Copy link
Member

d33bs commented Sep 19, 2023

Hi @sbamin, thanks for reaching out about this! This is occurring due to release 2023.9.18 for Parsl which was published yesterday and related merge Parsl/parsl#2878 . Dependencies in CytoTable currently automatically attempt to install the "caret" version from Parsl ^2023.9.05 (here). Currently, CytoTable is checking for a Parsl Exception which now no longer occurs in the latest Parsl release.

As a workaround, you could pin your Parsl installation to <=2023.9.11 where CytoTable is installed. This will enable you to move forward now.

As a fix for CytoTable, I'll create a PR for this to avoid the unexpected exception with this new version of Parsl.

Related: #94

@sbamin
Copy link

sbamin commented Sep 19, 2023

Thanks and downgrading parsl to <=2023.9.11 worked. Side note: error persisted when I rebuild pkg per pull request change on line 11.

@d33bs
Copy link
Member

d33bs commented Sep 19, 2023

Thank you for confirming and the follow up @sbamin ! I'm unsure, but wonder if the persisted error you mentioned may have come from environment Parsl version discrepancies with the new changes (#109 will now require Parsl version >=2023.09.18). On fetching the updates from #109 I recommend performing a reinstall of dependencies based on the updated pyproject.toml file. Also, could you help us understand if line 11 is a reference to CytoTable code in a specific module?

@sbamin
Copy link

sbamin commented Sep 19, 2023

My mistake! I did some quick and dirty sed based replacement of line 114 from str(rte) to str(pce) but did not change other files (forgot to look further in PR that other files may have changed too!).

Anyways, for now, I have apptainer image based on older version of parsl and that is able to run example convert code. I will update to new release once PR is merged.

Which line 11 you are referencing too?

Bootstrap: docker
From: python:3.10.11
Stage: build

%files
	/vast/palmer/scratch/tmp/CytoTable /opt/apps/CytoTable

%environment
    export my_appimg="cytotable_0.0.1p1"

%labels
    Maintainer sbamin
    Image CytoTable
    Version v0.0.1p1
    Source https://github.com/cytomining/CytoTable

%post
	cd /opt/apps/CytoTable
	pip install .
	my_build_date="$(date)"
	echo "export my_build_date=\"${my_build_date}\"" >> "${APPTAINER_ENVIRONMENT}"

%test
    echo 'Looking for CytoTable...'
    ( pip freeze | grep -Ei 'cytotable' ) && echo 'Success! Installed CytoTable' || echo "ERROR: Could not locate CytoTable in the container."

%help
    CytoTable Apptainer container. To start: singularity run cytotable_0.0.1p1.sif
    See https://cytomining.github.io/CytoTable/

@d33bs
Copy link
Member

d33bs commented Sep 19, 2023

Thank you for the clarification @sbamin ! When convenient, please update to the newly merged work on main to resolve the issue you mentioned and don't hesitate to let us know if you find any barriers to accomplishing your work via CytoTable. Please disregard my mention of line 11, I think the dependency mismatch was the primary cause of the challenges you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants