Add pydra tasks, workflow and update CLI #57

Open
wants to merge 17 commits into
base: master

Conversation

maestroque
Contributor

Closes #

Proposed Changes

  • Create two pydra tasks, one to compute metrics and one to export metrics
  • Integrate them in a single pydra workflow, utilizing pydra tasks from physutils
  • Inspiration is taken from the current CLI implementation, aiming to replace and enhance it

Change Type

  • bugfix (+0.0.1)
  • minor (+0.1.0)
  • major (+1.0.0)
  • refactoring (no version update)
  • test (no version update)
  • infrastructure (no version update)
  • documentation (no version update)
  • other

Checklist before review

  • I added everything I wanted to add to this PR.
  • [Code or tests only] I wrote/updated the necessary docstrings.
  • [Code or tests only] I ran and passed tests locally.
  • [Documentation only] I built the docs locally.
  • My contribution is harmonious with the rest of the code: I'm not introducing repetitions.
  • My code respects the adopted style, especially linting conventions.
  • The title of this PR is explanatory on its own, enough to be understood as part of a changelog.
  • I added or indicated the right labels.
  • I added information regarding the timeline of completion for this PR.
  • Please, comment on my PR while it's a draft and give me feedback on the development!

@maestroque maestroque requested review from smoia, m-miedema and me-pic July 23, 2024 10:46
@maestroque maestroque self-assigned this Jul 23, 2024
@github-actions github-actions bot added Testing This is for testing features, writing tests or producing testing code. Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog labels Jul 23, 2024
@maestroque
Contributor Author

Note that the PR is currently stemming from integrate-physutils

@maestroque maestroque added Minormod-breaking For development only, this PR increments the minor version (0.+1.0) but breaks compatibility and removed Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog labels Jul 23, 2024
@maestroque
Contributor Author

Also I would like you to enlighten me on retroicor if possible. In the current workflow implementation, the metrics are exported as such:

for metric in metrics:
    if metric == "retroicor_card":
        args = select_input_args(retroicor, kwargs)
        args["card"] = True
        retroicor_regrs = retroicor(physio, **args)
        for vslice in range(len(args["slice_timings"])):
            for harm in range(args["n_harm"]):
                key = f"rcor-card_s-{vslice}_hrm-{harm}"
                regr[f"{key}_cos"] = retroicor_regrs[vslice][:, harm * 2]
                regr[f"{key}_sin"] = retroicor_regrs[vslice][:, harm * 2 + 1]
    elif metric == "retroicor_resp":
        # etc. etc.

Shall I keep it this way or is more research needed? I am not familiar with how retroicor is used.
@m-miedema @me-pic @smoia
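
For readers less familiar with the layout, the loop above appears to assume that each slice's regressor array holds one cosine and one sine column per harmonic, interleaved. Below is a minimal sketch of that naming and indexing only, with no real retroicor call; the helper name `retroicor_column_map` and its `label` parameter are hypothetical and not part of phys2denoise.

```python
def retroicor_column_map(n_slices, n_harm, label="card"):
    """Map each regressor name to its (slice index, column index) pair.

    Assumes, as the quoted loop does, that columns are interleaved per
    harmonic: column 2*h holds the cosine term and column 2*h + 1 the sine.
    """
    columns = {}
    for vslice in range(n_slices):
        for harm in range(n_harm):
            key = f"rcor-{label}_s-{vslice}_hrm-{harm}"
            columns[f"{key}_cos"] = (vslice, harm * 2)
            columns[f"{key}_sin"] = (vslice, harm * 2 + 1)
    return columns


# Illustrative sizes only: 2 slices and 2 harmonics give 8 named regressors.
layout = retroicor_column_map(n_slices=2, n_harm=2)
for name, (vslice, col) in layout.items():
    print(f"{name}: slice {vslice}, column {col}")
```

Each slice therefore contributes 2 * n_harm columns, which is why the exported file grows quickly with the number of slices.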

@m-miedema
Member

Keep it this way for now -- the handling of these derivatives is something we'll need to improve but we can circle back after the workflow is in place!

@github-actions github-actions bot added the Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog label Aug 25, 2024
@maestroque maestroque removed the Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog label Aug 25, 2024
codecov bot commented Aug 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 48 lines in your changes missing coverage. Please review.

Project coverage is 49.84%. Comparing base (f074cad) to head (d150e76).
Report is 2 commits behind head on master.

Files Patch % Lines
phys2denoise/tasks.py 0.00% 48 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
- Coverage   53.85%   49.84%   -4.02%     
==========================================
  Files           8        9       +1     
  Lines         596      644      +48     
==========================================
  Hits          321      321              
- Misses        275      323      +48     
Files Coverage Δ
phys2denoise/workflow.py 0.00% <ø> (ø)
phys2denoise/tasks.py 0.00% <0.00%> (ø)

@maestroque
Contributor Author

I have found a problem with using loguru inside pydra tasks. For example, take this task:

@pydra.mark.task
def compute_metrics(phys, metrics):
    if isinstance(metrics, list) or isinstance(metrics, str):
        for metric in metrics:
            if metric not in _available_metrics:
                # print(f"Metric {metric} not available. Skipping")
                logger.warning(f"Metric {metric} not available. Skipping")
                continue

            args = select_input_args(metric, {})
            phys = globals()[metric](phys, **args)
            logger.info(f"Computed {metric}")
    return phys

With the loguru calls included, defining the task as in task2 = compute_metrics(phys=fake_physio, metrics=["respiratory_variance"]) raises the following error:

>       task2 = compute_metrics(phys=fake_physio, metrics=["respiratory_variance"])

phys2denoise/tests/test_tasks.py:15:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pydra-p2d/lib/python3.10/site-packages/pydra/mark/functions.py:47: in decorate
    return FunctionTask(func=func, **kwargs)
pydra-p2d/lib/python3.10/site-packages/pydra/engine/task.py:146: in __init__
    fields.append(("_func", attr.ib(default=cp.dumps(func), type=bytes)))
pydra-p2d/lib/python3.10/site-packages/cloudpickle/cloudpickle.py:1479: in dumps
    cp.dump(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <cloudpickle.cloudpickle.Pickler object at 0x7fe5ea17e140>, obj = <function compute_metrics at 0x7fe5e9fb3910>

    def dump(self, obj):
        try:
>           return super().dump(obj)
E           TypeError: cannot pickle 'EncodedFile' object

pydra-p2d/lib/python3.10/site-packages/cloudpickle/cloudpickle.py:1245: TypeError

Without those calls, the error is not raised.
This might be related to how pydra handles tasks in parallel. So we could either not use logs at all within tasks (which tbh is not a viable workaround imo), or dig deeper. I've been looking for online resources about pydra + loguru, with no luck so far.

The good news is that this seems to be specific to loguru, since stdlib logging calls work. I'm planning to work with those for now and see how we'll move on.
@smoia @me-pic @m-miedema
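
One possible explanation, not confirmed in this thread: the traceback shows pydra serializing the task function with cloudpickle, and if the function is pickled by value, the objects it references (such as the loguru logger and the stream its sink writes to) have to be picklable too. Under pytest that stream may be a captured-output wrapper, which would match the `EncodedFile` in the error. The sketch below uses only the standard library to show the general difference: a stdlib logger pickles as a reference to its name, while an open text stream does not pickle at all. The helper `can_pickle` is made up for illustration.

```python
import io
import logging
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# A stdlib logger is pickled as a lookup by name, not by value.
std_logger = logging.getLogger("phys2denoise.tasks")

# Stand-in for the kind of stream a logging sink holds on to.
stream = io.TextIOWrapper(io.BytesIO())

print("stdlib logger picklable:", can_pickle(std_logger))
print("text stream picklable:", can_pickle(stream))
```

If this is indeed the cause, fetching the logger inside the task body, or sticking to stdlib logging as proposed above, keeps unpicklable stream objects out of what gets serialized.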

@github-actions github-actions bot added the Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog label Aug 25, 2024
@maestroque maestroque changed the title WIP: Add pydra tasks and workflow Add pydra tasks, workflow and update CLI Aug 28, 2024
@m-miedema
Member

m-miedema commented Aug 29, 2024

I currently need to install nest_asyncio manually for testing; after that, the tests pass except for the caplog logging assertion (this is also true for me in physiopy/physutils#7). Still taking a closer look at the rest, will have a full review for you tomorrow!

phys2denoise/workflow.py (outdated)
Member

I can help with setting things up, but it would be FAR better if we upload all the data in our OSF repository and remove it from here.
Do you need help with it?

Contributor Author

Tbh, I just used the files from peakdet, because its tests include these files as well. I mostly use fake_physio.phys for the tests here, which is created on the spot using a util function. I can delete ECG.csv if you want

Contributor Author

Transfer the main workflow in workflow.py as that's the entry point of the CLI. This file should only contain the parser

Contributor Author

Done @smoia

@m-miedema
Member

@m-miedema what failed tests are you getting? Everything should pass now

I am getting the attached failure:
[image: screenshot of the test failure]

Let me know if there's something you can point me to that's wrong on my end!

@maestroque
Contributor Author

@m-miedema The code seems to be the same version, and it passes locally, so I cannot reproduce the failure. Also, we cannot see the CI results until physiopy/physutils#7 is merged and released.

Could you try to add the following to test/__init__.py to see if this fixes it?

from loguru import logger
import sys

logger.add(sys.stderr)

@me-pic
Contributor

me-pic commented Sep 3, 2024

@m-miedema can't replicate the error you are getting either

@me-pic
Contributor

me-pic commented Sep 3, 2024

@maestroque Not sure if this should even be addressed in this PR, but the chest_belt.py script uses np.math, which I believe is deprecated and might eventually cause issues...

Contributor

@me-pic me-pic left a comment

@maestroque Thanks for your work on this PR! Good job overall 🎉 🎉

Would it be possible to add some tests to cover the CLI? If you need examples, you could refer to the tests in the giga_connectome package.

@maestroque
Contributor Author

@maestroque Not sure if this should even be addressed in this PR, but the chest_belt.py script uses np.math, which I believe is deprecated and might eventually cause issues...

Yes, you are right! There is an open issue about NumPy v2 compatibility that covers this: #62

@maestroque
Contributor Author

@maestroque Thanks for your work on this PR! Good job overall 🎉 🎉

Would it be possible to add some tests to cover the CLI? If you need examples, you could refer to the tests in the giga_connectome package.

Sure, on it

Comment on lines 258 to 273
# def phys2denoise(
# filename,
# outdir=".",
# metrics=[
# crf,
# respiratory_pattern_variability,
# respiratory_variance,
# respiratory_variance_time,
# rrf,
# "retroicor_card",
# "retroicor_resp",
# ],
# debug=False,
# quiet=False,
# **kwargs,
# ):
Contributor

Can we delete the commented-out code?

Contributor Author

Oops, yeah I thought I did

@m-miedema
Member

This does not fix it - I'm not sure what's going on but since you and @me-pic are not able to replicate it, I continued on to test the rest of the CLI (see my next comment).

@m-miedema
Member

m-miedema commented Sep 3, 2024

I'm having trouble using the CLI to calculate metrics when using a .phys object as my input. For example, I thought I'd run a quick test using the physio objects generated in the OHBM tutorial here but the CLI returns e.g. "Metric rv not computed. Skipping" without more useful information (even when I run in --debug mode - which as a side-note, doesn't actually change the output for me). Has anyone else successfully output metrics in this case?

@maestroque
Contributor Author

maestroque commented Sep 3, 2024

@m-miedema I need you to provide the precise logs you get in order to understand the problem. I haven't had this issue personally

@m-miedema
Member

m-miedema commented Sep 4, 2024

Certainly! If you've been able to get the CLI to run on a Physio object with peaks/troughs, could you share the object and the call with me and I can try it? So far I've tried a few different ways, but in general this is the type of call and output:

phys2denoise -in '.\sub-007_ses-05_task-rest_run-01_resp_peaks.phys' -e 'rv' -nscans 400 -tr 1.5 -sr 40 -lags 0 -win 6 --debug -out .

2024-09-04 09:32:47.905 | INFO     | phys2denoise.workflow:phys2denoise:208 - Running phys2denoise version: 0+untagged.276.gabd4c93.dirty
2024-09-04 09:32:47.914 | DEBUG    | phys2denoise.workflow:phys2denoise:233 - Metrics: []
2024-09-04 09:32:47.914 | DEBUG    | phys2denoise.workflow:phys2denoise:233 - Metrics: []
2024-09-04 09:32:48.866 | DEBUG    | physutils.tasks:wrapped_func:27 - Creating pydra task for transform_to_physio
2024-09-04 09:32:48.866 | DEBUG    | physutils.tasks:wrapped_func:27 - Creating pydra task for transform_to_physio
2024-09-04 09:32:51.227 | DEBUG    | physutils.io:load_physio:185 - Instantiating Physio object from a file
2024-09-04 09:32:51.227 | DEBUG    | physutils.physio:__init__:293 - Initializing new Physio object
Metric rv not computed. Skipping

@m-miedema
Member

m-miedema commented Sep 4, 2024

One thing I think we should strongly consider as a future direction for the CLI is to set up a heuristic file with more metric-specific parameters. For example, here we can calculate different metrics, but we cannot specify different window sizes for each in the same call. I'm putting this comment along with a new issue here so as not to lose track of it - if others think this is a useful idea I can follow up in the future :)
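
To make the idea concrete, here is a rough sketch of what such per-metric parameters could look like once loaded, written as a plain Python mapping. Nothing here exists in phys2denoise: the parameter names `window` and `lags` only echo the -win and -lags flags used elsewhere in this thread, the values are invented, and `params_for` is a hypothetical helper.

```python
# Defaults applied to any metric without its own entry (illustrative values).
DEFAULTS = {"window": 6, "lags": (0,)}

# Hypothetical heuristic: per-metric overrides, keyed by metric name.
HEURISTIC = {
    "respiratory_variance": {"window": 6},
    "respiratory_variance_time": {"window": 10, "lags": (0, 4, 8)},
}


def params_for(metric, heuristic=HEURISTIC, defaults=DEFAULTS):
    """Merge the defaults with the overrides listed for one metric."""
    merged = dict(defaults)
    merged.update(heuristic.get(metric, {}))
    return merged


for name in ("respiratory_variance", "respiratory_variance_time", "crf"):
    print(name, params_for(name))
```

A single call could then compute several metrics, each with its own window, instead of sharing one global setting.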

@m-miedema
Member

m-miedema commented Sep 4, 2024

@maestroque I think it would be very helpful to make the "Metric rv not computed. Skipping" message slightly more verbose so that the user knows it is stemming from the export argument, rather than the computational argument. As it stands it's quite confusing! E.g. "Metric X not computed, skipping the export of metric X." or even better to throw a warning when a metric is provided as an export argument but not a computational argument.
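
As a rough illustration of that suggestion, the check could compare the export list against the computed list up front and say explicitly which side is missing. This is a sketch using stdlib logging; `split_exportable` and the logger name are hypothetical, not the CLI's actual code.

```python
import logging

logger = logging.getLogger("export_check")


def split_exportable(requested, computed):
    """Split the metrics requested for export into exportable and missing.

    A warning is logged for every metric that was requested for export
    but is not among the computed metrics.
    """
    computed = set(computed)
    exportable = [m for m in requested if m in computed]
    missing = [m for m in requested if m not in computed]
    for metric in missing:
        logger.warning(
            "Metric %s was requested for export but was not computed; "
            "skipping the export of metric %s.",
            metric,
            metric,
        )
    return exportable, missing


exportable, missing = split_exportable(["rv", "rrf"], computed=["rrf"])
print("exportable:", exportable)
print("missing:", missing)
```

With the call shown in the log above, where the computed metrics list is empty, this would flag rv as requested for export but never computed.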

@m-miedema
Member

As well, I opened a new issue to address this, but I'm finding that the number of time points in the exported resampled metric files doesn't match the -nscans argument in the CLI. I think users would expect the two to match, so it's something we should dig into another time.

Member

@m-miedema m-miedema left a comment

I'm looking forward to seeing more documentation and to resolving some of the logging integration with pydra (I will open an issue if I'm still having logging-related failures in physutils and phys2denoise local testing following the merge). Please be sure to provide an example of running the CLI and the expected outputs in the documentation, including logs. Other than my minor point about the calculated vs. exported metric message, I won't suggest any other changes to address at this point. Thanks for all your hard work @maestroque !

@maestroque
Contributor Author

Cleaned up, this should be ready to merge once physutils is

@m-miedema
Member

@maestroque thanks for updating this one!

wf.set_output([("result", wf.compute_metrics.lzout.out)])

with Submitter(plugin="cf") as sub:
    sub(wf)
Contributor

The test is failing here. I'm getting the following error:
File "/home/user/Documents/physio/phys2denoise/phys2denoise/metrics/chest_belt.py", line 288, in respiratory_variance
    data = physio.check_physio(data, ensure_fs=True, copy=True)
File "/home/user/Documents/physio/phys2denoise/env/lib/python3.9/site-packages/physutils/physio.py", line 149, in check_physio
    if ensure_fs and np.isnan(data.fs):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I've investigated a bit more, and it turns out that the value of data.fs when it crashes is NOTHING:

(Pdb) data
Physio(size=18750, fs=_Nothing.NOTHING)
(Pdb) data.fs
NOTHING
(Pdb) type(data.fs)
<enum '_Nothing'>

Contributor Author

I remember this issue, but afaik it was fixed. Can you try replacing the isnan check with pandas.isna()?

Contributor

Thank you @maestroque for your comment! pandas.isna() is indeed able to evaluate the NOTHING value rather than just throwing an error. However, pandas.isna(NOTHING) returns False, so the ValueError is not raised. That causes other problems afterward, for example when instantiating the Physio object:

File "/home/user/Documents/physio/phys2denoise/env/lib/python3.9/site-packages/physutils/physio.py", line 305, in __init__
    self._fs = np.float64(fs)
TypeError: float() argument must be a string or a number, not '_Nothing'
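
A guard that treats anything other than a real number as a missing sampling rate would sidestep both failure modes described above. This is only a sketch: `_Nothing` below is a local stand-in for the sentinel shown in the debugger output, and `normalize_fs` and `has_valid_fs` are hypothetical helpers, not physutils functions.

```python
import enum
import math
from numbers import Real


class _Nothing(enum.Enum):
    """Local stand-in for a 'no value supplied' sentinel."""

    NOTHING = enum.auto()


NOTHING = _Nothing.NOTHING


def normalize_fs(fs):
    """Return fs as a float, or NaN when no usable sampling rate was given."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(fs, bool) or not isinstance(fs, Real):
        return math.nan
    return float(fs)


def has_valid_fs(fs):
    """True only when fs is a real, non-NaN number."""
    return not math.isnan(normalize_fs(fs))


for candidate in (40, 40.0, None, NOTHING, math.nan):
    print(repr(candidate), "->", has_valid_fs(candidate))
```

Note that this sketch also treats numeric strings such as "40" as missing; whether that is desirable depends on how the value reaches the Physio constructor.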

Contributor

@me-pic me-pic Oct 28, 2024

Follow-up question: are we expecting the value of data.fs to be NOTHING? Just wondering if the problem might come from something earlier, e.g. before Workflow or generate_physio are called.

Contributor

@me-pic me-pic Oct 28, 2024

The error seems to come from generate_physio in physutils. Specifically, when that function calls load_physio, it passes fs=fs, which is unnecessary and causes the issue. I will open an issue + PR in physutils

Contributor

All tests are passing with the changes made in physutils PR #11

Labels
Internal Changes affect the internal API. It doesn't increase the version, but produces a changelog Minormod-breaking For development only, this PR increments the minor version (0.+1.0) but breaks compatibility Testing This is for testing features, writing tests or producing testing code.
4 participants