Call _validate_steps in test_validate_steps #1307

danilobellini · 2023-07-15T17:50:30Z

This step is required because scikit-learn does not call the Pipeline._validate_steps method during construction, but the tests expect that validation to have run. I tried running it with scikit-learn v1.2.2 and scikit-learn v1.3.0. I found that the validation was removed upstream from Pipeline.__init__ in commit 0110921, for v1.1.0.

I'm currently the maintainer of yellowbrick's AUR package, and I'm having some trouble updating the package to v1.5.0 because of the tests. The test mentioned above is one of the failing ones; I brought it here in this pull request because it was simple to fix.

I've run black and pytest on the resulting file.

lwgray · 2023-07-20T11:40:30Z

@bbengfort Will we need to upgrade to scikit-learn v1.3.0? I worry about calling ._validate_steps().

The _validate_steps method of a scikit-learn Pipeline is a private method that checks whether the steps of the pipeline are defined correctly. Every step before the final one should be a transformer (i.e., it should have fit and transform methods), and the final step should be an estimator (i.e., it should have a fit method).

Calling _validate_steps() explicitly in your test cases will make sure that this validation is performed at the moment you define the pipeline, rather than later when you try to fit or transform data with the pipeline.

In @danilobellini's code, adding _validate_steps() after the Pipeline or VisualPipeline instantiation causes the validation to happen immediately. This means that if there's a problem with the steps (e.g., a non-transformer object in an intermediate step, or a non-estimator object as the final step), a TypeError is raised right away, rather than later on when you try to use the pipeline.
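To make the timing concrete, here is a minimal sketch of the pattern. It deliberately does not use scikit-learn: TinyPipeline, Scaler, Classifier, NotATransformer, and validation_error are hypothetical stand-ins written only for this illustration, and the checks are a simplification of the rule described above, not the upstream implementation. The constructor only stores the steps, so a malformed pipeline can be built without complaint, and the error only appears once validation is triggered explicitly or through fit.

```python
class Scaler:
    """Hypothetical transformer stub: has both fit and transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Classifier:
    """Hypothetical final estimator stub: has fit only."""

    def fit(self, X, y=None):
        return self


class NotATransformer:
    """Hypothetical invalid step: has neither fit nor transform."""


class TinyPipeline:
    """Toy stand-in for illustration only; NOT scikit-learn's Pipeline."""

    def __init__(self, steps):
        # Only store the steps; nothing is validated at construction time.
        self.steps = steps

    def _validate_steps(self):
        # Simplified rule: intermediate steps need fit and transform,
        # the final step needs fit.
        *intermediate, (final_name, final) = self.steps
        for name, step in intermediate:
            if not (hasattr(step, "fit") and hasattr(step, "transform")):
                raise TypeError(f"step {name!r} must implement fit and transform")
        if not hasattr(final, "fit"):
            raise TypeError(f"final step {final_name!r} must implement fit")

    def fit(self, X, y=None):
        # Validation is deferred until the pipeline is actually used.
        self._validate_steps()
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self


def validation_error(pipeline):
    """Return the TypeError message from explicit validation, or None if valid."""
    try:
        pipeline._validate_steps()
    except TypeError as exc:
        return str(exc)
    return None


# Construction succeeds even though the first step is invalid.
bad = TinyPipeline([("oops", NotATransformer()), ("clf", Classifier())])
good = TinyPipeline([("scale", Scaler()), ("clf", Classifier())])

print(validation_error(bad))   # message naming the offending step
print(validation_error(good))  # None
```

A test that wants to assert on the validation itself therefore has to call the check after construction, which is what this pull request does against the real Pipeline and VisualPipeline classes.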

This could make the tests clearer and more direct, as he is specifically testing the validation of the pipeline steps, and it's useful to have that validation happen as explicitly and immediately as possible. However, I'm aware that _validate_steps is a private method (indicated by the leading underscore), which means that it's not part of the public API of the Pipeline class and could potentially change in future versions of scikit-learn. Using private methods can sometimes lead to less stable code, as they're not guaranteed to stay the same in the way that public methods are.

Call _validate_steps in pipeline tests

6fc2c3d

This is a required step as that method is not called by scikit-learn during pipeline construction

lwgray assigned danilobellini Jul 20, 2023

lwgray requested a review from bbengfort July 29, 2023 12:45
