You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fromfklearn.training.regressionimportlinear_regression_learnerfromfklearn.training.pipelineimportbuild_pipelinefromfklearn.training.imputationimportimputer# This example will try to create a linear regressor...defload_sample_data():
importpandasaspdfromsklearnimportdatasetsx,y=datasets.load_diabetes(as_frame=True, return_X_y=True)
returnpd.concat([x, y], axis=1)
# ...but pipelines can't be used as individual learnersnested_learner=build_pipeline(
build_pipeline(
imputer(columns_to_impute=["age"], impute_strategy='median'),
),
linear_regression_learner(features=["age"], target=["target"])
)
fn, df_result, logs=nested_learner(load_sample_data())
'''Output:ValueError: Predict function of learner pipeline contains variable length arguments: kwargsMake sure no predict function uses arguments like *args or **kwargs.'''
Problem description
This is a problem because (taken from the docs) "pipelines (should) behave exactly as individual learner functions". That is, pipelines should be consistent with L in SOLID, but that is not happening.
The main benefit of supporting nested pipelines is that you can produce more maintainable code as packing complex operations in one big step like:
That is, if a pipeline is a type of learner, it should be possible to put it in place of any other learner.
Possible solutions
A workaround is proposed in #145 , but it only works if you already have a DataFrame (which doesn't happen in this scenario). Would like to hear if someone has investigated this or knows what we need to change to support this, but in the meantime I propose adding a note to the docs to prevent someone else from trying to do this.
The text was updated successfully, but these errors were encountered:
Hi. I think this would be useful as well when you want to split data-preprocessing from the model. From what I understand the problem is that the predict_fn of the pipeline takes kwargs, determines where that kwarg belongs and assigns it to the corresponding function. So if you have two pipelines I think it gets hard to determine which pipeline the kwarg belongs to.
One possible solution I see to this is taking the approach that sklearn takes when passing parameters to the estimators in a pipeline:
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p. source
So I think it would be possible to have nested pipelines and then calling the predict_fn with kwargs using this kind of prefixing, i.e. if we have the following:
pipe1=build_pipeline([f(a=1), g(a=2)])
pipe2=build_pipeline([f(a=2), g(a=3)])
pipe3=build_pipeline([pipe1, pipe2])
predict_fn, data, logs=pipe3(df)
# the following would set the argument a to 3 in the f function of the first pipelinepredict_fn(df2, pipeline0__f__a=3)
# the following would set the argument a to 2 in the g function of the second pipelinepredict_fn(df2, pipeline1__g__a=2)
We would of course have the possibility of setting custom names to each pipeline and function so the argument could be like preprocessing__scaler__column='something'
Code sample
This script is complete, it should run "as is"
Problem description
This is a problem because (taken from the docs) "pipelines (should) behave exactly as individual learner functions". That is, pipelines should be consistent with L in SOLID, but that is not happening.
The main benefit of supporting nested pipelines is that you can produce more maintainable code as packing complex operations in one big step like:
Is cleaner and more readable.
Expected behavior
The "Code sample" should produce the same output as if
nested_learner
was defined as:That is, if a pipeline is a type of learner, it should be possible to put it in place of any other learner.
Possible solutions
A workaround is proposed in #145 , but it only works if you already have a
DataFrame
(which doesn't happen in this scenario). Would like to hear if someone has investigated this or knows what we need to change to support this, but in the meantime I propose adding a note to the docs to prevent someone else from trying to do this.The text was updated successfully, but these errors were encountered: