Reduce number of 3rd party packages required for a prediction-only setup #594

a-recknagel · 2023-01-11T22:33:50Z

My use case is that I'm running a trained causalml model on a server. I'm done with analysis, hyperopt, visualization, and so on; none of that is necessary anymore. So I pickled my model and moved it to a dedicated production environment, configured so that it can unpickle the model and run predictions with it.

But the way causalml is set up, many of those "non-core" packages that deal with training and analysis are still hard runtime dependencies, even if I were to install causalml with --no-deps (as suggested in #250 (comment), which I'd really like to avoid). To give an example, the model I'm using is causalml.inference.tree.causal.causalforest.CausalRandomForestRegressor, and causalml.inference.tree.__init__.py imports all of its local modules as well (e.g. causalml.inference.tree.plot), which pulls in a number of the 3rd party packages I have an issue with, like seaborn, matplotlib, pydotplus, ...
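One common way to stop a package `__init__.py` from eagerly importing a heavy submodule such as `plot` is module-level `__getattr__` (PEP 562, Python 3.7+): the submodule is imported only when it is first accessed. A minimal, self-contained sketch; the package `demo_pkg` and its `causal`/`plot` modules are hypothetical stand-ins written to a temporary directory, not causalml's actual layout.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py: core classes are imported eagerly, "plot" lazily.
INIT = textwrap.dedent('''
    import importlib

    from .causal import CausalModel  # core, always imported

    _LAZY_SUBMODULES = {"plot"}  # heavy, only imported on first access

    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup on the package fails
        if name in _LAZY_SUBMODULES:
            module = importlib.import_module(f".{name}", __name__)
            globals()[name] = module  # cache it so __getattr__ is not hit again
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')
CAUSAL = "class CausalModel:\n    def predict(self, x):\n        return x\n"
# In a real package this module would import matplotlib, seaborn, pydotplus, ...
PLOT = "def plot_tree(model):\n    return 'plotted'\n"


def build_demo_package(root: Path) -> None:
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT)
    (pkg / "causal.py").write_text(CAUSAL)
    (pkg / "plot.py").write_text(PLOT)


def run_demo():
    """Return (plot imported by package import?, plot imported after use?, result)."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            pkg = importlib.import_module("demo_pkg")
            plot_loaded_at_import = "demo_pkg.plot" in sys.modules
            result = pkg.plot.plot_tree(pkg.CausalModel())  # first access imports plot
            plot_loaded_after_use = "demo_pkg.plot" in sys.modules
        finally:
            # Undo the demo's side effects on the interpreter.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
                del sys.modules[name]
    return plot_loaded_at_import, plot_loaded_after_use, result


if __name__ == "__main__":
    print(run_demo())  # (False, True, 'plotted')
```

With this layout, a prediction-only environment never touches the plotting dependencies unless some code actually accesses `demo_pkg.plot`.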

Would it be possible to move every dependency that isn't necessary for running predictions into extras? Or at least to restructure the code so that a manual install of the actual runtime dependencies won't lead to unrelated 3rd party package imports? I realize this is a massive ask, but it's a serious problem for me that I can't solve without forking your project and running my own builds (which I'd really, really like to avoid).
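To check whether a given import path drags in unrelated packages, one option is to import it in a fresh interpreter and list every top-level package that got imported along with it. A minimal sketch using only the standard library; `imported_top_level` is a hypothetical helper name, not part of causalml.

```python
import subprocess
import sys

# Runs in a fresh interpreter so modules already loaded by the caller don't hide anything.
_PROBE = """
import importlib, sys
before = set(sys.modules)
importlib.import_module({module!r})
new = {{name.split('.')[0] for name in set(sys.modules) - before}}
print('\\n'.join(sorted(new)))
"""


def imported_top_level(module: str) -> list:
    """Return the sorted top-level package names newly imported by importing `module`."""
    code = _PROBE.format(module=module)
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return [line for line in proc.stdout.splitlines() if line]


if __name__ == "__main__":
    # In an environment with causalml installed, one could probe e.g.
    # imported_top_level("causalml.inference.tree") and look for matplotlib/seaborn.
    print(imported_top_level("json"))
```

If the import itself fails (for example because a 3rd party package is missing), `check=True` raises `CalledProcessError`, which is also a useful signal for a `--no-deps` setup.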

Just to give an idea of why it's an issue:

base/Dockerfile

# need a builder stage since no wheels are released to PyPI, except for a single python3.8 mac build?
FROM python:3.10-slim AS builder

RUN apt-get update && \
    apt-get -y install build-essential
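# quote "setuptools>=18.0": unquoted, the shell parses ">=18.0" as a redirect into a file named "=18.0"
RUN pip install "setuptools>=18.0" wheel cython numpy "scikit-learn<=1.0.2"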
RUN pip install causalml --no-deps
RUN pip wheel -w wheels causalml --no-deps

FROM python:3.10-slim

COPY --from=builder wheels wheels
RUN pip install "scikit-learn<=1.0.2" packaging forestci tqdm pathos && \
    pip install wheels/causalml* --no-deps

This image contains the core set of 3rd party packages necessary to predict with a CausalRandomForestRegressor. I didn't investigate what other models would need, but numerical computation libraries don't have a massive disk footprint anyway: the whole image is 507 MB, which is reasonable for a simple ML backend.
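A small smoke test one could run inside the slim image to confirm that the prediction-only environment has what it needs and nothing more. A sketch using `importlib.util.find_spec`, which locates a top-level module without importing it; the two package lists are illustrative (taken from the Dockerfiles and the comment above), not an authoritative list of causalml's runtime requirements.

```python
import importlib.util

# Import names, not pip names (scikit-learn is imported as "sklearn").
RUNTIME = ["sklearn", "packaging", "forestci", "tqdm", "pathos", "causalml"]
UNWANTED = ["matplotlib", "seaborn", "pydotplus", "torch"]


def check_environment(required, unwanted):
    """Return (missing required modules, unwanted modules that are installed)."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    present = [name for name in unwanted if importlib.util.find_spec(name) is not None]
    return missing, present


if __name__ == "__main__":
    missing, present = check_environment(RUNTIME, UNWANTED)
    if missing or present:
        raise SystemExit(f"missing: {missing}, should not be installed: {present}")
    print("prediction-only environment looks OK")
```

Running this as the last `RUN` step of the image would fail the build as soon as a heavy package sneaks back in.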

actual/Dockerfile

FROM python:3.10-slim AS builder

RUN apt-get update && \
    apt-get -y install build-essential
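# quote "setuptools>=18.0": unquoted, the shell parses ">=18.0" as a redirect into a file named "=18.0"
RUN pip install "setuptools>=18.0" wheel cython numpy "scikit-learn<=1.0.2"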
RUN pip install causalml --no-deps
RUN pip wheel -w wheels causalml --no-deps

FROM python:3.10-slim

COPY --from=builder wheels wheels
RUN pip install wheels/causalml*

This installs causalml with all of its dependencies, and visualization libraries do tend to eat up a fair share of disk space. Plus torch. The image clocks in at 6.54 GB, a difference of roughly 6 GB that I do not need.

My CI/CD straight up refuses to run this build for me because it doesn't support artifacts of this size. I didn't even know that could happen.


a-recknagel · 2023-01-11T22:37:50Z

I couldn't find similar issues in the tracker, apologies if I just missed them. If I didn't, I'd be surprised though: am I really the first user who has this issue? Is dockerizing / running causalml on a server a strange thing to do?

Regarding PRs, I might be able to write one, but I wouldn't start unless the maintainers green-light the issue itself.

jeongyoonlee · 2023-01-20T18:53:10Z

Thanks for submitting this, @a-recknagel. Addressing this will help many others who'd like to deploy the causalml models. Can you take a stab at it?

A couple of things I can think of are:

- Removing plot.* from all __init__.py files
- Making pytorch an optional dependency, similar to "make tensorflow dependency optional" #343
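The optional-dependency idea in the second item usually comes down to a pattern like this: the import is attempted once at module load, and only the code paths that actually need the library fail, with an error that says how to install it. A sketch; `try_import`, `require`, `fit_neural_model` and the `causalml[torch]` extra name are hypothetical, not necessarily what #343 or a later change uses.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, else None (never raises ImportError)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module: Optional[ModuleType], module_name: str, extra: str) -> ModuleType:
    """Raise a helpful ImportError if an optional dependency is missing."""
    if module is None:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'causalml[{extra}]'"  # hypothetical extra name
        )
    return module


# Module level: importing this file never fails just because torch is absent.
torch = try_import("torch")


def fit_neural_model(data):
    """Hypothetical torch-only code path; fails only when it is actually called."""
    torch_mod = require(torch, "torch", extra="torch")
    return torch_mod.as_tensor(data)
```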

a-recknagel · 2023-01-22T22:31:15Z

Ok, that's good to know, I'd love to try. I hope to limit the changes to these two areas, changing import paths and adding extras groups, but I'd consider either of them a breaking change. Not that that'll stop me, and the project is still on a 0.x version (ZeroVer), so it won't matter much, but I want to ask how careful I should be. Should I read up on custom importer overloads to try and keep existing import paths working, or would that be wasted effort?
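For keeping old import paths working, a custom importer may not be needed: module-level `__getattr__` (PEP 562) lets the old module forward names that moved elsewhere and emit a `DeprecationWarning`. A sketch under assumptions: the module name `old_location` is hypothetical, and `json` stands in for the new location so the example runs anywhere. In a real package the `__getattr__` would simply be defined at the top level of the old module's file; here the old module is built in memory.

```python
import importlib
import sys
import types
import warnings

# old attribute name -> (new module path, attribute name); json is a stand-in
_MOVED = {"dumps": ("json", "dumps")}


def _make_getattr(module_name: str, moved: dict):
    def __getattr__(name):
        # PEP 562: only called when `name` is not found in the module's namespace
        if name in moved:
            new_module, attr = moved[name]
            warnings.warn(
                f"{module_name}.{name} has moved to {new_module}.{attr}",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(importlib.import_module(new_module), attr)
        raise AttributeError(f"module {module_name!r} has no attribute {name!r}")

    return __getattr__


# Simulate the old module and register it so `from old_location import ...` finds it.
old = types.ModuleType("old_location")  # hypothetical old import path
old.__getattr__ = _make_getattr("old_location", _MOVED)
sys.modules["old_location"] = old

if __name__ == "__main__":
    warnings.simplefilter("always", DeprecationWarning)
    from old_location import dumps  # warns, then behaves like json.dumps
    print(dumps({"a": 1}))  # prints {"a": 1}
```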

Also, I'll probably touch most files in the project due to moving folders. Are there any particular WIPs or branches that I should consider or wait for before starting? The merge conflicts would be spectacularly bad.

jeongyoonlee · 2024-10-01T16:56:13Z

Hi @a-recknagel, in the latest v0.15.2 release we made torch optional. Can you check whether this change addresses the issue?

a-recknagel · 2024-10-01T17:12:26Z

Will do, thanks for the update.

a-recknagel added the "enhancement" (New feature or request) label Jan 11, 2023

a-recknagel changed the title from "Reduce number of dependencies for a prediction-only setup" to "Reduce number of 3rd party packages required for a prediction-only setup" Jan 12, 2023

demetd mentioned this issue Sep 3, 2024 in "Make torch an optional dependency" #789 (closed)
