databricksruntime/python:12.2-LTS: installed packages do not match the ones documented in the Databricks Runtime #150
Comments
This is by design. DCS is not a replica of DBR; it provides a way to customize the container environment. Hence, we install only the packages required for running DBR, and users can install other packages in their own customization layer on top if they want.
This approach causes issues like #148 or #131, as the packages installed on DCS appear untested. Users of DCS need to: If 1 to 3 were in the design scope, that would reduce the amount of work placed on the customers of Databricks.
To best serve all of the above needs, we provide only a minimal set of packages in DCS so that, in most cases, customers won't have concerns as long as they don't touch the pre-installed packages in DCS. DCS gives customers the flexibility to customize the environment, and we are responsible for providing a starting point (the example images) for that customization. However, customers are always responsible for testing their customization, because even adding a new package can trigger an upgrade of another package and cause conflicts. (In your case, if you just add a package at the same version as in DBR, then yes, you don't need to test that part.)
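One way to see whether a customization layer touched the pre-installed packages is to save the output of `pip freeze` before and after the extra install step and compare the two. The following is only a minimal sketch under that assumption; the function names `parse_freeze` and `changed_preinstalled` are hypothetical and not part of DCS or DBR.

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `pip freeze` style lines (name==version) into {name: version}."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not plain pins
        # (for example editable or URL-based installs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        result[name.strip().lower()] = version.strip()
    return result


def changed_preinstalled(
    before: Dict[str, str], after: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return packages from `before` whose version changed or that disappeared.

    Packages that are new in `after` are ignored on purpose: only the
    pre-installed set is of interest here.
    """
    changes: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, old_version in before.items():
        new_version = after.get(name)  # None if the package was removed
        if new_version != old_version:
            changes[name] = (old_version, new_version)
    return changes
```

An empty result suggests the added packages did not upgrade, downgrade, or remove anything that shipped in the base image; any entry is a candidate for extra testing.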
I would also like to consider the following case
The screenshot attached to the initial issue description shows otherwise: customers must reinstall some packages in a DCS image in order for the packages to match the versions in DBR.
I would like to avoid those risks. I would expect the packages in the DCS base image to correspond exactly to the versions in DBR, so customers don't need to attempt overriding them to match the versions present in DBR, as this may introduce or leave in place other, incompatible package versions as dependencies. Some suggestions for improving the quality of the images produced in this repo:
I think I didn't express myself clearly enough. I thought you mentioned two issues:
I'll also link here an older issue about the version mismatch between DCS and DBR on LTS, #87, to be closed after a general fix is implemented.
If the design is open for improvements, then yes, I think point 2. above would be an optimal starting point for the DCS images. I understand it is a lot of work, and may even be infeasible unless all versions of packages match, by using
I think I see what you mean, and I agree to some extent that having the flexibility is great. However, I think it would be very useful to have the option of an image that matches DBR. You have a minimal image, then one that installs Python. Why not also have one that matches the applicable DBR? The benefit would be that it'd be easy for customers to also grab the requirements file, make whatever adjustments they need, and create their own version from the base Python image. I also find it confusing that the repository is named "databricksruntime" but doesn't actually match the DBR. I know it's too late to change that. Just confusing.
I'm comparing the packages installed on databricksruntime/python:12.2-LTS with the list of packages at https://docs.databricks.com/en/release-notes/runtime/12.2lts.html, extracted as runtime.txt.
I'm not providing the full diff, but it's visible in the screenshot that some package versions do not match, and the container image contains fewer packages than the official runtime documentation lists.
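For anyone who wants to reproduce the comparison without a screenshot, here is a rough sketch that could be run inside the container. It assumes runtime.txt holds one `name==version` pin per line; the actual format of the extracted file is not shown in this issue, so that is an assumption, and the helper names are hypothetical.

```python
import re
from importlib.metadata import distributions
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Normalize a project name (PEP 503 style) so 'Foo_Bar' equals 'foo-bar'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines; blank lines and '#' comments are skipped."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def installed_versions() -> Dict[str, str]:
    """Collect the versions of all distributions visible to this interpreter."""
    found: Dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[normalize(name)] = dist.version
    return found


def compare(
    documented: Dict[str, str], installed: Dict[str, str]
) -> Tuple[List[str], Dict[str, Tuple[str, str]]]:
    """Return (missing names, {name: (documented version, installed version)})."""
    missing = sorted(name for name in documented if name not in installed)
    mismatched = {
        name: (documented[name], installed[name])
        for name in sorted(documented)
        if name in installed and installed[name] != documented[name]
    }
    return missing, mismatched
```

Calling `compare(parse_pins(open("runtime.txt").read()), installed_versions())` with the image's Python interpreter would then give the packages that are documented but absent, plus those present at a different version.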