Add pandas and numpy >= 2 compatibility #287

mkopec87 · 2024-12-06T16:14:19Z

- align requirements.txt with pyproject.toml
- remove calls to np.string_ not existing in numpy >= 2.0.0
- remove calls to pd._testing.makeMixedDataFrame not existing in new pandas versions
- fix install and test commands in documentation for developers
- replace np.mean with column-wise version
- drop pandas dependency constraint <2
- require Python 3.9 in pyproject.toml
- add PySpark 3.5.3 to test pipeline matrix
- update test pipeline matrix: exclude Python 3.8, include Python 3.12
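Two of the removals above have direct replacements. As a minimal sketch, assuming the code only needed the bytes scalar type and a small mixed-dtype frame for tests: `np.string_` was an alias of `np.bytes_`, which exists in both NumPy 1.x and 2.x, and `make_mixed_dataframe` is a hypothetical local helper that builds a frame with float, string and business-day datetime columns in the spirit of the removed `pd._testing.makeMixedDataFrame`.

```python
import numpy as np
import pandas as pd

# np.string_ was removed in NumPy 2.0; np.bytes_ is the same type and is
# available in both NumPy 1.x and 2.x.
BYTES_TYPE = np.bytes_


def make_mixed_dataframe() -> pd.DataFrame:
    """Hypothetical stand-in for pd._testing.makeMixedDataFrame.

    Builds a small frame mixing float, string and datetime columns so tests
    no longer depend on pandas' private testing helpers.
    """
    return pd.DataFrame(
        {
            "A": [0.0, 1.0, 2.0, 3.0, 4.0],
            "B": [0.0, 1.0, 0.0, 1.0, 0.0],
            "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
            "D": pd.bdate_range("2009-01-01", periods=5),
        }
    )


df = make_mixed_dataframe()
# Fixed-width bytes array built with the NumPy 2 compatible type.
labels = np.array(df["C"].tolist(), dtype=BYTES_TYPE)
print(df.dtypes)
print(labels.dtype)
```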

mkopec87 · 2024-12-16T10:17:57Z

Still 5 failing tests :(

FAILED tests/popmon/notebooks/test_notebooks.py::test_notebook_advanced - nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
FAILED tests/popmon/pipeline/test_metrics.py::test_hists_stability_metrics - pandas.errors.DataError: Cannot aggregate non-numeric type: object
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_external - TypeError: float() argument must be a string or a real number, not 'CategorizeHistogramMethods'
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_rolling - pandas.errors.DataError: Cannot aggregate non-numeric type: object
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_expanding - pandas.errors.DataError: Cannot aggregate non-numeric type: object

BTW, do we need requirements.txt for anything?

- align requirements.txt with pyproject.toml
- remove calls to np.string_ not existing in numpy >= 2.0.0
- remove calls to pd._testing.makeMixedDataFrame not existing in new pandas versions
- fix install and test commands in documentation for developers
- replace np.mean with column-wise version
- drop pandas dependency constraint <2
- require Python 3.9 in pyproject.toml
- add PySpark 3.5.3 to test pipeline matrix
- update test pipeline matrix: exclude Python 3.8, include Python 3.12
- add test notebook output to .gitignore
- switch to importlib from pkg_resources
- install project dependencies after pyspark in spark build tests
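For the switch from pkg_resources to importlib, a minimal sketch, assuming pkg_resources was used to locate files shipped inside a package: `importlib.resources.files` (standard library since Python 3.9, which matches the new minimum version) returns a traversable handle to the package contents. The helper name `resource_path` is hypothetical, and the stdlib `json` package stands in for popmon so the sketch runs anywhere.

```python
from importlib.resources import files


def resource_path(package: str, name: str):
    """Hypothetical replacement for pkg_resources.resource_filename(package, name).

    Returns a Traversable; for a regular on-disk install this is a
    pathlib.Path. Packages imported from a zip would need
    importlib.resources.as_file() to obtain a real filesystem path.
    """
    return files(package).joinpath(name)


# The stdlib "json" package stands in for popmon here.
init_file = resource_path("json", "__init__.py")
print(init_file.is_file())
```

For installed-distribution versions, `importlib.metadata.version` plays the role of `pkg_resources.get_distribution(...).version`.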

sbrugman · 2024-12-20T10:13:45Z

popmon/analysis/profiling/pull_calculator.py

@@ -233,7 +238,7 @@ def __init__(
        :param kwargs: (dict, optional): residual kwargs passed on to mean and std functions
        """
        super().__init__(
-            np.mean,
+            ReferencePullCalculator.mean,


functools.partial could be used here instead of a staticmethod.
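A minimal sketch of this suggestion, assuming the base class simply stores the callables it receives: pass `functools.partial(np.mean, axis=0)` directly to `super().__init__` instead of defining a `mean` staticmethod. `BasePullCalculator` and `ReferencePullCalculatorSketch` are hypothetical stand-ins, not popmon's actual classes.

```python
from functools import partial

import numpy as np


class BasePullCalculator:
    """Hypothetical stand-in: stores the mean and std functions it is given."""

    def __init__(self, mean_func, std_func):
        self.mean_func = mean_func
        self.std_func = std_func


class ReferencePullCalculatorSketch(BasePullCalculator):
    def __init__(self):
        # partial fixes axis=0, so statistics are computed per column
        # rather than over the whole array; no staticmethod is needed.
        super().__init__(partial(np.mean, axis=0), partial(np.std, axis=0))


calc = ReferencePullCalculatorSketch()
values = np.array([[1.0, 10.0], [3.0, 20.0]])
print(calc.mean_func(values))  # [ 2. 15.]
print(np.mean(values))  # 8.5, one scalar over all elements
```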

sbrugman · 2024-12-20T10:15:07Z

tests/popmon/analysis/profiling/test_apply_func.py

@@ -13,6 +13,11 @@
 from popmon.base import Pipeline


+def mean(x):
+    """Column-wise mean version."""
+    return np.mean(x, axis=0)


from functools import partial
partial(np.mean, axis=0)
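A quick check that the suggested partial behaves like the `mean` helper in the diff: both reduce over axis 0 and return one mean per column. The array values are arbitrary.

```python
from functools import partial

import numpy as np


def mean(x):
    """Column-wise mean, as in the helper added to test_apply_func.py."""
    return np.mean(x, axis=0)


# The reviewer's suggestion: the same reduction built with functools.partial.
mean_partial = partial(np.mean, axis=0)

x = np.arange(12, dtype=float).reshape(4, 3)
assert np.array_equal(mean(x), mean_partial(x))
print(mean_partial(x))  # [4.5 5.5 6.5]
```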

mkopec87 mentioned this pull request Dec 6, 2024

Support Numpy and Pandas >= 2.0.0 #286 (open)

mkopec87 force-pushed the feature/update-pandas branch 2 times, most recently from c32e4ad to b5f7f11, on December 16, 2024 at 10:02

mkopec87 force-pushed the feature/update-pandas branch 3 times, most recently from ecd46d1 to 6e0e1e6, on December 16, 2024 at 11:25

mkopec87 force-pushed the feature/update-pandas branch from 6e0e1e6 to cfc85d1 on December 16, 2024 at 17:59

sbrugman reviewed Dec 20, 2024
