Add pandas and numpy >= 2 compatibility #287

Draft: wants to merge 1 commit into master


@mkopec87 commented Dec 6, 2024

  • align requirements.txt with pyproject.toml
  • remove calls to np.string_, which no longer exists in numpy >= 2.0.0
  • remove calls to pd._testing.makeMixedDataFrame, which was removed in newer pandas versions
  • fix install and test commands in the developer documentation
  • replace np.mean with a column-wise version
  • drop the pandas <2 dependency constraint
  • require Python >= 3.9 in pyproject.toml
  • add PySpark 3.5.3 to test pipeline matrix
  • update test pipeline matrix: exclude Python 3.8, include Python 3.12
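For the np.string_ removal above, a minimal sketch of the usual replacement, assuming the affected call sites only needed a fixed-width byte-string dtype: np.bytes_ names the same scalar type in both NumPy 1.x and 2.x (np.string_ was just an alias for it). The helper to_byte_strings is hypothetical; the thread does not show which call sites were changed or what they were changed to.

```python
import numpy as np


def to_byte_strings(values):
    """Hypothetical helper: convert ASCII text values to a fixed-width bytes array.

    np.bytes_ exists on NumPy 1.x and 2.x; the old alias np.string_ is gone in 2.0.
    """
    return np.asarray(values, dtype=np.bytes_)


arr = to_byte_strings(["a", "bc"])
print(arr.dtype, arr.tolist())
```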

@mkopec87 force-pushed the feature/update-pandas branch 2 times, most recently from c32e4ad to b5f7f11 on December 16, 2024 10:02
@mkopec87 (Author) commented Dec 16, 2024

Still 5 failing tests :(

FAILED tests/popmon/notebooks/test_notebooks.py::test_notebook_advanced - nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
FAILED tests/popmon/pipeline/test_metrics.py::test_hists_stability_metrics - pandas.errors.DataError: Cannot aggregate non-numeric type: object
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_external - TypeError: float() argument must be a string or a real number, not 'CategorizeHistogramMethods'
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_rolling - pandas.errors.DataError: Cannot aggregate non-numeric type: object
FAILED tests/popmon/pipeline/test_report.py::test_df_stability_report_expanding - pandas.errors.DataError: Cannot aggregate non-numeric type: object

BTW, do we need 'requirements.txt' for anything?

@mkopec87 force-pushed the feature/update-pandas branch 3 times, most recently from ecd46d1 to 6e0e1e6 on December 16, 2024 11:25
- align requirements.txt with pyproject.toml
- remove calls to np.string_, which no longer exists in numpy >= 2.0.0
- remove calls to pd._testing.makeMixedDataFrame, which was removed in newer pandas versions
- fix install and test commands in the developer documentation
- replace np.mean with a column-wise version
- drop the pandas <2 dependency constraint
- require Python >= 3.9 in pyproject.toml
- add PySpark 3.5.3 to test pipeline matrix
- update test pipeline matrix: exclude Python 3.8, include Python 3.12
- add test notebook output to .gitignore
- switch from pkg_resources to importlib
- install project dependencies after pyspark in spark build tests
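The pkg_resources to importlib switch typically looks like the sketch below, assuming the code used pkg_resources to locate data files shipped inside the package (the thread does not say which pkg_resources calls were replaced). resource_path is a hypothetical helper, and it is demonstrated on the stdlib json package because popmon's resource names are not shown here. importlib.resources.files is available from Python 3.9, which matches the new minimum version.

```python
from importlib.resources import files


def resource_path(package, name):
    """Hypothetical helper: locate a data file shipped inside an installed package.

    Plays the role of pkg_resources.resource_filename(package, name), but returns
    an importlib Traversable instead of a plain string path.
    """
    return files(package) / name


# Stdlib package used only for illustration; it ships json/decoder.py.
decoder = resource_path("json", "decoder.py")
print(decoder.is_file())
```

If a package may be imported from a zip archive, wrapping the result in importlib.resources.as_file(...) yields a real filesystem path for the duration of a with block.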
@mkopec87 force-pushed the feature/update-pandas branch from 6e0e1e6 to cfc85d1 on December 16, 2024 17:59
@@ -233,7 +238,7 @@ def __init__(
:param kwargs: (dict, optional): residual kwargs passed on to mean and std functions
"""
super().__init__(
- np.mean,
+ ReferencePullCalculator.mean,
Collaborator commented:

functools.partial could be used here instead of a staticmethod.
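A sketch of the two options under discussion, using a hypothetical BaseCalculator parent that just stores the reducer it receives (the real parent of ReferencePullCalculator and its constructor signature are not shown in this thread). Both subclasses end up with an equivalent column-wise mean; the partial version avoids defining an extra method on the class.

```python
from functools import partial

import numpy as np


class BaseCalculator:
    """Hypothetical parent class: it only stores the aggregation function."""

    def __init__(self, aggregate):
        self.aggregate = aggregate


class StaticMethodCalculator(BaseCalculator):
    """Approach in the PR: a staticmethod that wraps np.mean with axis=0."""

    @staticmethod
    def mean(x):
        return np.mean(x, axis=0)

    def __init__(self):
        super().__init__(StaticMethodCalculator.mean)


class PartialCalculator(BaseCalculator):
    """Reviewer's suggestion: bind axis=0 with functools.partial, no extra method."""

    def __init__(self):
        super().__init__(partial(np.mean, axis=0))


x = np.array([[1.0, 2.0], [3.0, 4.0]])
print(StaticMethodCalculator().aggregate(x), PartialCalculator().aggregate(x))
```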

@@ -13,6 +13,11 @@
from popmon.base import Pipeline


def mean(x):
    """Column-wise mean version."""
    return np.mean(x, axis=0)
Collaborator commented:

from functools import partial

partial(np.mean, axis=0)
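Why the explicit axis=0 matters, shown for a plain 2-D NumPy array (how data reaches this function through popmon's pipeline is not shown here): without an axis, np.mean averages every element into a single scalar, while the partial returns one mean per column. The name column_mean is illustrative.

```python
from functools import partial

import numpy as np

# Reviewer's suggestion: pre-bind axis=0 instead of writing a wrapper function.
column_mean = partial(np.mean, axis=0)

x = np.array([[1.0, 2.0], [3.0, 4.0]])

overall = np.mean(x)          # no axis: averages all four elements into one scalar
per_column = column_mean(x)   # axis=0: one mean per column

print(overall, per_column)
```

Extra keyword arguments passed at call time are merged with the bound axis=0, so the residual kwargs mentioned in the docstring above can still be forwarded.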

Labels: None yet
Projects: None yet
2 participants