CI Fixups #997

Open: 1 commit to be merged into base: main

TomAugspurger
Member

  • NumPy 2 compat
  • changed error message

@TomAugspurger
Member Author

TomAugspurger commented Jul 20, 2024

There's still one error I haven't been able to fix:

================================================================================ test session starts ================================================================================
platform darwin -- Python 3.11.0, pytest-8.3.1, pluggy-1.5.0
rootdir: /Users/tom/gh/dask/dask-ml
configfile: pyproject.toml
plugins: cov-5.0.0, mock-3.14.0
collected 1 item

tests/test_incremental_pca.py F                                                                                                                                               [100%]

===================================================================================== FAILURES ======================================================================================
_______________________________________________________________________________ test_whitening[auto] ________________________________________________________________________________

svd_solver = 'auto'

    @pytest.mark.parametrize("svd_solver", ["full", "auto", "randomized"])
    @pytest.mark.filterwarnings("ignore:invalid value:RuntimeWarning")
    def test_whitening(svd_solver):
        # Test that PCA and IncrementalPCA transforms match to sign flip.
        X = datasets.make_low_rank_matrix(
            1000, 10, tail_strength=0.0, effective_rank=2, random_state=1999
        )
        X = da.from_array(X, chunks=[200, -1])
        prec = 3
        n_samples, n_features = X.shape
        for nc in [None, 9]:
            pca = PCA(whiten=True, n_components=nc, svd_solver=svd_solver).fit(X.compute())
            ipca = IncrementalPCA(
                whiten=True, n_components=nc, batch_size=250, svd_solver=svd_solver
            ).fit(X)

            Xt_pca = pca.transform(X)
            Xt_ipca = ipca.transform(X)
>           assert_almost_equal(np.abs(Xt_pca), np.abs(Xt_ipca), decimal=prec)

tests/test_incremental_pca.py:454:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
    return func(*args, **kwds)
../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: in inner
    return func(*args, **kwds)
.direnv/python-3.11/lib/python3.11/site-packages/numpy/_utils/__init__.py:85: in wrapper
    return fun(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<function assert_array_almost_equal.<locals>.compare at 0x123d02160>, array([[3.46374689e-01, 6.42854227e-01, 1.28803...2.04242514e+05]]), dask.array<absolute, shape=(1000, 10), dtype=float64, chunksize=(200, 10), chunktype=numpy.ndarray>)
kwds = {'err_msg': '', 'header': 'Arrays are not almost equal to 3 decimals', 'precision': 3, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError:
E           Arrays are not almost equal to 3 decimals
E
E           Mismatched elements: 1430 / 10000 (14.3%)
E           Max absolute difference among violations: 874440.31622524
E           Max relative difference among violations: 14845029.47333545
E            ACTUAL: array([[3.464e-01, 6.429e-01, 1.288e+00, ..., 8.527e-01, 4.654e-01,
E                   2.602e+05],
E                  [9.195e-02, 6.557e-01, 1.029e+00, ..., 8.861e-01, 3.697e-01,...
E            DESIRED: array([[0.346, 0.643, 1.288, ..., 0.853, 0.464, 1.238],
E                  [0.092, 0.656, 1.029, ..., 0.886, 0.369, 0.19 ],
E                  [0.092, 1.329, 1.784, ..., 0.104, 0.395, 0.606],...

../../../mambaforge/envs/python=3.11/lib/python3.11/contextlib.py:81: AssertionError
============================================================================== short test summary info ==============================================================================
FAILED tests/test_incremental_pca.py::test_whitening[auto] - AssertionError:
================================================================================= 1 failed in 0.42s =================================================================================


The only thing I've found so far is that the components_ are different when whiten=True.
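
A small diagnostic can help narrow down where the two transforms disagree. The sketch below counts, per column, the entries that would fail the same tolerance check used by `assert_array_almost_equal` (`abs(desired - actual) < 1.5 * 10**(-decimal)`). The helper name `mismatches_per_column` and the toy arrays are hypothetical and not part of dask-ml; in the failing test one would pass `np.abs(Xt_pca)` and `np.abs(Xt_ipca.compute())` instead.

```python
import numpy as np


def mismatches_per_column(actual, desired, decimal=3):
    """Count, per column, entries outside the assert_array_almost_equal tolerance.

    Hypothetical helper for debugging; not part of dask-ml or NumPy.
    """
    actual = np.asarray(actual)
    desired = np.asarray(desired)
    tol = 1.5 * 10.0 ** (-decimal)
    # Negating the "close enough" mask means NaN entries count as mismatches.
    bad = ~(np.abs(desired - actual) < tol)
    return bad.sum(axis=0)


# Toy data (an assumption for illustration): only the last column disagrees.
desired = np.ones((4, 3))
actual = desired.copy()
actual[:, 2] = 2.6e5

print(mismatches_per_column(actual, desired))  # [0 0 4]
```

If the mismatches turn out to be concentrated in a few trailing columns, that would point at specific components rather than at the transform as a whole.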

@TomAugspurger
Member Author

cc @fujiisoup in case you have a chance to look (no worries if not)

@fujiisoup
Contributor

Hi @TomAugspurger

Do you know when the test started failing?
The changes in this PR do not seem related to it.

@fujiisoup
Contributor

I investigated, and it seems to be an upstream issue. I opened an issue [there](https://github.com/scikit-learn/scikit-learn/issues/29534).

With numpy==2.0, sklearn.decomposition.PCA seems to be unstable, sometimes giving strange values.

@TomAugspurger
Member Author

Thanks for looking into it. I've subscribed to the upstream issue in scikit-learn and will skip or adjust this test as needed with NumPy 2.0.
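
One way the "skip or adjust" step could look, as a sketch only: a conditional `xfail` marker that applies when NumPy 2 or later is installed. The names `numpy_major`, `NUMPY_GE_2`, and `xfail_numpy2_whitening` are hypothetical and do not come from this PR. `strict=False` is used because the failure is described above as occurring only sometimes, so an unexpected pass should not fail the run.

```python
import numpy as np
import pytest


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '2.0.1'."""
    return int(version.split(".")[0])


# True when the installed NumPy is 2.x or later.
NUMPY_GE_2 = numpy_major(np.__version__) >= 2

# Hypothetical marker; the reason points at the upstream issue linked in this thread.
xfail_numpy2_whitening = pytest.mark.xfail(
    NUMPY_GE_2,
    reason=(
        "PCA with whiten=True is unstable with NumPy 2; see "
        "https://github.com/scikit-learn/scikit-learn/issues/29534"
    ),
    strict=False,
)
```

The marker would be applied as a decorator (`@xfail_numpy2_whitening`) above `test_whitening`, and removed once the upstream issue is resolved.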
