Improve the tests, like, a lot #109

Datseris · 2022-08-19T14:31:45Z

One of the things the Good Scientific Code Workshop teaches is writing good unit tests. I have to admit, this repository suffers tremendously from really bad tests, especially when it comes to delay embedding. Practically all tests check whether the output of a function matches the output of the same function stored in some pre-existing data. I am pasting here the slide with "good advice on writing tests":

Actually unit: test atomic, self-contained functions. Each test must test only one thing, the unit. When a test fails, it should pinpoint the location of the problem. Testing entire processing pipelines (a.k.a. integration tests) should be done only after units are covered, and only if resources/time allow for it!
Known output / Deterministic: tests defined through minimal examples whose result is known analytically are the best tests you can have! If random number generation is necessary, either test for a valid output range or seed the RNG.
Robust: Test that the expected outcome is met, not the implementation details. Test that the target functionality is met without utilizing knowledge about the internals. Also, never use internal functions in the test suite.
High coverage: the more functionality of the code is tested, the better
Clean slate: each test file should be runnable by itself, and not rely on previous test files
Fast: use the minimal amount of computations to test what is necessary
Regression: Whenever a bug is fixed, a test is added for this case
Input variety: attempt to cover a wide gamut of input types
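To make the "Known output / Deterministic" advice concrete, here is a minimal sketch in Python, used purely for illustration. The function `delay_embed` is a hypothetical stand-in written for this example and is not this repository's API. The first test uses an input small enough that the result can be written down by hand; the second seeds the RNG and checks only properties that must hold for any input.

```python
import random


def delay_embed(x, dim, tau):
    """Hypothetical helper (an assumption for this sketch, not the package API):
    return the delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + k * tau] for k in range(dim)) for i in range(n)]


def test_known_output():
    # Minimal examples whose result can be verified by hand.
    assert delay_embed([1, 2, 3, 4, 5], dim=2, tau=1) == [(1, 2), (2, 3), (3, 4), (4, 5)]
    assert delay_embed([1, 2, 3, 4, 5], dim=3, tau=2) == [(1, 3, 5)]


def test_random_input_properties():
    # Seed the RNG so the test is reproducible, then check properties
    # that hold for any input instead of comparing against stored values.
    rng = random.Random(1234)
    x = [rng.random() for _ in range(50)]
    dim, tau = 3, 4
    embedded = delay_embed(x, dim, tau)
    assert len(embedded) == len(x) - (dim - 1) * tau
    assert all(len(v) == dim for v in embedded)
    assert all(0.0 <= c < 1.0 for v in embedded for c in v)


test_known_output()
test_random_input_properties()
```

Neither test depends on stored reference data, so both keep passing through any refactoring that preserves the behavior.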

One doesn't have to worry about rewriting all tests. In fact, a PR "correcting" a single test file is already very welcome!

The best place to start is rewriting the delay embedding tests so that they are more flexible (e.g., they should not test whether the found delay time is exactly 42) and analytically resolvable, i.e., they test things whose outcome we know for sure, like a dataset with cosine and sine as the timeseries (the embedding dimension here is clearly 2). Also separate the files so that they are individually runnable and do not rely on global state.

Another analytic test: take the Lorenz96 system and generate timeseries with 4 and with 6 oscillators. We do not know the fractal dimension of the 4 and 6 cases analytically, but we do know analytically that the 6 case has the larger fractal dimension. Hence, our embedding must be higher dimensional in the 6 case than in the 4 case.
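As an illustration of a flexible, analytically grounded delay-time test, here is a minimal Python sketch. It assumes a simple estimator, `first_zero_crossing_delay`, that returns the first lag at which the autocorrelation becomes non-positive. Both helper functions are hypothetical and written for this example; they are not the estimators in this repository. For a pure sine, the autocorrelation is (up to finite-length effects) a cosine, so the first zero crossing sits near a quarter period, and the test asserts closeness instead of exact equality.

```python
import math


def autocorrelation(x, lag):
    """Mean-subtracted autocovariance of x at the given lag (not variance-normalised)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n - lag)


def first_zero_crossing_delay(x, max_lag):
    """Hypothetical estimator: smallest lag at which the autocorrelation is <= 0."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= 0:
            return lag
    return None


def test_sine_delay_is_quarter_period():
    period = 100  # samples per oscillation
    x = [math.sin(2 * math.pi * i / period) for i in range(20 * period)]
    tau = first_zero_crossing_delay(x, max_lag=period)
    assert tau is not None
    # Expected value is known analytically (a quarter period); allow a
    # small tolerance for discretisation and finite-length effects.
    assert abs(tau - period / 4) <= 2


test_sine_delay_is_quarter_period()
```

The tolerance encodes what is actually known: the delay should be close to a quarter period, not equal to one specific integer that a particular implementation happened to return.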

Datseris added the help wanted, good first issue, and tests labels Aug 19, 2022

Datseris pinned this issue Aug 19, 2022

kahaaga mentioned this issue Oct 17, 2022

Hcat multiple datasets #113

Merged

Datseris mentioned this issue Feb 20, 2023

Next major release: Rework specification of recurrences + update to DynamicalSystems.jl v3 JuliaDynamics/RecurrenceAnalysis.jl#135

Merged

5 tasks
