Create a more robust test framework #20

Open
sd2k opened this issue Jun 5, 2023 · 2 comments

Comments

@sd2k
Collaborator

sd2k commented Jun 5, 2023

The current tests of algorithm implementations are pretty ad-hoc; I basically took some datasets from the original papers/notebooks, ran them in R/Python, and copied the expected values into our tests. It'd be much better if we had a way to automatically generate the test cases and results somehow.

We probably don't need to go as far as running the R/Python algorithms every time, but we should at least have a script or notebook to generate the expected test results so we can regenerate them as required.
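
For reference, a minimal sketch of what such a generation script could look like, assuming statsmodels' MSTL is the reference implementation (the toy dataset, period, and output file name are placeholders, not anything we actually use today):

```python
# generate_expected.py - regenerate expected values for the Rust tests.
# Assumes statsmodels >= 0.14 for MSTL; dataset and output path are placeholders.
import json

import numpy as np
from statsmodels.tsa.seasonal import MSTL

# Toy weekly-seasonal series standing in for whatever dataset the tests use.
rng = np.random.default_rng(42)
t = np.arange(366)
y = 10 * np.sin(2 * np.pi * t / 7) + 0.1 * t + rng.normal(scale=0.5, size=t.size)

result = MSTL(y, periods=(7,)).fit()

expected = {
    "input": y.tolist(),
    "trend": np.asarray(result.trend).tolist(),
    "seasonal": np.asarray(result.seasonal).tolist(),
    "residual": np.asarray(result.resid).tolist(),
}

# The Rust tests could then load this JSON instead of hard-coding values.
with open("mstl_expected.json", "w") as f:
    json.dump(expected, f, indent=2)
```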

@shenxiangzhuang

Hi @sd2k, thank you for sharing the awesome augurs library! I have a suggestion for this issue: we could test the Rust implementation by comparing the Python bindings' output with the original Python implementation. I use this method in my little project bleuscore and it works fine.

In short, it uses the hypothesis library to do property-based testing, which automatically generates many test cases to check that the two implementations produce equal results.
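
For augurs, a test along these lines might look roughly like the sketch below, assuming statsmodels as the reference; `augurs_mstl_decompose` is a placeholder for the real augurs Python binding (here it just calls statsmodels so the sketch runs end-to-end):

```python
# Sketch of a hypothesis-based comparison test; names are illustrative only.
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays
from statsmodels.tsa.seasonal import MSTL


def augurs_mstl_decompose(y, periods):
    # Placeholder: swap in a call to the augurs Python bindings once available.
    return MSTL(y, periods=periods).fit()


@settings(max_examples=50, deadline=None)
@given(
    arrays(
        dtype=np.float64,
        shape=st.integers(min_value=28, max_value=200),
        elements=st.floats(min_value=-1e3, max_value=1e3, allow_nan=False),
    )
)
def test_mstl_matches_reference(y):
    ours = augurs_mstl_decompose(y, periods=(7,))
    reference = MSTL(y, periods=(7,)).fit()
    # Compare with a tolerance rather than exact equality.
    np.testing.assert_allclose(ours.trend, reference.trend, rtol=1e-6, atol=1e-8)
    np.testing.assert_allclose(ours.resid, reference.resid, rtol=1e-6, atol=1e-8)
```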

@sd2k
Collaborator Author

sd2k commented Sep 25, 2024

@shenxiangzhuang Thanks for the links, and sorry for the delay in getting back to you, I was away for quite a lot of the summer.

That is a nice idea, yeah - we could do something similar and run those tests in CI to make sure we get matching results, at least for algorithms with a corresponding Python implementation. Part of the problem is that some of the algorithms don't perfectly match the Python implementations, so we'd need some kind of acceptable tolerance in each case. We could also provide a way to benchmark the augurs implementations against the Python ones, which might be a good way to convince people to actually use this library 😅
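
To make the tolerance idea concrete, one option could be a small per-algorithm lookup shared by all the comparison tests; the algorithm names and values below are purely illustrative:

```python
import numpy as np

# Hypothetical per-algorithm tolerances; names and numbers are illustrative only.
TOLERANCES = {
    "mstl": {"rtol": 1e-6, "atol": 1e-8},
    "ets": {"rtol": 1e-4, "atol": 1e-6},  # looser: optimiser differences
    "dtw": {"rtol": 1e-8, "atol": 1e-10},
}


def assert_matches_reference(name, ours, reference):
    """Compare augurs output against the reference implementation's output."""
    tol = TOLERANCES[name]
    np.testing.assert_allclose(ours, reference, rtol=tol["rtol"], atol=tol["atol"])
```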

I'm not really sure what to do about R, though. It's a much bigger effort, so maybe we should just stick with comparisons against the Python libraries for now.
