Generate Python based anndata testfiles #170
I think we should still do round-trip tests, but the difference is that our current round-trip tests:
(The reverse round-trip is also tested.) This PR, in contrast, will enable tests:
What I'm currently wondering about is how to compare the original data in Python to the resulting data in R. Do we introduce a simple JSON format to store which assertions to make (i.e. let the Python function write out which assertions R should make to check whether the anndata object was created correctly)? Or do we let the Python functions generate data in a predictable manner so that we can reproduce the same results in R?
I agree.
After discussion, we settled on generating the same dataset in both Python and R, checking whether there are differences when reading in both datasets, and reporting any differences using h5diff.
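A minimal sketch of what "generating the same dataset in both Python and R" could look like, assuming every value is derived purely from its position so that no random number generator has to match across languages. `make_dense_matrix` and `make_obs_names` are hypothetical names, not functions from this PR or from dummy-anndata.

```python
def make_dense_matrix(n_obs: int, n_vars: int) -> list[list[float]]:
    """Hypothetical helper: fill an n_obs x n_vars matrix from a closed-form formula.

    Each value depends only on its 0-based (row, column) position, so an R port
    using the same formula (with R's 1-based indices shifted by one) yields
    identical numbers without having to match random number generators.
    """
    return [[(i * n_vars + j) / 2 for j in range(n_vars)] for i in range(n_obs)]


def make_obs_names(n_obs: int) -> list[str]:
    """Hypothetical helper: zero-padded names, reproducible with sprintf("Cell%03d", ...) in R."""
    return [f"Cell{i:03d}" for i in range(n_obs)]


if __name__ == "__main__":
    print(make_dense_matrix(2, 3))  # [[0.0, 0.5, 1.0], [1.5, 2.0, 2.5]]
    print(make_obs_names(2))        # ['Cell000', 'Cell001']
```

Once both sides have written their h5ad file, `h5diff` exits with status 0 when it finds no differences and 1 when it does, which makes it straightforward to use as a test assertion.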
For documentation purposes: the Python anndata generator functions have moved to a separate package, dummy-anndata.
* re-enable matrices with NAs tests in X and layers
* one more
* Ensure that matrices are never written as nullables

  Take care when using reticulate for testing: before writing, convert NA to NaN
* Fix Windows writing NA error
* Remove commented code since no longer needed
* remove commented code (no longer needed)

---------

Co-authored-by: Louise Deconinck <[email protected]>
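The NA handling in this commit happens on the R side, before data passes through reticulate. As a rough Python analogue of the same idea, assuming missing values arrive as `None`, replacing them with NaN keeps the matrix a plain float matrix rather than one that would need a nullable encoding. The trade-off is that NA and NaN can no longer be told apart afterwards. `na_to_nan` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence


def na_to_nan(matrix: Sequence[Sequence[Optional[float]]]) -> list[list[float]]:
    """Hypothetical helper: return a float matrix with None replaced by NaN.

    Every entry becomes a float, so the result can be stored as an ordinary
    floating-point array rather than a nullable one.
    """
    return [[math.nan if v is None else float(v) for v in row] for row in matrix]


if __name__ == "__main__":
    print(na_to_nan([[1, None], [None, 2.5]]))  # [[1.0, nan], [nan, 2.5]]
```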
* Update write_h5ad_categorical
* fix styling
* Update write_h5ad_categorical
* Adjust H5AD categorical write test
* Add write_h5ad_attributes function

  Replace repeated code in individual writers
* ignore cyclomatic complexity warning for `write_h5ad_element`
* formatting changes
* in write_h5ad_attributes, allow file to be an open hdf5 file
* Add write_h5ad_boolean_attribute()
* Add write_h5ad_boolean_array()

  Helper function for writing ENUM boolean arrays
* Remove compression argument from write_h5ad_boolean_array

  I don't think it is possible to write compressed data using the workaround, and the ENUM format should be fairly space-efficient anyway
* Correctly read categorical levels from H5AD

  Fixes array when there are more levels than values
* Fix writing scalar H5AD attributes

  Correctly check the is_scalar argument
* add lintr exceptions
* fix nolint

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
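For context on the categorical fix: in the h5ad format a categorical column is stored as integer `codes` plus a separate `categories` array, with a code of -1 marking a missing value. A minimal decoding sketch, assuming both arrays have already been read from the file. The point it illustrates is that the levels come from `categories` as a whole, not from the values that happen to occur, so unused levels survive. `decode_categorical` is a hypothetical name, not the R reader's implementation.

```python
from typing import Optional, Sequence


def decode_categorical(
    codes: Sequence[int], categories: Sequence[str]
) -> tuple[list[Optional[str]], list[str]]:
    """Hypothetical helper: map integer codes to labels and keep every level."""
    values: list[Optional[str]] = []
    for code in codes:
        if code < 0:
            values.append(None)  # -1 encodes a missing value
        elif code >= len(categories):
            raise ValueError(f"code {code} has no matching category")
        else:
            values.append(categories[code])
    # Levels come from `categories`, so levels that never occur are kept too
    return values, list(categories)


if __name__ == "__main__":
    print(decode_categorical([1, -1], ["a", "b", "c"]))
    # (['b', None], ['a', 'b', 'c'])
```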
* Update write_h5ad_categorical
* fix styling
* Update write_h5ad_categorical
* Adjust H5AD categorical write test
* Add write_h5ad_attributes function

  Replace repeated code in individual writers
* ignore cyclomatic complexity warning for `write_h5ad_element`
* formatting changes
* in write_h5ad_attributes, allow file to be an open hdf5 file
* wip
* wip
* substitute mentions of rhdf5 with hdf5r
* strip obs_names and var_names from framework
* update
* fix tests and finalize
* remove mentions of obs_names and var_names in the constructor
* make sure filenames are always unique
* add mode to various functions
* manually close anndatas in tests (where needed)
* only close when pointer is valid
* move match
* use $close() instead of $close_all()
* switch to different branch
* simplify test
* gc after closing the adata in write_h5ad
* guess the dtype and the space
* update docs
* use hhoeflin's remote
* bugfix in hdf5r has been released
* update: never mind, the fix wasn't included in the release yet
* minor fixes
* bump version number
* remove remotes
* remove references to rhdf5
* fix attributes
* style
* fix write h5ad helpers
* fix unit tests
* fix linting issues
* move hdf5 helpers
* reuse existing functionality
* add test (this seems to have been fixed at some point)
* improve guessing of dtype when storing a logical vector
* fix styling
* reenable more tests

---------

Co-authored-by: Luke Zappia <[email protected]>
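On "improve guessing of dtype when storing a logical vector": the same pitfall exists on the Python side, where `bool` is a subclass of `int`, so a logical vector is easily misclassified as integer unless booleans are checked first. A minimal sketch, assuming missing values are `None` and that missing floats are written as NaN rather than as a nullable type. `guess_dtype` and its return labels are hypothetical and are not anndata encoding names.

```python
from typing import Any, Sequence


def guess_dtype(values: Sequence[Any]) -> str:
    """Hypothetical helper: pick a coarse storage type for a 1-D vector."""
    present = [v for v in values if v is not None]
    has_missing = len(present) < len(values)
    if not present:
        return "unknown"  # empty or all-missing: nothing to infer from
    # Check bool first: isinstance(True, int) is True in Python
    if all(isinstance(v, bool) for v in present):
        return "nullable-bool" if has_missing else "bool"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "nullable-int" if has_missing else "int"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "float"  # missing floats are assumed to be written as NaN
    if all(isinstance(v, str) for v in present):
        return "nullable-str" if has_missing else "str"
    raise TypeError("cannot store a mixed-type vector as a single dtype")


if __name__ == "__main__":
    print(guess_dtype([True, None, False]))  # nullable-bool
```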
* Tidy user interface

  Co-authored-by: Luke Zappia <[email protected]>
* Update docs

  Co-authored-by: Luke Zappia <[email protected]>
* update docs
* fix linting issues

---------

Co-authored-by: Luke Zappia <[email protected]>
Superseded by #207 and dummy-anndata.
In addition to relying on round-trip tests, we want to be able to read in all possible Python arrays, matrices, and dataframes.
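Reading "all possible Python matrices" includes scipy sparse matrices, which anndata writes as a group of three arrays, `data`, `indices` and `indptr` (in CSR or CSC layout), with the matrix shape recorded on the group. A pure-Python sketch of the CSR layout, intended only as a reference for what a reader has to reconstruct; `dense_to_csr` and `csr_to_dense` are hypothetical helpers, not part of anndata or scipy.

```python
from typing import Sequence


def dense_to_csr(
    matrix: Sequence[Sequence[float]],
) -> tuple[list[float], list[int], list[int]]:
    """Hypothetical helper: split a dense row-major matrix into CSR arrays."""
    data: list[float] = []
    indices: list[int] = []  # column index of each stored value
    indptr: list[int] = [0]  # row i spans data[indptr[i]:indptr[i + 1]]
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


def csr_to_dense(
    data: Sequence[float], indices: Sequence[int], indptr: Sequence[int], n_cols: int
) -> list[list[float]]:
    """Hypothetical helper: rebuild the dense matrix from CSR arrays."""
    dense: list[list[float]] = []
    for i in range(len(indptr) - 1):
        row = [0.0] * n_cols
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense


if __name__ == "__main__":
    m = [[0, 1, 0], [2, 0, 3]]
    print(dense_to_csr(m))  # ([1, 2, 3], [1, 0, 2], [0, 1, 3])
```

CSC uses the same three arrays with the roles of rows and columns swapped, so a reader that handles one layout can usually handle the other by transposing.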