Generate Python based anndata testfiles #170

LouiseDck · 2024-07-03T18:56:06Z

Instead of relying on round-trip tests, we also want to be able to read in all possible Python arrays, matrices and dataframes.


rcannood · 2024-07-08T06:40:23Z

> Instead of relying on round-trip tests, we also want to be able to read in all possible Python arrays, matrices and dataframes.

I think we should still do round-trip tests. The difference is that our current round-trip tests:

use R functions to generate data
use CRAN anndata to move R data into Python with reticulate
use Python anndata to write data to disk
use anndataR to read data from disk
compare original data to resulting data

(the reverse round-trip is also tested)

This PR, in contrast, will enable tests that:

use reticulate to call Python functions which generate data
use Python anndata to write data to disk
use anndataR to read data from disk
compare original data to resulting data
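As a minimal sketch of the first step, assuming generator functions that use no randomness at all (the function names and value formulas below are hypothetical, not the dummy-anndata API), a Python function could build observation names, variable names and a dense X purely from positions, so the same values can be rebuilt in R:

```python
# Hypothetical deterministic generators: names and formulas are illustrative
# assumptions, not the API of dummy-anndata or anndataR.


def generate_obs_names(n_obs: int) -> list[str]:
    """Cell names; R equivalent: paste0("cell", seq_len(n_obs))."""
    return [f"cell{i + 1}" for i in range(n_obs)]


def generate_var_names(n_vars: int) -> list[str]:
    """Gene names; R equivalent: paste0("gene", seq_len(n_vars))."""
    return [f"gene{j + 1}" for j in range(n_vars)]


def generate_dense_x(n_obs: int, n_vars: int) -> list[list[float]]:
    """Dense X whose entries depend only on their position (no RNG).

    Entry (i, j) is i * n_vars + j (0-based), i.e. 0 .. n_obs * n_vars - 1
    filled row by row. R can produce the same values with
    matrix(seq(0, n_obs * n_vars - 1), nrow = n_obs, byrow = TRUE).
    """
    return [[float(i * n_vars + j) for j in range(n_vars)] for i in range(n_obs)]
```

Because nothing depends on a random seed, calling the generator twice (or from R via reticulate) always yields identical data.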

What I'm currently wondering about is how to compare the original data in Python to the resulting data in R. Do we introduce a simple JSON format to store which assertions to make (i.e. let the Python function write out which assertions R should make to check whether the AnnData object was created correctly)? Or do we let the Python functions generate data in a predictable manner so that we can reproduce the same results in R?
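The first option could look roughly like this. This is a hypothetical sketch: the manifest keys are assumptions, not a format defined by anndataR or dummy-anndata. The Python side records what R should check, and an R test would parse the JSON (for example with `jsonlite::read_json()`) and assert each entry:

```python
import json

# Hypothetical "assertions manifest": the schema below is an illustrative
# assumption, not an existing anndataR or dummy-anndata format.


def build_assertions(obs_names, var_names, x_dtype, layers):
    """Describe what the R side should verify after reading the .h5ad file."""
    return {
        "shape": [len(obs_names), len(var_names)],
        "obs_names": list(obs_names),
        "var_names": list(var_names),
        "X": {"dtype": x_dtype},
        "layers": sorted(layers),
    }


def assertions_to_json(assertions):
    """Serialise the manifest; sort_keys keeps the output stable between runs."""
    return json.dumps(assertions, indent=2, sort_keys=True)
```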

LouiseDck · 2024-07-18T19:25:00Z

> I think we should still do round-trip tests. The difference is that our current round-trip tests:
>
> use R functions to generate data
> use CRAN anndata to move R data into Python with reticulate
> use Python anndata to write data to disk
> use anndataR to read data from disk
> compare original data to resulting data

I agree.

> What I'm currently wondering about is how to compare the original data in Python to the resulting data in R. Do we introduce a simple JSON format to store which assertions to make (i.e. let the Python function write out which assertions R should make to check whether the AnnData object was created correctly)? Or do we let the Python functions generate data in a predictable manner so that we can reproduce the same results in R?

After discussion, we settled on generating the same dataset in both Python and R, checking whether there are differences when reading in both datasets, and reporting any differences using h5diff.
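A Python sketch of the comparison step, assuming the `h5diff` command-line tool from the HDF5 distribution is on the PATH. `h5diff` exits with 0 when it finds no differences, 1 when it finds differences and 2 on error; the helper names below are hypothetical:

```python
import shutil
import subprocess


def h5diff_command(file_a, file_b, verbose=True):
    """Build the h5diff invocation; -v lists each compared object and its differences."""
    cmd = ["h5diff"]
    if verbose:
        cmd.append("-v")
    cmd += [str(file_a), str(file_b)]
    return cmd


def interpret_h5diff_status(returncode):
    """Map h5diff's exit status to a readable outcome."""
    if returncode == 0:
        return "identical"
    if returncode == 1:
        return "different"
    return "error"


def compare_h5ad(file_a, file_b, verbose=True):
    """Run h5diff on two .h5ad files and return (outcome, report text)."""
    if shutil.which("h5diff") is None:
        raise FileNotFoundError("h5diff not found on PATH (it ships with the HDF5 tools)")
    result = subprocess.run(
        h5diff_command(file_a, file_b, verbose), capture_output=True, text=True
    )
    return interpret_h5diff_status(result.returncode), result.stdout
```

Keeping the verbose report lets a failing test print exactly which groups or datasets differ between the Python-written and R-written files.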

LouiseDck · 2024-09-20T11:37:59Z

For documentation purposes: the Python anndata generator functions have moved to a separate package, dummy-anndata.

* re-enable tests for matrices with NAs in X and layers
* one more
* Ensure that matrices are never written as nullables. Take care when using reticulate for testing: before writing, convert NA to NaN
* Fix Windows writing NA error
* Remove commented code since no longer needed
* remove commented code (no longer needed)

---------
Co-authored-by: Louise Deconinck
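A minimal Python sketch of the NA-to-NaN idea, assuming missing matrix entries arrive as `None`: replacing them with a float NaN keeps the matrix an ordinary float array rather than one that might be written with a nullable encoding. The helper name is hypothetical:

```python
import math


def na_to_nan(matrix):
    """Return a copy of a row-major matrix with None replaced by float NaN.

    Every other entry is cast to float, so the result is a plain float matrix.
    """
    return [[math.nan if v is None else float(v) for v in row] for row in matrix]
```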

* Update write_h5ad_categorical
* fix styling
* Update write_h5ad_categorical
* Adjust H5AD categorical write test
* Add write_h5ad_attributes function. Replace repeated code in individual writers
* ignore cyclomatic complexity warning for `write_h5ad_element`
* formatting changes
* in write_h5ad_attributes, allow file to be an open hdf5 file
* Add write_h5ad_boolean_attribute()
* Add write_h5ad_boolean_array(). Helper function for writing ENUM boolean arrays
* Remove compression argument from write_h5ad_boolean_array. Don't think it is possible to write compressed data using the workaround, and the ENUM format should be fairly space-efficient anyway
* Correctly read categorical levels from H5AD. Fixes array when there are more levels than values
* Fix writing scalar H5AD attributes. Correctly check the is_scalar argument
* add lintr exceptions
* fix nolint

---------
Co-authored-by: Robrecht Cannoodt
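For context on the categorical fix, a sketch of the AnnData on-disk categorical encoding: a group with integer `codes` (-1 marks a missing value) and a `categories` array, tagged with `encoding-type` "categorical", `encoding-version` "0.2.0" and an `ordered` flag. Plain Python lists and dicts stand in for the HDF5 group and datasets, and the function names are hypothetical. Reading the levels from `categories` rather than from the observed values keeps unused levels, which is exactly the case where there are more levels than values:

```python
def encode_categorical(values, categories=None, ordered=False):
    """Encode values as codes + categories; None becomes code -1."""
    if categories is None:
        categories = sorted({v for v in values if v is not None})
    index = {c: i for i, c in enumerate(categories)}
    codes = [-1 if v is None else index[v] for v in values]
    attrs = {"encoding-type": "categorical", "encoding-version": "0.2.0", "ordered": ordered}
    return {"codes": codes, "categories": list(categories), "attrs": attrs}


def decode_categorical(group):
    """Decode a categorical group; levels come from 'categories', not from the codes."""
    categories = group["categories"]
    values = [None if c == -1 else categories[c] for c in group["codes"]]
    return values, list(categories)
```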


* Update write_h5ad_categorical
* fix styling
* Update write_h5ad_categorical
* Adjust H5AD categorical write test
* Add write_h5ad_attributes function. Replace repeated code in individual writers
* ignore cyclomatic complexity warning for `write_h5ad_element`
* formatting changes
* in write_h5ad_attributes, allow file to be an open hdf5 file
* wip
* wip
* substitute mentions of rhdf5 with hdf5r
* strip obs_names and var_names from framework
* update
* fix tests and finalize
* remove mentions of obs_names and var_names in the constructor
* make sure filenames are always unique
* add mode to various functions
* manually close anndatas in tests (where needed)
* only close when pointer is valid
* move match
* use $close() instead of $close_all()
* switch to different branch
* simplify test
* gc after closing the adata in write_h5ad
* guess the dtype and the space
* update docs
* use hhoeflin's remote
* bugfix in hdf5r has been released
* update: never mind, the fix wasn't included in the release yet
* minor fixes
* bump version number
* remove remotes
* remove references to rhdf5
* fix attributes
* style
* fix write h5ad helpers
* fix unit tests
* fix linting issues
* move hdf5 helpers
* reuse existing functionality
* add test (this seems to have been fixed at some point)
* improve guessing of dtype when storing a logical vector
* fix styling
* re-enable more tests

---------
Co-authored-by: Luke Zappia


LouiseDck · 2024-12-12T19:58:55Z

Superseded by #207 and dummy-anndata

LouiseDck added 3 commits June 29, 2024 22:52

Start on generate vector in python

b46dcbb

Eliminate most randomness, check what happens when using np.nan, pd.NA or None

b9b9685

Generate all manner of matrices

40decbf

LouiseDck self-assigned this Jul 3, 2024

LouiseDck added 2 commits July 4, 2024 20:22

bugfix

311f896

Generate dataframe

b4d8dfd

LouiseDck force-pushed the dataset-generator branch from 71a4291 to b4d8dfd on July 4, 2024 18:23

LouiseDck added 2 commits July 4, 2024 20:25

documentation

11850b2

generate dict and start of dataset

11a0a67

rcannood mentioned this pull request Jul 8, 2024

Create a list of known incompatibilities #173

Open

LouiseDck added 2 commits July 18, 2024 21:07

Generate dataset

185ec3f

Black formatting

a13874d

LouiseDck added 3 commits July 18, 2024 22:02

Remove randomness

bd50473

Fix writing pd.NA in uns and then failing to write

9359252

Remove dummy-anndata files

de9767a

rcannood and others added 12 commits October 3, 2024 20:48

empty commit to trigger ci

c8c77b1

clean up funding

0a76134

add list as param

71d5a4a

Make rownames part of obs and var (#171)

961c105

* port rownames-related changes from #166 and #169
* run styler
* fix test
* style
* style
* fix docs
* fix documentation
* simplify helper functions
* simplify test
* add more documentation to AnnData
* fix docs

add dependabot to the repo (#176)

9856e72

bump version requirements for reticulate and rhdf5 (#174)

fff14f5

update actions (#181)

99f77fb

Tidy user interface #2 (#180)

d46edb6

* Tidy user interface
  Co-authored-by: Luke Zappia
* Update docs
  Co-authored-by: Luke Zappia
* update docs
* fix linting issues

---------
Co-authored-by: Luke Zappia

Verbose h5diff testing

e6f8a33

LouiseDck and others added 6 commits October 4, 2024 10:44

Add processx to description

03eb07e

Require processx

534ee90

lintr

3c60c3e

Merge remote-tracking branch 'origin/main' into dataset-generator

0d67580

Start on diffing h5ad files

2ef2d2b

Basic matrix tests

b9eb120

LouiseDck mentioned this pull request Nov 12, 2024

experiment with new dummy anndata #193

Merged

Systematise testing

99fd648

LouiseDck closed this Dec 12, 2024
