Just a few thoughts / comments on code structure. #72

ChrisBarker-NOAA · 2022-08-05T23:31:27Z

I really wish gitHub had an interface for general code review, as oppsed to only PRs and commits.

Anyway:

With regrd to numba -- it may be stable and happy enough to just use it -- we'll see how it goes on our production environment, but as far as I could tell, it's used (so far) only for

@njit
def index_of_sorted(
    haystack: np.array, values: np.array
) -> np.array:  # pragma: no cover
    """Return an array of indexes for each value in values found in haystack.

    This function uses binary search on haystack to find each value in values and returns an array
    of indices or -1 if an exact value is not identified. This function behaves similarly to
    np.searchsorted but will return -1 if there is no exact value.

So: 1) this is a really good candidate for Cython -- though of course that introduced the need for compiling ahead of time :-( -- not so bad with conda, but still ...

Would it be too slow / painful to simply use searchsorted() and then check for the non-exact match?

I see code differences on: grid_type == "fvcom" ["selfe"] -- this is exactly the kind of thing we were trying to avoid with gridded. Alternatively, you load the grid into an object with known structure, doing any name mangling or changing you need to do on the way in, and then do all your operations on that structure.

I'm not entirely sure how to make that really work with an xarray dataset, but I imagine it wouldn't be that hard.

At this point in the game, not bad to just get things working so you can test the algorithms, but it would be nice to abstract this before we get too much farther into it.

Take a look at the UGrid object in gridded here:

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/pyugrid/ugrid.py

In gridded, all those attributes are "numpy-array-like" -- but they are guaranteed to be in the correct form -- e.g. the correct shape, zero-indexed, etc.

And I think it would not be a huge change to make them all xarray variables instead, which would buy us a lot.

Alternatively, one could use subclassing to make model-specific versions of teh code -- whenever I see code like this:

    def subset(
        self, ds: xr.Dataset, bbox: BBOXType, grid_type: "GridType"
    ) -> xr.Dataset:
        """Returns a subsetted dataset."""
        if grid_type == "fvcom":
            return self._subset_fvcom(ds, bbox)
        if grid_type == "selfe":
            return self._subset_selfe(ds, bbox)
        raise ValueError(f"Unsupported grid type {grid_type}")

That makes me think there could be a grid class with a subset method -- and, if necessary different subclasses for different grids.

In fact, that, again, is the large goal of gridded -- the base Grid class would have a subset method, and that method would invoke different code depending on what type of grid it is -- rectangular, curvilinear, trinagular mesh, etc....

Anyway, these are all code structure ideas -- the algorithm work looks really nice!

The text was updated successfully, but these errors were encountered:

lukecampbell added the tech-debt Technical Debt label Aug 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Just a few thoughts / comments on code structure. #72

Just a few thoughts / comments on code structure. #72

ChrisBarker-NOAA commented Aug 5, 2022

Just a few thoughts / comments on code structure. #72

Just a few thoughts / comments on code structure. #72

Comments

ChrisBarker-NOAA commented Aug 5, 2022