Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Just a few thoughts / comments on code structure. #72

Open
ChrisBarker-NOAA opened this issue Aug 5, 2022 · 0 comments
Open

Just a few thoughts / comments on code structure. #72

ChrisBarker-NOAA opened this issue Aug 5, 2022 · 0 comments
Labels
tech-debt Technical Debt

Comments

@ChrisBarker-NOAA
Copy link
Contributor

I really wish gitHub had an interface for general code review, as oppsed to only PRs and commits.

Anyway:

With regrd to numba -- it may be stable and happy enough to just use it -- we'll see how it goes on our production environment, but as far as I could tell, it's used (so far) only for

@njit
def index_of_sorted(
    haystack: np.array, values: np.array
) -> np.array:  # pragma: no cover
    """Return an array of indexes for each value in values found in haystack.

    This function uses binary search on haystack to find each value in values and returns an array
    of indices or -1 if an exact value is not identified. This function behaves similarly to
    np.searchsorted but will return -1 if there is no exact value.

So: 1) this is a really good candidate for Cython -- though of course that introduced the need for compiling ahead of time :-( -- not so bad with conda, but still ...

Would it be too slow / painful to simply use searchsorted() and then check for the non-exact match?

  1. I see code differences on: grid_type == "fvcom" ["selfe"] -- this is exactly the kind of thing we were trying to avoid with gridded. Alternatively, you load the grid into an object with known structure, doing any name mangling or changing you need to do on the way in, and then do all your operations on that structure.

I'm not entirely sure how to make that really work with an xarray dataset, but I imagine it wouldn't be that hard.

At this point in the game, not bad to just get things working so you can test the algorithms, but it would be nice to abstract this before we get too much farther into it.

Take a look at the UGrid object in gridded here:

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/pyugrid/ugrid.py

In gridded, all those attributes are "numpy-array-like" -- but they are guaranteed to be in the correct form -- e.g. the correct shape, zero-indexed, etc.

And I think it would not be a huge change to make them all xarray variables instead, which would buy us a lot.

Alternatively, one could use subclassing to make model-specific versions of teh code -- whenever I see code like this:

    def subset(
        self, ds: xr.Dataset, bbox: BBOXType, grid_type: "GridType"
    ) -> xr.Dataset:
        """Returns a subsetted dataset."""
        if grid_type == "fvcom":
            return self._subset_fvcom(ds, bbox)
        if grid_type == "selfe":
            return self._subset_selfe(ds, bbox)
        raise ValueError(f"Unsupported grid type {grid_type}")

That makes me think there could be a grid class with a subset method -- and, if necessary different subclasses for different grids.

In fact, that, again, is the large goal of gridded -- the base Grid class would have a subset method, and that method would invoke different code depending on what type of grid it is -- rectangular, curvilinear, trinagular mesh, etc....

Anyway, these are all code structure ideas -- the algorithm work looks really nice!

@lukecampbell lukecampbell added the tech-debt Technical Debt label Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tech-debt Technical Debt
Projects
None yet
Development

No branches or pull requests

2 participants