Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] add missing stats #660

Open
13 tasks
adebardo opened this issue Nov 21, 2024 · 3 comments
Open
13 tasks

[POC] add missing stats #660

adebardo opened this issue Nov 21, 2024 · 3 comments
Labels
[POC] Conception To review Tickets needs approval about it conception

Comments

@adebardo
Copy link

adebardo commented Nov 21, 2024

Context

It is essential for current users of demcompare, such as QI or IGN, to have a quick and concise way of obtaining a wide range of basic metrics when performing DEM analysis. This ticket aims to address this need.

Although it is clear that, thanks to its ability to work with rasters and its coupling with NumPy for basic metric calculations, we must ensure that these metrics are easy to manipulate, retrieve, and potentially process, especially for automation tasks like implementing a CLI.

Implementation

⚠️ This ticket will be created under the XDEM project, but the developments will be carried out in geoutils.

We propose creating a get_metrics() function in the raster module.

get_metrics(self, metrics_list: list = None):

If None: 
    Returns the full dictionary of global metrics.
If metrics_list: 
    Iterate through it and build the dictionary accordingly.

return metrics_dict

Example usage:

dem = load(dem)
metrics_dict = dem.get_metrics(["mean", "nmad", "rmse"])

⚠️ This function should be callable within the .info() method of the DEM.

List of metrics:

  • Mean
  • Median
  • Max
  • Min
  • Sum
  • Sum of squares
  • 90th percentile
  • NMAD
  • RMSE
  • Standard deviation

⚠️ We need to discuss the NMAD implementation in xDEM.

Tests

Add tests for the get_metrics() function:

  • A test with None that returns all metrics.
  • A test with a valid list.
  • A test with an invalid list.

Additionally, verify the behavior of the .info() method.

Documentation

Update the documentation for both geoutils and xDEM to include this new functionality.

Internal Testing

Ensure that it works seamlessly with xDEM.

Estimation

3 days

@adebardo adebardo added [POC] Conception stuck The issue conception is stopped [POC] Conception To review Tickets needs approval about it conception and removed [POC] Conception stuck The issue conception is stopped labels Nov 21, 2024
@adebardo
Copy link
Author

@rhugonnet @duboise-cnes :)

@rhugonnet
Copy link
Member

rhugonnet commented Nov 28, 2024

Looks good! Link to our previous discussion: GlacioHack/geoutils#602 (forgot it was in GeoUtils).

I only have a couple main remarks:

  • The term "metric" is more commonly used for geometric settings (metric space) or for unit VS unitless description. For this type of statistical function, the most common terms would be "statistic" or "estimator". For instance SciPy uses "statistic" as argument name for this exact purpose: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html, which would be my preferred choice here! 🙂
  • This issue joins the more generic planned main feature of "Raster zonal/binned statistics" for GeoUtils, which can be computed not only on all the Raster but for any Mask/Vector input (or potentially classified Raster input?). We mentioned for instance inspiring ourselves from the small https://pythonhosted.org/rasterstats/ package in one of our past meetings, (or using https://github.com/corteva/geocube once we have the Xarray accessor). We don't need to add all of this now. But maybe it will eventually makes sense to have all these features grouped in the same function, which could take a "zonal" and "binning" argument, and the default "zonal" argument would be to use all the Raster, and the default "binning" to not bin, which would give the same result as the proposed function in this issue? This is linked to what we discussed for classification: [POC]: Classification notebook #620.

Other small remarks:

  • It would be good and should be simple to accept any reducer callable (reduces array to float) in addition to strings as a statistic, to allow users to pass anything!
  • For the NMAD function of xDEM: we had planned to move it to GeoUtils into a stats/ submodule: Re-structure spatialstats.py #378, and to deprecate our custom nmad function into using SciPy's equivalent: Deprecate nmad and move to scipy.stats.median_abs_deviation? #349 (The SciPy one is the one to use, both NumPy and pandas have declared they don't want to support it directly, and pandas has deprecated their old mad function). This would be the perfect occasion to do so! 😄

And I really like the idea of this function behind called in info() for consistency!

@duboise-cnes
Copy link
Member

Not so much to add to @rhugonnet comments, already large ;)
Ok for statistic vs metric naming, or even stat would do for me if consistent with naming conventions in xdem. No problem to get something more accurate and expressive.

Other question: where is the status of more complex demcompare metric or stats for non scalar element such as hillshade, svf . I remember the discussion that a structure in xdem would be better to automate more "stats" possibilities than only scalar and to put the metric structure in xdem then.
My question: will it be possible to automate other type of "metric"/stat such as PDF, CDF, hillshide through a pipeline with a common unified interface ? afterwards but to be sure nothing will block.
Hoping I am clear.

thanks for the work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[POC] Conception To review Tickets needs approval about it conception
Projects
None yet
Development

No branches or pull requests

3 participants