From 242ceda69e176e99262fb3a5bf0b07776e0e1d7b Mon Sep 17 00:00:00 2001 From: Virginia Morales Date: Thu, 1 Aug 2024 16:24:25 +0200 Subject: [PATCH] Update documentation. Add guides about: - Adding new data. - Doing calculations via request. - The general workflow. Signed-off-by: Virginia Morales --- docs/user-guide.rst | 3 + docs/user-guide/add_new_data.ipynb | 47 +++++ .../calculations_via_requests.ipynb | 117 ++++++++++++ docs/user-guide/general_workflow.ipynb | 172 ++++++++++++++++++ docs/user-guide/introduction.ipynb | 7 + 5 files changed, 346 insertions(+) create mode 100644 docs/user-guide/add_new_data.ipynb create mode 100644 docs/user-guide/calculations_via_requests.ipynb create mode 100644 docs/user-guide/general_workflow.ipynb diff --git a/docs/user-guide.rst b/docs/user-guide.rst index 62820d72..fb82ef57 100644 --- a/docs/user-guide.rst +++ b/docs/user-guide.rst @@ -11,3 +11,6 @@ The following sections document the structures and conventions of Physrisk and a user-guide/introduction user-guide/vulnerability_config + user-guide/general_workflow + user-guide/add_new_data + user-guide/calculations_via_requests diff --git a/docs/user-guide/add_new_data.ipynb b/docs/user-guide/add_new_data.ipynb new file mode 100644 index 00000000..74947770 --- /dev/null +++ b/docs/user-guide/add_new_data.ipynb @@ -0,0 +1,47 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# HOW TO ADD NEW DATA\n", + "\n", + "## Adding new hazards\n", + "\n", + "In `src/physrisk/kernel/hazards.py`, all hazards are cataloged, classified as ACUTE, CHRONIC, or UNKNOWN, and designated as either parameter-based or event-based. To add a new hazard, create a new class within this file and specify its type.\n", + "\n", + "Additionally, complete the onboarding process for the new hazard in the hazard program. 
This step ensures that the hazard and its data are collected in the bucket and included in the inventory used by PhysRisk to calculate impacts and risk measures.\n", + "\n", + "## Adding new vulnerability models\n", + "\n", + "In the `src/physrisk/vulnerability_models` folder, files correspond to each vulnerability model (e.g., `chronic_heat_models`, `power_generating_asset_models`, `real_estate_models`, `thermal_power_generation_models`). Each file contains classes for each type of hazard, with separate classes for default and stress test calculations. The `DictBasedVulnerabilityModelsFactory`, which inherits from the `VulnerabilityModelsFactory` class in `src/physrisk/kernel/vulnerability_model.py`, is used to handle the different vulnerability models, whether they are default or stress test models.\n", + "\n", + "In the `DictBasedVulnerabilityModelsFactory` class, there is a `vulnerability_models` method that retrieves the corresponding vulnerability models using the methods `get_default_vulnerability_models` and `get_stress_test_vulnerability_models`, implemented in `src/physrisk/kernel/calculation.py`.\n", + "\n", + "To add a vulnerability model for a new asset type, create a new file with the corresponding vulnerability classes for each hazard type. Additionally, create a JSON file in the `src/physrisk/data/static/vulnerability` folder. This JSON file should include the vulnerability curves for these models, detailing `impact_mean`, `impact_std`, `impact_type`, `impact_units`, `intensity`, `intensity_units`, and `location` for each type of event (hazard) and asset.\n", + "\n", + "To add a vulnerability model for an existing asset, create a new class in the relevant file. All classes must inherit from either a class in `src/physrisk/kernel/vulnerability_model.py` or another class in the same file that inherits from a class in `vulnerability_model.py`. 
These classes must include at least a constructor, a `get_data_requests` method, and a `get_impact` method.\n", + "\n", + "- The `get_data_requests` method returns a `HazardDataRequest` object, which stores all necessary information to request data from the bucket for calculating impacts and risk measures.\n", + "- The `get_impact` method returns an `ImpactDistrib` object, which contains \"Impact distributions specific to an asset\" and is used for calculating `impact_bins_explicit`, `mean_impact`, `stddev_impact`, `above_mean_stddev_impact`, and `to_exceedance_curve`.\n", + "\n", + "To include the new vulnerability model, update either `get_default_vulnerability_models` or `get_stress_test_vulnerability_models` as appropriate. If introducing a new calculation method, add a new method and integrate it into `DictBasedVulnerabilityModelsFactory`. \n", + "\n", + "## Adding new risk models\n", + "\n", + "In `src/physrisk/risk_models/risk_models.py`, there are two classes that implement risk models: `RealEstateToyRiskMeasures` (which calculates risk measures using exceedance curves) and `ThermalPowerPlantsRiskMeasures` (which calculates risk measures using mean intensity and percentiles). Additionally, in `src/physrisk/risk_models/generic_risk_model.py`, the `GenericScoreBasedRiskMeasures` class showcases how different approaches can be combined for calculating risk scores, using vulnerability models for some hazards and direct hazard indicators for others. This generic implementation serves as an example, blending elements from both real estate and Jupiter exposure calculations, without a predefined use case for the measures.\n", + "\n", + "Moreover, similar to the vulnerability models, a factory class `DefaultMeasuresFactory` has been implemented in `src/physrisk/kernel/calculation.py` to select the appropriate risk model based on the `use_case_id`. 
In this class, there is a method called `calculators` which makes use of `get_default_risk_measure_calculators` and `get_stress_test_risk_measure_calculators`, implemented in `src/physrisk/kernel/calculation.py`.\n", + "\n", + "To add a new risk model, create a new class in `src/physrisk/risk_models/risk_models.py` that implements the calculations for the new model.\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/user-guide/calculations_via_requests.ipynb b/docs/user-guide/calculations_via_requests.ipynb new file mode 100644 index 00000000..18ce7865 --- /dev/null +++ b/docs/user-guide/calculations_via_requests.ipynb @@ -0,0 +1,117 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# CONTAINER AND REQUEST USAGE GUIDE\n", + "\n", + "In addition to invoking methods directly, as detailed in the `general_workflow.ipynb` guide, you can perform various actions or calculations through requests. For examples of this approach, see `tests/risk_models/risk_models_AK_test.py` and `tests/risk_models/risk_models_test.py`.\n", + "\n", + "## Process\n", + "\n", + "1. **Create a Container**\n", + "\n", + " First, create a `Container` object and configure it by overriding its default providers with custom ones. 
Here’s an example:\n", + "\n", + " ```python\n", + " # Define custom factories for hazard and vulnerability models\n", + " class TestHazardModelFactory(HazardModelFactory):\n", + " def hazard_model(self, interpolation: str = \"floor\", provider_max_requests: Dict[str, int] = ...):\n", + " return ZarrHazardModel(\n", + " source_paths=get_default_source_paths(), reader=reader\n", + " )\n", + "\n", + " class TestVulnerabilityModelFactory(VulnerabilityModelsFactory):\n", + " def vulnerability_models(self):\n", + " return DictBasedVulnerabilityModels(\n", + " {\n", + " ThermalPowerGeneratingAsset: [\n", + " ThermalPowerGenerationAqueductWaterStressModel()\n", + " ]\n", + " }\n", + " )\n", + "\n", + " # Register custom providers in the container\n", + " container.override_providers(\n", + " hazard_model_factory=providers.Factory(TestHazardModelFactory)\n", + " )\n", + " container.override_providers(\n", + " config=providers.Configuration(default={\"zarr_sources\": [\"embedded\"]})\n", + " )\n", + " container.override_providers(inventory_reader=ZarrReader())\n", + " container.override_providers(zarr_reader=ZarrReader())\n", + " container.override_providers(\n", + " vulnerability_models_factory=providers.Factory(TestVulnerabilityModelFactory)\n", + " )\n", + "\n", + " ``` \n", + " You can include any list of vulnerability models in the configuration. If none are provided, default models will be used.\n", + "\n", + "\n", + "2. **Create a Requester**\n", + "\n", + " After setting up the container, call ``container.requester()`` to obtain an instance of ``Requester``. This object includes the following attributes configured from the container:\n", + " ``hazard_model_factory: HazardModelFactory, vulnerability_models_factory: VulnerabilityModelsFactory, inventory: Inventory, inventory_reader: InventoryReader, reader: ZarrReader, colormaps: Colormaps`` and a ``measures_factory: RiskMeasuresFactory``\n", + " \n", + "\n", + "3. 
**Call the Method and Obtain a Response**\n", + "\n", + " The `Requester` class has a main method that calls different methods based on the `request_id` provided.\n", + "\n", + " Here is an example of how to call a method using the `get` method:\n", + "\n", + " ```python\n", + " res = requester.get(request_id=\"get_asset_impact\", request_dict=request_dict)\n", + " ```\n", + "\n", + " You can assign the following values to `request_id`:\n", + "\n", + " - `get_hazard_data`: Returns intensity curves for the selected hazards, years, and scenarios.\n", + " - `get_hazard_availability`: Returns the hazards stored in the inventory.\n", + " - `get_hazard_description`: Returns the description assigned to a specific hazard.\n", + " - `get_asset_exposure`: Calculates the exposure of a given asset for a hazard, exposure measure, scenario, and year.\n", + " - `get_asset_impact`: Returns risk measures or impacts based on parameters provided in `request_dict`.\n", + " - `get_example_portfolios`: Returns a JSON with assets and their respective details such as class, type, location, latitude, and longitude.\n", + "\n", + " The structure of `request_dict` depends on the method you are calling. 
For example, for the `get_asset_impact` method, `request_dict` might look like this:\n", + "\n", + " ```python\n", + " def create_assets_json(assets: Sequence[ThermalPowerGeneratingAsset]):\n", + " assets_dict = {\n", + " \"items\": [\n", + " {\n", + " \"asset_class\": type(asset).__name__,\n", + " \"type\": asset.type,\n", + " \"location\": asset.location,\n", + " \"longitude\": asset.longitude,\n", + " \"latitude\": asset.latitude,\n", + " }\n", + " for asset in assets\n", + " ],\n", + " }\n", + " return assets_dict\n", + "\n", + " request_dict = {\n", + " \"assets\": create_assets_json(assets=assets),\n", + " \"include_asset_level\": False,\n", + " \"include_measures\": True,\n", + " \"include_calc_details\": False,\n", + " \"model_kind\": ModelKind.STRESS_TEST,\n", + " \"years\": years,\n", + " \"scenarios\": scenarios,\n", + " }\n", + " ```\n", + "\n", + " Finally, the ``get`` method calls the appropriate methods corresponding to the ``request_id`` with the necessary parameters and returns the response as a JSON object with the result." + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/user-guide/general_workflow.ipynb b/docs/user-guide/general_workflow.ipynb new file mode 100644 index 00000000..73320076 --- /dev/null +++ b/docs/user-guide/general_workflow.ipynb @@ -0,0 +1,172 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# GENERAL WORKFLOW\n", + "\n", + "## Impact calculation process\n", + "\n", + "To calculate impacts for a specific asset, scenario, year, vulnerability model, and hazard, use the following function:\n", + "`calculate_impacts(assets, hazard_model, vulnerability_model, scenario, year) -> Dict[ImpactKey, List[AssetImpactResult]]`\n", + "\n", + "### First part: calculating intensity curves\n", + "(Calculate intensities and return periods for acute hazards, parameters, and definitions for chronic hazards)\n", + "\n", + "1. 
**Request Hazard Data**:\n", + "\n", + " For each asset, generate requests for hazard data and obtain responses:\n", + "\n", + " ```python\n", + " asset_requests, responses = _request_consolidated(hazard_model, model_asset, scenario, year)\n", + " ```\n", + "\n", + " ```python\n", + " HazardDataRequest(self.hazard_type, asset.longitude, asset.latitude, scenario=scenario, year=year, indicator_id=self.indicator_id)\n", + " ```\n", + " Responses are obtained from:\n", + "\n", + " ```python\n", + " hazard_model.get_hazard_events(requests)\n", + " ```\n", + "\n", + "\n", + "2. **Process Hazard Events**:\n", + "\n", + " Responses are processed differently depending on whether the hazard is acute (events) or chronic (parameters):\n", + " Acute Hazards: Responses include periods, intensities, units, and paths.\n", + " Chronic Hazards: Responses include parameters, definitions, units, and paths.\n", + "\n", + "\n", + "3. **Retrieve Data**:\n", + "\n", + " For Acute Hazards:\n", + "\n", + " ```python\n", + " hazard_data_provider = self.hazard_data_providers[hazard_type]\n", + " intensities, return_periods, units, path = hazard_data_provider.get_data(longitudes, latitudes, indicator_id=indicator_id, scenario=scenario, year=year, hint=hint, buffer=buffer)\n", + " ```\n", + " For Chronic Hazards:\n", + "\n", + " ```python\n", + " hazard_data_provider = self.hazard_data_providers[hazard_type]\n", + " parameters, definitions, units, path = hazard_data_provider.get_data(longitudes, latitudes, indicator_id=indicator_id, scenario=scenario, year=year, hint=hint, buffer=buffer)\n", + " ```\n", + "\n", + " ```python\n", + " get_data(self, longitudes: List[float], latitudes: List[float], *, indicator_id: str, scenario: str, year: int, hint: Optional[HazardDataHint] = None, buffer: Optional[int] = None)\n", + " ```\n", + "\n", + " The ``get_data`` method retrieves hazard data for given coordinates.\n", + "\n", + "4. 
**Determine Data Path**:\n", + "\n", + " Build the path for data retrieval:\n", + "\n", + " ```python\n", + " path = self._get_source_path(indicator_id=indicator_id, scenario=scenario, year=year, hint=hint)\n", + " ```\n", + "\n", + " get_source_path(SourcePath) provides the source path mappings.\n", + "\n", + "5. **Retrieve Curves**:\n", + "\n", + " If ``buffer`` is ``None``, use:\n", + "\n", + " ```python\n", + " values, indices, units = self._reader.get_curves(path, longitudes, latitudes, self._interpolation)\n", + " ```\n", + "\n", + " If ``buffer`` is specified (it defines an area of the given size around each location, which is used instead of a single point):\n", + "\n", + " ```python\n", + " values, indices, units = self._reader.get_max_curves(\n", + " path,\n", + " [\n", + " (\n", + " Point(longitude, latitude)\n", + " if buffer == 0\n", + " else Point(longitude, latitude).buffer(\n", + " ZarrReader._get_equivalent_buffer_in_arc_degrees(latitude, buffer)\n", + " )\n", + " )\n", + " for longitude, latitude in zip(longitudes, latitudes)\n", + " ],\n", + " self._interpolation\n", + " )\n", + " ```\n", + "\n", + "6. **Data Retrieval Functions**:\n", + "\n", + " ```python\n", + " get_curves(self, set_id, longitudes, latitudes, interpolation=\"floor\")\n", + " ```\n", + "\n", + " Get Curves: Retrieves intensity curves for each coordinate pair. Returns intensity curves, return periods, and units.\n", + "\n", + " First, it constructs the path used to select the corresponding data in the bucket. From this data, it extracts the transformation matrix, coordinate system, data units, and return periods or indices (``index_values``). Next, it converts the geographic coordinates to image coordinates. 
Then, it interpolates the data based on the specified interpolation method.\n", + "\n", + " If the interpolation method is ``\"floor\"``, it converts ``image_coords`` to integer values using the floor function and adjusts coordinates for wrapping around the dataset dimensions. It retrieves the data values using ``z.get_coordinate_selection``, then reshapes and returns the data along with ``index_values`` and ``units``.\n", + "\n", + " For other interpolation methods (``\"linear\"``, ``\"max\"``, ``\"min\"``), it calls ``_linear_interp_frac_coordinates`` to perform the specified interpolation. Finally, it returns the interpolated results along with ``index_values`` and ``units``.\n", + "\n", + " ```python\n", + " get_max_curves(self, set_id, shapes, interpolation=\"floor\")\n", + " ```\n", + "\n", + " Get Max Curves: Retrieves the maximum intensity curves for given geometries. Returns maximal intensity curves, return periods, and units.\n", + "\n", + " First, it constructs the path used to locate the corresponding data in the bucket, similar to the ``get_curves`` method. From this data, it extracts the transformation matrix, coordinate system, data units, and index values (``index_values``). It then computes the inverse of the affine transformation matrix and applies it to the input geometries, transforming them into the coordinate system of the dataset.\n", + "\n", + " Next, it generates a ``MultiPoint`` for each shape by creating a grid of points within the shape's bounding box and intersecting these points with the shape to retain only those points that lie within the shape. If the intersection of a shape with the grid points is empty, it falls back to using the centroid of the shape as a single point.\n", + "\n", + " For the ``\"floor\"`` interpolation method, it converts the transformed coordinates to integer values using the floor method, retrieves the corresponding data values, and reshapes the data. 
For other interpolation methods (``\"linear\"``, ``\"max\"``, ``\"min\"``), it combines the transformed shapes with the multipoints and computes the fractional coordinates for interpolation.\n", + "\n", + " Finally, it calculates the maximum intensity values for each shape by grouping the points corresponding to each shape and finding the maximum value for each return period. The method then returns the maximum intensity curves, return periods, and units.\n", + "\n", + "### Second part: applying a vulnerability model to obtain impacts\n", + "\n", + "When applying a chronic-type vulnerability model, the impact is calculated using the model's `get_impact` method. This method will return an `ImpactDistrib` object, which includes `impact_bins`, `impact_type`, `path`, and `prob` (i.e., it provides the impact distribution along with the hazard data used to infer it). This result is then stored in an `AssetImpactResult` object, together with the hazard_data (which consists of the intensity curves obtained previously). The `AssetImpactResult` is subsequently saved in the results dictionary, associated with an `ImpactKey` that comprises the `asset`, `hazard_type`, `scenario`, and `year`.\n", + "\n", + "On the other hand, for acute-type vulnerability models, the impact is calculated using the `get_impact_details` method of the model. This method returns an `ImpactDistrib` object, a `VulnerabilityDistrib` object (which includes `impact_bins`, `intensity_bins`, and `prob_matrix`), and a `HazardEventDistrib` object (which contains `intensity_bin_edges` and `prob`). In other words, it provides the impact distribution along with the vulnerability and hazard event distributions used to infer it. 
This information is stored in an `AssetImpactResult` object, which is then added to the results dictionary with an `ImpactKey`.\n", + "\n", + "## Risk measures calculation process\n", + "\n", + "To calculate risk measures for a specific asset, scenario, year, vulnerability model, and hazard, use the following function:\n", + "`def calculate_risk_measures(self, assets: Sequence[Asset], prosp_scens: Sequence[str], years: Sequence[int]):`\n", + "\n", + "1. **Calculate all impacts**\n", + "\n", + " First, using the `_calculate_all_impacts` method, the impacts for the specific hazard, asset, and vulnerability model are calculated for all the years and scenarios. This method uses `_calculate_single_impact`, which calculates each impact using the `calculate_impacts` method previously described.\n", + "\n", + "2. **Calculate risk measure**\n", + "\n", + " For each asset, scenario, year, and hazard, the corresponding impact is used to determine the risk measures according to the selected calculation method.\n", + "\n", + " The impact of the historical scenario is chosen as the `base impact`, and `risk measures` are calculated using the `calc_measure` function.\n", + "\n", + " In the default use case, the `calc_measure` method defined in the `RealEstateToyRiskMeasures` class performs calculations differently depending on whether the hazard is chronic heat or another type. The difference between the two methods is that `calc_measure_cooling` uses `mean impacts` for calculations, while `calc_measure_acute` uses `exceedance curves`. In both cases, a `Measure` object is returned, which contains a `score` (REDFLAG, HIGH, MEDIUM, LOW), `measures_0` (future_loss), and a `definition`.\n", + "\n", + " - **For cooling hazards**: It calculates the change in mean impact between historical and future scenarios. 
It assigns a risk score based on the future cooling levels and the change compared to predefined thresholds, returning a `Measure` object with the assigned score and future cooling value.\n", + " \n", + " - **For acute hazards**: It calculates the potential loss based on a 100-year return period by comparing historical and future loss values derived from exceedance curves. It assigns a risk score based on future loss levels and the change in loss relative to predefined thresholds, returning a `Measure` object with the assigned score and future loss value.\n", + "\n", + " - **For the stress_test use case**: The `calc_measure` function in the `ThermalPowerPlantsRiskMeasures` class creates a `StressTestImpact` object to obtain the percentiles (norisk, p50, p75, p90), which are used to evaluate the impact based on its `mean_intensity`. This method also returns a `Measure` object with a `score` (HIGH, MEDIUM, LOW, NORISK, NODATA), `measures_0` (mean_intensity), and a `definition`.\n", + "\n", + " - **For the generic use case**: In the `GenericScoreBasedRiskMeasures` class, the `calc_measure` method calculates risk scores differently based on whether the impact distribution is necessary or if underlying hazard data can be used instead. To generate the scores, bounds are defined for each hazard type.\n", + "\n", + " - **When using hazard data**: It compares hazard parameters to the predefined threshold bounds. It returns a score based on the severity of the hazard, or NODATA if the parameter is invalid.\n", + " \n", + " - **Otherwise**: The method calculates two impact measures from historical and future data. 
It then determines the score category based on whether these measures fall within predefined ranges and returns a `Measure` object with the score and the first measure value.\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/user-guide/introduction.ipynb b/docs/user-guide/introduction.ipynb index b68f5c8c..5cbb7264 100644 --- a/docs/user-guide/introduction.ipynb +++ b/docs/user-guide/introduction.ipynb @@ -42,6 +42,13 @@ "| Heat | mean_degree_days/above/index (P) | Mean mean-temperature degree days per year above a set of temperature threshold indices. |\n", "| Drought | months/spei/12m/below/index (P) | Mean months per year where the 12 month SPEI index is below a set of indices. | \n", "| Wind | max_speed (A) | Maximum 1 minute sustained wind speed for available return periods. |\n", + "| Subsidence | susceptability (P) | Score (1-5) based on soils’ clay content. |\n", + "| Subsidence | land_subsidence_rate (P) | Land subsidence rate (millimetres/year). |\n", + "| Landslide | susceptability (P) | Score (1-5) based on characteristics of the terrain combined with daily maximum precipitation (per return period). |\n", + "| WaterStress | water_stress (P) | Ratio of water demand to water supply. |\n", + "| HighFire | fwiX (P) | Daily probabilities of high forest fire danger in Europe. |\n", + "| ChronicWind | windX (P) | Gridded annual probability of severe / extreme convective windstorms (defined as wind gusts \\> X m/s). |\n", + "\n", "\n", "## Event based modelling\n", "The most common use case for physrisk at time of writing is to perform analyses of portfolios but with asset impacts treated separately or with heuristics defining the dependence of asset impacts. In such cases it is possible, and indeed efficient, to derive the marginal probability distribution of the impact of the asset, the `ImpactDistrib`. 
However to capture more realistic dependence of impacts, event-based modelling is needed. Here the `HazardModel` additionally supplies an array of simulated hazard indicator values to the `VulnerabilityModel` for each asset location, which in turn samples an array of impacts from the vulnerability function. At time of writing, the event-based functionality is not merged into main. Although specific types of `HazardModel` and `VulnerabilityModel` are needed for the event-based case, these do not replace the existing calculation which is appropriate for the separate-asset case: event-based analyses are typically much more computationally intensive. \n",