Skip to content

Commit

Permalink
Merge pull request #7 from arfogg/code_tidying
Browse files Browse the repository at this point in the history
Code tidying
  • Loading branch information
08walkersj authored Aug 13, 2024
2 parents c82b838 + d2c9e7c commit 7dafc93
Show file tree
Hide file tree
Showing 14 changed files with 1,713 additions and 927 deletions.
136 changes: 87 additions & 49 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,26 +11,13 @@
[![Issues](https://img.shields.io/github/issues/arfogg/bi_multi_variate_eva.svg)](https://github.com/arfogg/bi_multi_variate_eva/issues)
![Contributions welcome](https://img.shields.io/badge/contributions-welcome-orange.svg)




Code to run bivariate and multivariate extreme value analysis on generic data - work in progress.
Python package to run bivariate and multivariate extreme value analysis on generic data - work in progress.

**License:** CC0-1.0

**Support:** please [create an issue](https://github.com/arfogg/bi_multi_variate_eva/issues) or contact [arfogg](https://github.com/arfogg) directly. Any input on the code / issues found are greatly appreciated and will help to improve the software.

## Ongoing tasks
Code Improvement
- [ ] Make an overarching function to run all commands
- [ ] Make code PEP 8 compliant
- [ ] Make code importable in one statement e.g. import "bi_multi_variate_eva as BiEVA"
- [x] Create fit_params class

Organisation
- [ ] Make subfolders for Bi- and Multi- EVA
- [ ] Create requirements.txt with packages and versions
- [ ] Note requirements.txt in README
- [ ] Include Dáire acknowledgement statement

## Table of Contents
Expand All @@ -42,26 +29,38 @@ Organisation
* [(3) Extract extrema](#3-extract-extrema)
* [(4) Fit a model to the extrema](#4-fit-a-model-to-the-extrema)
* [(5) Transform extrema data to uniform margins](#5-transform-extrema-data-to-uniform-margins)
* [(6) Fit a copula to both sets of extrema](#6-fit-a-copula-to-both-sets-of-extrema)
* [(7) Take a sample from the copula](#7-take-a-sample-from-the-copula)
* [(8) Plot diagnostic to assess copula fit](#8-plot-diagnostic-to-assess-copula-fit)
* [(9) Plot return period as a function of two variables](#9-plot-return-period-as-a-function-of-two-variables)
* [(6) Bootstrapping the extrema](#6-bootstrapping-the-extrema)
* [(7) Fit a copula to both sets of extrema](#7-fit-a-copula-to-both-sets-of-extrema)
* [(8) Take a sample from the copula](#8-take-a-sample-from-the-copula)
* [(9) Plot diagnostic to assess copula fit](#9-plot-diagnostic-to-assess-copula-fit)
* [(10) Plot return period as a function of two variables](#10-plot-return-period-as-a-function-of-two-variables)
- [Multivariate Analysis](#multivariate-analysis)
- [Acknowledgements](#acknowledgements)

## Required Packages

scipy, numpy, matplotlib, pandas, copulas

Install copulas using pip or see documentation [here](https://pypi.org/project/copulas/)
See [environment.yml](environment.yml) for details.

Install copulas using pip or see documentation [here](https://pypi.org/project/copulas/).

## Installing the code
## Using the code

First, the code must be downloaded using `git clone https://github.com/arfogg/bi_multi_variate_eva`
Download the code:
```python
git clone https://github.com/arfogg/bi_multi_variate_eva
```

Importing:
```python
from bi_multi_variate_eva import *
```

## Bivariate Analysis

An example walkthrough of running the Bivariate Extreme Value Analysis on two variables x and y. For theory, a recommended text is: Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer.

#### (1) Getting your data ready

Make sure there are no datagaps in your timeseries. You can either remove the rows, or interpolate. This requires an data-expert user decision so is **not** included in this package. You must do this before using the package. For some functions, the data is parsed as a pandas.DataFrame.
Expand All @@ -71,93 +70,132 @@ Make sure there are no datagaps in your timeseries. You can either remove the ro
The function `plot_extremal_dependence_coefficient` within `determine_AD_AI` creates a diagnostic plot to examine asymptotic dependence/independence.

For example:

```python
determine_AD_AI.plot_extremal_dependence_coefficient(x, y, "X", "Y", "(units)", "(units)")
fig, ax_data, ax_data_unif, ax_edc, chi, chi_lower_q, chi_upper_q = \
determine_AD_AI.plot_extremal_dependence_coefficient(x, y, x_bs_um, y_bs_um, n_bootstrap,
"X", "Y", "(units)", "(units)")
```

Timeseries (`np.array` or `pd.Series`) of x and y are parsed, with their bootstraps (transformed to uniform margins), number of bootstraps and strings for plot labels.

#### (3) Extract extrema

Extract extremes for both X and Y using `detect_extremes.find_block_maxima`. Analysis on points above threshold maxima yet to be implemented.
Extract extremes for both X and Y using `detect_extremes.find_joint_block_maxima`. Analysis on points above threshold maxima yet to be implemented.

For example:
```python
x_extremes_df=detect_extremes.find_block_maxima(df,'x',df_time_tag='datetime',block_size=block_size,extremes_type='high')
y_extremes_df=detect_extremes.find_block_maxima(df,'y',df_time_tag='datetime',block_size=block_size,extremes_type='high')
empty_blocks, x_extreme_t, x_extreme, y_extreme_t, y_extreme = \
detect_extremes.find_joint_block_maxima(data_df, 'x', 'y')

x_extremes_df = pd.DataFrame({'datetime':x_extreme_t, 'extreme':x_extreme})
y_extremes_df = pd.DataFrame({'datetime':y_extreme_t, 'extreme':y_extreme})
```

A dataframe of evenly sampled x and y are parsed, with their respective dataframe column names. These are transformed to individual parameter DataFrames.


#### (4) Fit a model to the extrema

Fit a GEVD or Gumbel distribution to both sets of extrema (i.e. for x and y) using `fit_model_to_extremes.fit_gevd_or_gumbel`.
Fit a GEVD or Gumbel distribution to both sets of extrema (i.e. for x and y) using `gevd_fitter` class.

For example:
```python
x_gevd_fit_params=fit_model_to_extremes.fit_gevd_or_gumbel(x_extremes_df, 'BM', 'high','extreme',df_time_tag='datetime',fitting_type='Emcee', block_size=block_size)
y_gevd_fit_params=fit_model_to_extremes.fit_gevd_or_gumbel(y_extremes_df, 'BM', 'high','extreme',df_time_tag='datetime',fitting_type='Emcee', block_size=block_size)
x_gevd_fit = gevd_fitter(x_extremes_df.extreme)
y_gevd_fit = gevd_fitter(y_extremes_df.extreme)
```

By initialising the `gevd_fitter` class, a GEVD or Gumbel model is fit to the extrema. Fitting information is stored in the object.

#### (5) Transform extrema data to uniform margins

Transform x and y extrema from data scale (as it looks on the instrument) to uniform margins empirically using `transform_uniform_margins.transform_from_data_scale_to_uniform_margins_empirically` or using the cumulative distribution function with `transform_uniform_margins.transform_from_data_scale_to_uniform_margins_using_CDF`.
Transform x and y extrema from data scale (as it looks on the instrument) to uniform margins. This happens within the `gevd_fitter` class.

You can plot a diagnostic about the transformation of one of the variables using `transform_uniform_margins.plot_diagnostic`.

#### (6) Bootstrapping the extrema

To facilitate error calculation, we bootstrap the extrema. For each of the N bootstraps, a random selection of indices from between 0 to n_extrema-1 is chosen (where n_extrema is the number of extrema in each dataset). This set of indices is used to select points from both x and y. This ensures joint selection, so we retain the physical link between x and y.

For example:
```python
# Empirically
x_extremes_unif_empirical=transform_uniform_margins.transform_from_data_scale_to_uniform_margins_empirically(x_extremes_df.extreme)
y_extremes_unif_empirical=transform_uniform_margins.transform_from_data_scale_to_uniform_margins_empirically(y_extremes_df.extreme)
# Using cumulative distribution function
x_extremes_unif=transform_uniform_margins.transform_from_data_scale_to_uniform_margins_using_CDF(x_extremes_df.extreme, x_gevd_fit_params,distribution=x_gevd_fit_params.distribution_name[0])
y_extremes_unif=transform_uniform_margins.transform_from_data_scale_to_uniform_margins_using_CDF(y_extremes_df.extreme, y_gevd_fit_params,distribution=y_gevd_fit_params.distribution_name[0])
x_bootstrap = np.full((n_extrema, N), np.nan)
y_bootstrap = np.full((n_extrema, N), np.nan)

for i in range(N):
# Select indices to get bootstraps from
ind = np.random.choice(np.linspace(0, n_extrema-1, n_extrema), n_extrema)
x_bootstrap[:, i] = x_extremes_df.extreme.iloc[ind]
y_bootstrap[:, i] = y_extremes_df.extreme.iloc[ind]
```

You can plot a diagnostic about the transformation of one of the variables using `transform_uniform_margins.plot_diagnostic`:
By then using a `bootstrap_gevd_fit` object, GEVD or Gumbel fits are estimated for each bootstrap.

```python
fig_um_x,ax_um_x=transform_uniform_margins.plot_diagnostic(x_extremes_df.extreme, x_extremes_unif_empirical, x_extremes_unif, x_gevd_fit_params, 'X')
x_bs_gevd_fit = bootstrap_gevd_fit(x_bootstrap, x_gevd_fit)
y_bs_gevd_fit = bootstrap_gevd_fit(y_bootstrap, y_gevd_fit)
```

#### (6) Fit a copula to both sets of extrema
#### (7) Fit a copula to both sets of extrema

Fit a copula to x and y extrema using `fit_copula_to_extremes.fit_copula_bivariate`.

For example:
```python
copula=fit_copula_to_extremes.fit_copula_bivariate(x_extremes_unif, y_extremes_unif, 'X', 'Y')
copula = fit_copula_to_extremes.fit_copula_bivariate(x_extremes_unif, y_extremes_unif, 'X', 'Y')
```

#### (7) Take a sample from the copula

Using your copula from (6), extract a sample, e.g.: `copula_sample=copula.sample(100)`.
#### (8) Take a sample from the copula

Using your copula from (6), extract a sample, e.g.: `copula_sample = copula.sample(100)`.

Transform that sample back to data scale:
```python
x_sample_in_data_scale=transform_uniform_margins.transform_from_uniform_margins_to_data_scale(copula_sample[:,0], x_gevd_fit_params)
y_sample_in_data_scale=transform_uniform_margins.transform_from_uniform_margins_to_data_scale(copula_sample[:,0], y_gevd_fit_params)
x_sample = transform_uniform_margins.\
transform_from_uniform_margins_to_data_scale(copula_sample[:, 0], x_gevd_fit)
y_sample = transform_uniform_margins.\
transform_from_uniform_margins_to_data_scale(copula_sample[:, 0], y_gevd_fit)
```

#### (8) Plot diagnostic to assess copula fit
#### (9) Plot diagnostic to assess copula fit

To plot histograms of the copula in data scale (with GEVD/Gumbel fitted to observed extrema overplotted) and on uniform margins, use `transform_uniform_margins.plot_copula_diagnostic`.

For example:
```python
fig_copula_1d,ax_copula_1d=transform_uniform_margins.plot_copula_diagnostic(copula_sample[:,0], copula_sample[:,1], x_sample_in_data_scale, y_sample_in_data_scale, x_gevd_fit_params, y_gevd_fit_params, 'X', 'Y')
fig_copula_1d, ax_copula_1d = transform_uniform_margins.\
plot_copula_diagnostic(copula_sample[:, 0], copula_sample[:, 1],
x_sample, y_sample, x_gevd_fit, y_gevd_fit,
'X', 'Y')
```

Alternatively, to compare the 2D distributions of the observed extrema and copula sample, use `fit_copula_to_extremes.qualitative_copula_fit_check_bivariate`.

For example:
```python
fig_copula_2d,ax_copula_2d=fit_copula_to_extremes.qualitative_copula_fit_check_bivariate(x_extremes_df.extreme, y_extremes_df.extreme, x_sample_in_data_scale, y_sample_in_data_scale, 'X', 'Y')
fig_copula_2d, ax_copula_2d = fit_copula_to_extremes.\
qualitative_copula_fit_check_bivariate(x_extremes_df.extreme, y_extremes_df.extreme,
x_sample, y_sample, 'X', 'Y')
```

#### (9) Plot return period as a function of two variables
#### (10) Plot return period as a function of two variables

To plot the return period as a function of x and y, with standard contours.

For example:
```python
fig_return_period,ax_return_period=calculate_return_periods_values.plot_return_period_as_function_x_y(copula,np.nanmin(x_extremes_df.extreme),np.nanmax(x_extremes_df.extreme),np.nanmin(y_extremes_df.extreme),np.nanmax(y_extremes_df.extreme),'X','Y', x_gevd_fit_params, y_gevd_fit_params, 'X (units)', 'Y (units)', n_samples=1000,block_size=block_size)
fig_rp, ax_rp = calculate_return_periods_values.\
plot_return_period_as_function_x_y(copula,
np.nanmin(x_extremes_df.extreme),
np.nanmax(x_extremes_df.extreme),
np.nanmin(y_extremes_df.extreme),
np.nanmax(y_extremes_df.extreme),
'X', 'Y', 'X (units)', 'Y (units)',
bs_copula_arr, N)
```

Where bs_copula_arr is a list of copulae fit to each bootstrap, which is used to calculate confidence intervals.

## Multivariate Analysis

To be completed
Expand Down
3 changes: 3 additions & 0 deletions __init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,11 @@
Created on Tue Aug 6 11:36:17 2024
@author: A R Fogg
Initialisation file which imports code.
"""

# Import package code
from .src.bivariate import bootstrap_data
from .src.bivariate import calculate_return_periods_values
from .src.bivariate import detect_extremes
Expand Down
9 changes: 9 additions & 0 deletions environment.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
dependencies:
- python=3.8.8
- pip
- scipy=1.10.1
- numpy=1.23.5
- matplotlib=3.7.5
- pandas=1.5.2
- pip:
- copulas=0.9.0
Loading

0 comments on commit 7dafc93

Please sign in to comment.