Python package to run bivariate and multivariate extreme value analysis on generic data - work in progress. Bivariate analysis is described in detail by Fogg et al (2024) currently under review at AGU journal Space Weather. Preprint available:
Fogg, A. R., D. Healy, C. M. Jackman, et al (2024). Bivariate Extreme Value Analysis for Space Weather Risk Assessment: solar wind - magnetosphere driving in the terrestrial system. ESS Open Archive. doi: 10.22541/essoar.172612544.43585872/v1
License: CC0-1.0
Support: please create an issue or contact arfogg directly. Any input on the code / issues found are greatly appreciated and will help to improve the software.
- Required Packages
- Installing the code
- Bivariate Analysis
- (1) Getting your data ready
- (2) Checking for Asymptotic Dependence
- (3) Extract extrema
- (4) Fit a model to the extrema
- (5) Transform extrema data to uniform margins
- (6) Bootstrapping the extrema
- (7) Fit a copula to both sets of extrema
- (8) Take a sample from the copula
- (9) Plot diagnostic to assess copula fit
- (10) Plot return period as a function of two variables
- Multivariate Analysis
- Acknowledgements
scipy, numpy, matplotlib, pandas, copulas
See environment.yml for details.
Install copulas using pip or see documentation here.
Download the code:
git clone https://github.com/arfogg/bi_multi_variate_eva
Importing:
from bi_multi_variate_eva import *
An example walkthrough of running the Bivariate Extreme Value Analysis on two variables x and y. For theory, a recommended text is: Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer.
Make sure there are no datagaps in your timeseries. You can either remove the rows, or interpolate. This requires an data-expert user decision so is not included in this package. You must do this before using the package. For some functions, the data is parsed as a pandas.DataFrame.
The function plot_extremal_dependence_coefficient
within determine_AD_AI
creates a diagnostic plot to examine asymptotic dependence/independence.
For example:
fig, ax_data, ax_data_unif, ax_edc, chi, chi_lower_q, chi_upper_q = \
determine_AD_AI.plot_extremal_dependence_coefficient(x, y, x_bs_um, y_bs_um, n_bootstrap,
"X", "Y", "(units)", "(units)")
Timeseries (np.array
or pd.Series
) of x and y are parsed, with their bootstraps (transformed to uniform margins), number of bootstraps and strings for plot labels.
Extract extremes for both X and Y using detect_extremes.find_joint_block_maxima
. Analysis on points above threshold maxima yet to be implemented.
For example:
empty_blocks, x_extreme_t, x_extreme, y_extreme_t, y_extreme = \
detect_extremes.find_joint_block_maxima(data_df, 'x', 'y')
x_extremes_df = pd.DataFrame({'datetime':x_extreme_t, 'extreme':x_extreme})
y_extremes_df = pd.DataFrame({'datetime':y_extreme_t, 'extreme':y_extreme})
A dataframe of evenly sampled x and y are parsed, with their respective dataframe column names. These are transformed to individual parameter DataFrames.
Fit a GEVD or Gumbel distribution to both sets of extrema (i.e. for x and y) using gevd_fitter
class.
For example:
x_gevd_fit = gevd_fitter(x_extremes_df.extreme)
y_gevd_fit = gevd_fitter(y_extremes_df.extreme)
By initialising the gevd_fitter
class, a GEVD or Gumbel model is fit to the extrema. Fitting information is stored in the object.
Transform x and y extrema from data scale (as it looks on the instrument) to uniform margins. This happens within the gevd_fitter
class.
You can plot a diagnostic about the transformation of one of the variables using transform_uniform_margins.plot_diagnostic
.
To facilitate error calculation, we bootstrap the extrema. For each of the N bootstraps, a random selection of indices from between 0 to n_extrema-1 is chosen (where n_extrema is the number of extrema in each dataset). This set of indices is used to select points from both x and y. This ensures joint selection, so we retain the physical link between x and y.
For example:
x_bootstrap = np.full((n_extrema, N), np.nan)
y_bootstrap = np.full((n_extrema, N), np.nan)
for i in range(N):
# Select indices to get bootstraps from
ind = np.random.choice(np.linspace(0, n_extrema-1, n_extrema), n_extrema)
x_bootstrap[:, i] = x_extremes_df.extreme.iloc[ind]
y_bootstrap[:, i] = y_extremes_df.extreme.iloc[ind]
By then using a bootstrap_gevd_fit
object, GEVD or Gumbel fits are estimated for each bootstrap.
x_bs_gevd_fit = bootstrap_gevd_fit(x_bootstrap, x_gevd_fit)
y_bs_gevd_fit = bootstrap_gevd_fit(y_bootstrap, y_gevd_fit)
Fit a copula to x and y extrema using fit_copula_to_extremes.fit_copula_bivariate
.
For example:
copula = fit_copula_to_extremes.fit_copula_bivariate(x_extremes_unif, y_extremes_unif, 'X', 'Y')
Using your copula from (6), extract a sample, e.g.: copula_sample = copula.sample(100)
.
Transform that sample back to data scale:
x_sample = transform_uniform_margins.\
transform_from_uniform_margins_to_data_scale(copula_sample[:, 0], x_gevd_fit)
y_sample = transform_uniform_margins.\
transform_from_uniform_margins_to_data_scale(copula_sample[:, 0], y_gevd_fit)
To plot histograms of the copula in data scale (with GEVD/Gumbel fitted to observed extrema overplotted) and on uniform margins, use transform_uniform_margins.plot_copula_diagnostic
.
For example:
fig_copula_1d, ax_copula_1d = transform_uniform_margins.\
plot_copula_diagnostic(copula_sample[:, 0], copula_sample[:, 1],
x_sample, y_sample, x_gevd_fit, y_gevd_fit,
'X', 'Y')
Alternatively, to compare the 2D distributions of the observed extrema and copula sample, use fit_copula_to_extremes.qualitative_copula_fit_check_bivariate
.
For example:
fig_copula_2d, ax_copula_2d = fit_copula_to_extremes.\
qualitative_copula_fit_check_bivariate(x_extremes_df.extreme, y_extremes_df.extreme,
x_sample, y_sample, 'X', 'Y')
To plot the return period as a function of x and y, with standard contours.
For example:
fig_rp, ax_rp = calculate_return_periods_values.\
plot_return_period_as_function_x_y(copula,
np.nanmin(x_extremes_df.extreme),
np.nanmax(x_extremes_df.extreme),
np.nanmin(y_extremes_df.extreme),
np.nanmax(y_extremes_df.extreme),
'X', 'Y', 'X (units)', 'Y (units)',
bs_copula_arr, N)
Where bs_copula_arr is a list of copulae fit to each bootstrap, which is used to calculate confidence intervals.
To be completed
ARF gratefully acknowledges the support of Irish Research Council Government of Ireland Postdoctoral Fellowship GOIPD/2022/782.
CMJ, MJR, SCM, and SJW were supported by Science Foundation Ireland award 18/FRL/6199.