The output of climate simulations consists of data sets of relevant atmospheric and surface variables. Using these data for various purposes, e.g., for the analysis of processes, as input for impact models or for estimating projected changes in order to define adaptation pathways, requires comprehensive analysis and understanding of the data, their validity and representativeness. Some issues are explained in the CORDEX terms of use (see also http://www.data.euro-cordex.net), others are described in the Chapter Interpreting regional climate projections. Helpful advice on how to use climate model output can also be found in :cite:`Kreienkamp2012`, where the guidelines ('Leitlinien') are elaborated from a German federal state expert discussion (http://klimawandel.hlug.de/?id=448). The discussion paper is available in both German and English.
The World Meteorological Organization states that climate can be defined as the statistical description in terms of the mean and variability of relevant quantities over a period of time. This period has typically been defined as 30 years (http://www.wmo.int/pages/prog/wcp/ccl/faqs.php#q1). Climate is therefore the statistical description of weather at a location and describes the likelihoods for a range of states and phenomena. Examples of statistical quantities related to climate are the mean or the standard deviation, but return periods and intensity-duration-frequency relationships are also frequently used to provide a picture of extreme events.
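As a rough illustration of the statistical quantities mentioned above, the following minimal sketch computes the mean, the standard deviation and a purely empirical return period from a series of annual values. The function names `climate_summary` and `empirical_return_period` are hypothetical and chosen for this example only. The return period here is simply the record length divided by the number of exceedances, which is a crude estimate; in practice an extreme value distribution would usually be fitted instead.

```python
import statistics


def climate_summary(values):
    """Return (mean, sample standard deviation) of a series,
    e.g. 30 annual mean temperatures."""
    return statistics.mean(values), statistics.stdev(values)


def empirical_return_period(annual_maxima, threshold):
    """Average number of years between annual maxima above `threshold`.

    Returns None if the threshold is never exceeded in the sample,
    because the return period cannot be estimated from the data then.
    """
    exceedances = sum(1 for value in annual_maxima if value > threshold)
    if exceedances == 0:
        return None
    return len(annual_maxima) / exceedances


if __name__ == "__main__":
    # Invented annual maximum daily precipitation values (mm), for illustration.
    maxima = [42.0, 55.5, 38.2, 61.0, 47.3, 70.1, 44.8, 52.6, 39.9, 66.4]
    print(climate_summary(maxima))
    print(empirical_return_period(maxima, threshold=60.0))
```

A 30-year series would be used in the same way; the short list above only keeps the example compact.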
According to the WMO, climate change refers to a statistically significant variation in either the mean state of the climate or in its variability, persisting for an extended period (typically decades or longer) (http://www.wmo.int/pages/prog/wcp/ccl/faqs.html). The IPCC relates their definition to the one of the UNFCCC in the following way: Climate change in IPCC usage refers to a change in the state of the climate that can be identified (e.g. using statistical tests) by changes in the mean and/or the variability of its properties, and that persists for an extended period, typically decades or longer. It refers to any change in climate over time, whether due to natural variability or as a result of human activity. This usage differs from that in the United Nations Framework Convention on Climate Change (UNFCCC), where climate change refers to a change of climate that is attributed directly or indirectly to human activity that alters the composition of the global atmosphere and that is in addition to natural climate variability observed over comparable time periods. (http://www.ipcc.ch/publications_and_data/ar4/syr/en/mains1.html).
Climate variability is defined as variations of climate on all temporal and spatial scales, beyond individual weather events. Variability is mainly due to natural internal processes within the climate system (internal variability) or variations in natural or anthropogenic external factors (external variability) (http://www.wmo.int/pages/prog/wcp/ccl/faqs.php). Internal variability arises from chaotic processes in the climate system and nonlinear interactions between its components, i.e., atmosphere, hydrosphere including cryosphere, biosphere and pedosphere. It is typically most pronounced on small spatial and short temporal scales, but is also relevant over multi-decadal time scales for regional and global climate projections (e.g. Hawkins & Sutton, 2011). External variability involves factors external to the climate system. These include natural factors such as solar variability, orbital variations or volcanic eruptions, but also anthropogenic forcings like emissions of greenhouse gases and aerosols into the atmosphere and land use changes.
Whether we observe or simulate climate change trends or climate variability can be tested with suitable statistical tools. The results may differ between meteorological parameters, phenomena and derived extreme events. Any attribution of already observed or projected changes to human influences must be investigated with care, as individual events cannot be directly attributed to human-induced climate change and even sequences of anomalous events might be within the bounds of natural variability (see http://www.wmo.int/pages/prog/wcp/ccl/faqs.html). Only when a persistent series of anomalous events - with respect to the context of broader changes in regional climate parameters - is observed may a human-induced climate change be suggested. One special case is a sequence of record-breaking events, as variables that are independent and identically distributed (iid; i.e. a null-distribution for a stationary series) have well-defined probabilities for the recurrence of record-events (see http://onlinelibrary.wiley.com/doi/10.1029/2008EO410002/pdf). For examples of the attribution of past changes to human-induced climate change refer to, e.g., IPCC AR5 (Chapter 10).
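The record-breaking argument above can be made concrete with a small sketch. For an iid series of continuous values, the probability that the n-th value sets a new record is 1/n, so the expected number of records in a series of length N is the harmonic sum 1 + 1/2 + ... + 1/N (about 4.5 for N = 50). The code below compares this expectation with a simple Monte Carlo simulation. All function names are hypothetical, and uniform random numbers stand in for any stationary iid variable; this is an illustration of the null distribution, not the method of the cited paper.

```python
import random


def expected_records(n):
    """Expected number of record-high values in an iid series of length n."""
    return sum(1.0 / k for k in range(1, n + 1))


def count_records(series):
    """Count values that exceed all earlier values (the first value counts)."""
    count = 0
    current_max = float("-inf")
    for value in series:
        if value > current_max:
            count += 1
            current_max = value
    return count


def simulated_mean_records(n, trials=2000, seed=0):
    """Average record count over many synthetic stationary iid series."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += count_records([rng.random() for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    print(expected_records(50))
    print(simulated_mean_records(50))
```

An observed series with considerably more records than this expectation would be one indication that the stationarity assumption does not hold.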
Climate scenarios (or climate projections) are representations of various possible future states of the climate system, based on numerical model simulations. These models describe the complex processes and interactions affecting the climate system, but also use information about anthropogenic climate forcing. Different factors of anthropogenic activity, like socio-economic, technological, demographic and environmental development, are characterized in climate models as equivalent changes in greenhouse gas concentrations as well as changes in land use and land cover (however, land use and land cover changes are mainly incorporated in global models, and we therefore focus here on greenhouse gas concentrations). Since the future evolution of anthropogenic factors cannot be known in advance, their potential effects are explored through different scenarios describing several possible emission (and thus greenhouse gas concentration) pathways. When performing a climate simulation, the chosen emission scenario provides forcing data for the climate model, resulting in the physical reaction of the climate system to that particular future anthropogenic forcing. Due to this forcing-dependent character, climate model outcomes are not interpreted as forecasts (an initial value problem in mathematical terms), but as projections based on a specific emission scenario (a boundary value problem in mathematical terms). The importance of the emission scenario choice can be evaluated using an ensemble of climate projections (see also How should an ensemble of climate projections be used?), i.e. a set of parallel simulations with slight variations in the experimental setup (e.g. a slightly different starting point or a different model). Two sets of emission scenarios were used in the general circulation model (GCM) simulations that provided the basis for the last three assessment reports of the IPCC (2001, 2007, 2014).
These are the so-called SRES (Special Report on Emissions Scenarios; Nakicenovic et al., 2000; AR3 & AR4) and RCP (Representative Concentration Pathways; Moss et al., 2008; AR5) scenarios. The EURO-CORDEX ensemble is based on the RCP scenarios only. For more details on the differences between RCP and SRES scenarios see the Appendix of this document or the publication Climate change, impacts and vulnerability in Europe 2016 (EEA, 2016, http://www.eea.europa.eu/publications/climate-change-impacts-and-vulnerability-2016).
Numerical climate models are used to project the possible future evolution of the climate system as well as to understand the climate system itself. They are built on mathematical descriptions of the governing physical processes of the climate system (e.g., momentum, mass and energy conservation). Numerical solutions of the underlying equations are then obtained based on numerical algorithms. General circulation models (GCMs) are global numerical climate models which are used to study climate change on a global scale. They describe various components of the Earth system and the nonlinear interactions and feedbacks between them. In order to simulate the past climate, measured values are used as forcing data, whereas for future projections values from particular emission scenarios are employed (see also What are climate scenarios?). Due to the large number of data points and the high complexity of GCMs, their integration requires a large amount of computational resources. The resolution of their horizontal mesh currently ranges from 100 to 500 km and they provide output with a 6-hour temporal frequency. Due to this relatively coarse horizontal and temporal scale, GCMs are insufficient for many aspects of regional and local scale estimates of climate variability and change. Therefore, downscaling is needed to describe the local consequences of the global change, which can be done using empirical-statistical downscaling (ESD) or dynamical downscaling by means of regional climate models (RCMs), also referred to as limited area models (LAMs). LAMs have been widely and successfully used in weather forecasting since the 1970s. Their application for climate purposes started in the 1990s. RCMs are used to downscale GCM simulations using the GCM output data as lateral boundary conditions. RCM integrations are typically run at 10-50 km horizontal resolution over a specific region of interest (e.g., over Europe in the case of EURO-CORDEX).
Through a combination of explicitly resolving important processes (e.g., mountain circulations, land-ocean contrasts) and parameterization schemes adapted to higher resolutions, RCMs are able to provide more detailed characteristics of regional to local climate. Another method to derive regional to local climate information from GCMs is Empirical Statistical Downscaling (ESD). ESD exploits the dependency between large and small scales of different climate variables such as temperature and precipitation.
The application of RCMs requires high-level expertise and a considerable investment in human and computing resources. As such, the use of RCMs has to be well motivated in terms of their added value (AV) with respect to the driving global model, scientific questions and intended downstream applications. The same is true for costly high-resolution RCM integrations (e.g., EUR-11 or higher resolved) that should provide AV compared to their low-resolution counterparts (e.g. EUR-44). We focus here on the first aspect (RCM versus GCM) and also explicitly leave out the question to what extent RCM-based applications could be replaced or complemented by computationally cheaper statistical downscaling methods. AV of RCMs can be verified in two different aspects, which are partly dependent on each other but do not necessarily coincide: (1) A better representation of the present-day climate, and (2) a more accurate projection of future climate change. As GCMs and RCMs mostly share similar computational codes, AV basically arises from the fact that RCMs employ a much finer grid spacing. However, depending on the metric employed and on the specific type of comparison AV will not always be found. This is in particular true for mean features over large spatio-temporal scales (such as seasonal mean values averaged over larger domains) that can in principle also be well represented by coarse-resolution models. AV can primarily be expected for meso-scale atmospheric phenomena (e.g., Feser et al., 2011), for regional-scale spatial climate variability and its future changes, especially in regions of complex surface forcing (topography, land use, land-sea contrast etc.; e.g., Di Luca et al., 2012; Giorgi et al., 2016; Kotlarski et al., 2015; Torma et al., 2015) and for the tails of frequency distributions at high temporal resolution (e.g., for daily extremes; Jacob et al., 2014). In general, AV is more likely to occur for precipitation than for temperature (Di Luca et al., 2013). 
As resolutions are pushed towards scales where critical processes are explicitly resolved, additional benefits are seen. For example, convection-resolving RCM simulations at kilometre-scale resolution have shown additional AV in terms of the daily cycle of summer precipitation and sub-daily precipitation extremes (Ban et al., 2014; Prein et al., 2013). Besides benefits at high temporal and spatial scales, there are also strong indications that RCMs can improve on their driving GCMs for aggregated large-scale mean values that are, in principle, also resolved by GCMs themselves (Kerkhoff et al., 2014; Torma et al., 2015). Whether this translates into a better representation of present and future climate is, however, not necessarily clear. Despite obvious advantages of RCMs for many aspects of present-day climate and climate change patterns, it should be noted that any RCM-based climate scenario depends to some extent on its driving GCM. The quality and accuracy of a regional climate change scenario is then determined by both the RCM and the driving GCM. A single RCM-GCM combination represents only one of very many potential outcomes. To sample the range of potential outcomes, and the uncertainty associated with particular RCMs and/or GCMs, it is necessary to provide ensemble simulations combining different RCMs with different GCMs, as is done within the CORDEX framework.
Each climate model realization is an incomplete representation of reality. The reason for this is that not all temporal and spatial scales can be resolved and not all processes within the Earth system can be simulated. Processes in the climate system occur on time scales that range from centuries to sub-daily and on spatial scales from tens of thousands of kilometres to below 1 kilometre. It is impossible to capture them all. Furthermore, several processes and interactions, like turbulent exchanges under stable conditions or aerosol life cycles, are not yet fully understood and therefore not directly quantifiable in explicit terms (if there is enough data describing these processes, however, it is possible to make use of statistical techniques to quantify some of their aspects). EURO-CORDEX models are operated on the same spatial scales of approximately 12 km or 50 km but have implemented slightly varying parameterizations of small-scale processes, and therefore the results differ. The model configuration also influences the results. Examples are the implementation of surface characteristics (e.g. land-use information), the number of vertical levels and the numerical scheme used to solve the equations. Other inherent limitations of climate projections are scenario uncertainty, because the RCP scenarios are based on certain assumptions for the future, and internal climate variability, which may be in the range of the analysed time scale of 30 years (Deser et al., 2012). ESD, on the other hand, requires far fewer computational resources than RCMs and can be applied to large multi-model ensembles and different emission scenarios (Benestad et al., 2016). These limitations and the resulting uncertainty influence the reliability of the results, but since ESD and RCMs make use of different sources of information, combining the results from these strategies can improve confidence.
Model results nevertheless have to be used and interpreted carefully and in a manner consistent with their intended purpose. In general it can be stated that climate models are good at simulating the state and trends of the climate system for larger time slices and regions. Special care has to be taken in order to assess whether RCMs can be used to study events occurring on small temporal and spatial scales, e.g., when analysing the state of the climate system for a particular location (i.e., a single grid box) or a special date or a short time period (e.g. single storm events).
The evaluation of the model results aims at analysing the strengths and weaknesses of the global and regional climate models through different statistical (and physical) measures over long periods. Moreover, in the case of regional models, their added value can be assessed with respect to the global climate models (see also What is the added value of regional climate models?). In order to evaluate climate model simulations, they have to be integrated for several past decades and compared against suitable reference climatological data sets (e.g., observations and/or re-analysis data). It has to be noted that the available reference data sets also have shortcomings and should only be applied for the purposes they were intended for. For instance, E-OBS (Klok and Klein-Tank, 2009) is a commonly used gridded data set for Europe, but since it contains some gaps in the precipitation data, homogenized national data sets are often used instead. In the case of regional climate models, two types of simulations are conducted for the recent past, each serving a different purpose:
Hindcast simulations: For hindcast simulations, the initial and lateral boundary conditions are provided by a re-analysis product. With these simulations the quality of the regional climate model itself can be evaluated. As explained above, the re-analyses are three-dimensional data sets for the whole globe (recently also available for limited domains) based on a blend of numerical short-term weather forecasts and many kinds of observations. Since the boundary conditions in the hindcast experiment are based on measurements that are a reasonable representation of the true atmospheric state, the evaluation results mainly reflect the weaknesses and strengths of the regional climate model. In addition, shorter time periods can be analysed since the observed year-to-year correlation is preserved.
The results of such an evaluation are also used to improve RCMs (e.g., an overestimation of heavy precipitation indicates the need for further research on the convection parameterization).
Historical simulations: For historical simulations, initial and lateral boundary conditions are provided by a GCM. Therefore, the evaluation gives some indication of the behaviour of the GCM-RCM chain. Long time periods (usually 30 years) should be investigated since this type of experiment is not synchronised with the observed climate. Additionally, the GCM simulation should be investigated to assess whether a bias stems from the GCM or from deficiencies that are attributable to the RCM. This kind of evaluation experiment is of great importance, as lateral boundary conditions for future projections are provided by GCMs.
Physical consistency test: There are few evaluations of the consistency between the GCM or re-analysis and the embedded RCM that address critical questions about their physical consistency. The RCMs and GCMs may, for instance, employ different choices in the ‘model physics’ (parameterisation schemes), which result in different model solutions. Changes in the precipitation climate, cloudiness and convection will imply a change in the vertical energy flow from the surface to the top of the atmosphere. The question is whether this matters. Closure tests can be used to assess how the RCM and the GCM perform together, e.g., by comparing the aggregated energy and mass fluxes through the top and lateral boundaries of the RCM and the corresponding surfaces in the GCM. The question that needs to be answered is whether there is a mismatch in the energy and mass fluxes between the two stages and, if so, whether it is related to the biases in a systematic way or whether it can introduce artificial trends.
ESD evaluation: The evaluation of ESD needs to make use of different strategies than that of RCMs.
One is the use of cross-validation (Wilks, 1995), where the data is split into two batches: one for calibrating the statistical models and the other for independent validation. The models’ ability to reproduce the long-term trends is tested by calibrating the models with de-trended data and then using the original data, with any trend embedded, to reproduce the original observations. This stage can be combined with the cross-validation for a more stringent test. It is also possible to stratify the data, use the low values to train the model and then use predictions for the high values for validation. The validation of both ESD and RCMs was discussed in the European COST-action VALUE (Maraun et al., 2015).
Model outputs are inevitably imperfect, mainly due to the complex nature of the climate system, model shortcomings (i.e. errors) and model approximations (i.e. parameterizations), resulting in biases when compared to reference data sets. For more information on how to deal with such biases see How to interpret and adjust model biases?
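The split-sample cross-validation described above for ESD can be sketched as follows. This is a minimal illustration under strong assumptions: the statistical model is a one-predictor least-squares regression, the skill measure is the root-mean-square error, and all function names (`fit_linear`, `rmse`, `split_sample_validation`) are hypothetical. Real ESD applications typically use more predictors and more elaborate models.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return (squared / len(observed)) ** 0.5


def split_sample_validation(predictor, predictand, split_index):
    """Calibrate on the first batch, validate on the independent second batch."""
    slope, intercept = fit_linear(predictor[:split_index],
                                  predictand[:split_index])
    predicted = [slope * x + intercept for x in predictor[split_index:]]
    return rmse(predicted, predictand[split_index:])


if __name__ == "__main__":
    # Invented large-scale predictor and local predictand values.
    large_scale = [0.1, 0.4, 0.3, 0.9, 1.2, 1.0, 1.5, 1.7, 2.1, 2.0]
    local = [0.3, 0.7, 0.8, 1.9, 2.3, 2.2, 3.1, 3.3, 4.4, 4.0]
    print(split_sample_validation(large_scale, local, split_index=5))
```

The stratified variant mentioned above could reuse the same functions by sorting the pairs by predictand value first, so that the calibration batch holds the low values and the validation batch the high values.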
Climate models are employed to generate projections of the future climate at multi-decadal to centennial time scales. The simulated temporal evolution of future climate is subject to uncertainties which are tackled by different ensemble simulation strategies. The uncertainties can be grouped into three major categories: (i) scenario uncertainty, (ii) internal climate variability and (iii) model uncertainty (Hawkins and Sutton, 2009, 2011). In the following, these sources of uncertainty and the respective ensemble simulation strategies are briefly described.
(i) Scenario uncertainty: External anthropogenic forcings are derived from emission scenarios (see above). The latest generation of climate projections for the 21st century builds on Representative Concentration Pathways (RCPs) (Moss et al., 2010). RCPs are defined by different radiative forcing levels at the end of the 21st century. The related temporal evolution of atmospheric greenhouse gas and aerosol concentrations (in some cases emissions) is prescribed in global climate models, which then simulate the response of the climate system to the forcing. By prescribing different forcings according to different pathways, a range of potential future climate evolutions can be projected. A subset of currently four RCPs is used to create a multi-scenario ensemble that covers a large bandwidth of future climate evolutions. For the historical simulations of the 20th century, observed concentrations of atmospheric substances are prescribed in the models. The simulated climate projections are then compared to the historical climate simulations in order to derive projected climate change signals.
(ii) Internal climate variability (see above) is simulated by models of the climate system (Deser et al., 2012). Its temporal evolution strongly depends on the initialisation of each model component.
To consider different potential evolutions of climate variability, a set of simulations with the same external forcing can be performed, but with slightly different initialisation states. The results of such an initial-condition ensemble lie within a range of equally probable climate evolutions.
(iii) Model uncertainty: Models are always simplified representations of the Earth’s climate system. Different models apply different physical parameterisations and also different numerical approaches. Those structural differences lead to a range of simulated climate responses to external forcing. They are addressed with multi-model ensemble simulations (see below). Multi-model ensemble simulations based on a certain scenario sample modelling uncertainties, but also different initial conditions of the climate system (see Internal climate variability above), as each global model is initialised at a different climate state. Also included under model uncertainty is the fact that different classes of models (e.g. dynamical vs. statistical downscaling) might give different results.
Within the EURO-CORDEX initiative, a coordinated multi-model, multi-method, multi-scenario, multi-initial-condition ensemble of downscaled experiments for Europe at 0.11° horizontal resolution has been established (Jacob et al., 2013).
For climate service purposes, it is recommended to use the largest possible model ensemble for evaluation and application of climate model results in order to achieve robust results. Only an ensemble analysis makes it possible to take the model-inherent uncertainties sensibly into account when assessing the results. An ensemble of model simulations may consist of different models but only one scenario (multi-model ensemble), one model and different scenarios (multi-scenario ensemble), one model and different physical parameterization schemes (multi-physics ensemble), or one model, one parameterization scheme and different realisations (multi-member ensemble). There are several approaches to estimate the uncertainty of an ensemble by defining the bandwidth of the results (see e.g. Déqué et al., 2007). Analysing the mean and standard deviation of the ensemble members is the simplest method, but outliers often have too large an influence. This can be avoided by calculating the median and suitable lower and upper percentiles. The percentile analysis can then be translated into likelihood terminology by an exceedance probability after Solomon et al. (Eds., 2007). Methods are described by Knutti et al. (2010). For specific cases and applications it might be useful to reduce the size of the available ensemble by means of subsampling. There are different criteria for how such subsampling can be performed. One criterion could be that, based on the evaluation results, better model simulations are weighted higher than ones of lower quality (see, e.g., Christensen et al., 2010). Another criterion could be that the smaller ensemble represents the same range of projected climate change signals as the full ensemble (e.g., refer to IMPACT2C).
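The difference between the mean/standard deviation view and the median/percentile view of an ensemble can be illustrated with a short sketch. The function names `percentile` and `ensemble_summary` are hypothetical, the percentile uses simple linear interpolation between ordered members, and the 17th and 83rd percentiles (enclosing the central 66 % of members) are an illustrative assumption, not a prescription from the text above.

```python
import statistics


def percentile(values, q):
    """Percentile q (0-100) using linear interpolation between ordered values."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def ensemble_summary(signals, lower_q=17.0, upper_q=83.0):
    """Summarise projected change signals of all ensemble members."""
    return {
        "mean": statistics.mean(signals),
        "stdev": statistics.stdev(signals),
        "median": statistics.median(signals),
        "lower": percentile(signals, lower_q),
        "upper": percentile(signals, upper_q),
    }


if __name__ == "__main__":
    # Invented temperature change signals (K); the last member is an outlier.
    signals = [1.8, 2.1, 2.3, 2.4, 2.6, 2.9, 6.5]
    for key, value in ensemble_summary(signals).items():
        print(key, round(value, 2))
```

In this invented example the single outlier pulls the mean and inflates the standard deviation, while the median and the percentile range stay close to the bulk of the members.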
The robustness of projected climate changes based on an ensemble of climate simulations is defined in the IPCC Third Assessment Report - Climate Change 2001: Synthesis Report, Question 9: '... a robust finding for climate change is defined as one that holds under a variety of approaches, methods, models, and assumptions and one that is expected to be relatively unaffected by uncertainties.' The verification of robustness is often based on satisfying different conditions. For example, the method applied in the ‘Klimasignalkarten’ (http://www.gerics.de/products_and_publications/maps_visualisation/csm_regional/index.php.en) identifies a projected change as being robust if at least 66 % of all simulations agree in the direction of change and at least 66 % of the simulations pass a suitable statistical significance test (e.g., U-Test or Mann-Whitney-Wilcoxon Test). Other authors define climate change robustness differently. Seaby et al. (2013) apply robustness tests to two different bias-correction methods and to different lengths of reference and change periods and do not include the significance tests. Knutti and Sedláček (2013) define the climate change robustness parameter, 'inspired by the ranked probability skill score used in weather prediction, and by the ratio of model spread to the predicted change (noise to signal).'
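The two-condition check described above for the ‘Klimasignalkarten’ could be sketched roughly as follows. The sketch assumes that a change signal and a p-value from a suitable significance test (e.g. a Mann-Whitney U test computed elsewhere) are already available for every simulation. The function name `is_robust` and the significance level of 0.05 are assumptions made for this example and are not taken from the cited method.

```python
def is_robust(changes, p_values, agreement=0.66, alpha=0.05):
    """Return True if both robustness conditions are met.

    changes  -- projected change per simulation (sign gives the direction)
    p_values -- p-value of a significance test per simulation
    """
    if not changes or len(changes) != len(p_values):
        raise ValueError("need one change and one p-value per simulation")
    n = len(changes)
    positive = sum(1 for change in changes if change > 0)
    negative = sum(1 for change in changes if change < 0)
    # Fraction of simulations agreeing on the dominant direction of change.
    sign_agreement = max(positive, negative) / n
    # Fraction of simulations whose change is statistically significant.
    significant = sum(1 for p in p_values if p < alpha) / n
    return sign_agreement >= agreement and significant >= agreement


if __name__ == "__main__":
    # Invented precipitation change signals (%) and p-values for four simulations.
    print(is_robust([4.2, 3.1, 5.0, -0.4], [0.01, 0.03, 0.02, 0.60]))
```

Here the significant simulations are counted independently of their sign; a stricter variant could require that the significant changes also point in the dominant direction.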