# WP1 Mathematical and Statistical Issues

We want to explore to what extent this hypothesis of local time equilibrium (to slow forcings) can be relaxed, and how changes in the behavior of the climate system can be detected. In particular, we want to address the question of whether a climate change is a “global shift” of the attractor (with no qualitative change of its properties), or a local or global deformation with qualitative changes. Specifically, we want to qualify climate change in terms of dynamical system theory [Guckenheimer and Holmes, 1983; Wiggins, 1990], by detecting and qualifying potential bifurcations.

Thus it is necessary to use an appropriate mathematical and statistical framework to tackle this question. This workpackage aims at building and consolidating the heuristic and pragmatic approach of flow analogues [Lorenz, 1969] and providing guidelines for their use, in terms of size of reference dataset, size of domain and metric.

The paradigm of flow analogues can be summarized as follows (or see Figure 3). We assume that a trajectory of a dynamical system (or an ensemble of trajectories) is observed during a reference period R (for reference), and that this trajectory is a good proxy of the underlying attractor. For a distinct time interval T, we determine, for each t in T, the states of the reference trajectory (in R) that are closest to the state reached at target time t (in T, for target). The closest states found during the reference R are called the analogues of the target T. We refer to flow analogues when working on variables representing atmospheric motion. There are several ways of obtaining closest analogues: by minimizing a distance or maximizing a correlation. The analogue determination hence strongly depends on the criterion to be optimized and there is no objective reason to prefer one method over another [Toth, 1991a], although the results can be different. We will use standard spatial distances (Euclidean, Mahalanobis, Kullback-Leibler) and correlation types (linear, rank) that are adapted to mean variations, and we will explore other types of distances that are more relevant to extreme variations (e.g., madogram: [Cooley et al., 2006]).
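The analogue search described above can be sketched in a few lines. This is a minimal illustration only, assuming that the reference and target fields are stored as NumPy arrays of shape (time, grid points); the function name `find_analogues`, its parameters and the two criteria shown (Euclidean distance and linear spatial correlation) are illustrative choices, not a prescribed implementation.

```python
import numpy as np


def find_analogues(reference, target, n_best=5, metric="euclidean"):
    """For each target state, return indices and costs of the n_best reference states.

    reference: array (n_ref, n_grid), states observed during the period R.
    target:    array (n_tgt, n_grid), states observed during the period T.
    A smaller cost means a better analogue for both criteria.
    """
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    if metric == "euclidean":
        # Pairwise Euclidean distances, shape (n_tgt, n_ref).
        diff = target[:, None, :] - reference[None, :, :]
        cost = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "correlation":
        # Linear spatial correlation r, turned into a cost 1 - r.
        t = target - target.mean(axis=1, keepdims=True)
        r = reference - reference.mean(axis=1, keepdims=True)
        t /= np.linalg.norm(t, axis=1, keepdims=True)
        r /= np.linalg.norm(r, axis=1, keepdims=True)
        cost = 1.0 - t @ r.T
    else:
        raise ValueError(f"unknown metric: {metric}")
    # Sort reference states by increasing cost and keep the best ones.
    order = np.argsort(cost, axis=1)[:, :n_best]
    scores = np.take_along_axis(cost, order, axis=1)
    return order, scores
```

The returned indices play the role of the analogue dates and the costs play the role of the analogue scores. Other criteria (Mahalanobis distance, rank correlation, madogram) would replace the computation of `cost` only.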

An important caveat of this method, stemming from the theory of dynamical systems, is that there is no guarantee that good analogues can be found in finite time: the number of degrees of freedom of the planetary climate is so large that the attractor is unlikely to have been adequately sampled [Lorenz, 1969; Toth, 1995; Van den Dool, 1983]. This difficulty is overcome by choosing smaller geographical domains, which reduces the number of spatial degrees of freedom. Such domains can cover Europe (or the North Atlantic), the Arctic region, etc. Heuristic tests of the sensitivity of analogues to the domain definition are seldom done. We will first define “target” regions (Europe, Arctic, North America and Asia) and examine systematically the sensitivity of the analogue computations to the size of the regions. Such an investigation is crucial for Asian domains, which can be influenced by air flows from the Siberian anticyclone, El Niño and the monsoon, each of which has its own dynamics and spatial extent. This will provide an empirical basis for the definition of the geographical domains on which the flow analogues are computed. The climate attractor sampling question is also less acute now than at the time of the seminal paper of [Lorenz, 1969], because of the availability of ensembles of very long climate model simulations (covering several centuries).

In practice, for climate applications, we compute the best flow analogues of all time steps of the system, and then we compare the dates of the analogues and their scores. This can be done in two ways for climate models: either by comparing “historical” simulations (i.e. with time-varying forcings) to a control simulation (with all forcings fixed), or by sorting time-varying analogues in historical simulations or reanalyses.

The rationale of this diagnostic is that, under a hypothesis of ergodicity [Manneville, 2004], if the climate attractor does not change shape, then the dates of the best analogues and their scores should be uniformly distributed, with no trend. The statistical assessment of this uniformity will be performed on simple models [Lorenz, 1963], quasi-geostrophic models that have a chaotic behavior [Molteni, 2003], and control simulations of the CMIP5/PMIP3 ensembles that have a daily (or finer) time resolution.
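One possible way to assess the uniformity of analogue dates is a Kolmogorov-Smirnov statistic against a uniform distribution over the reference period, with a Monte Carlo p-value. The sketch below is an assumption about how such a check could look, not the test that will be retained; the function names are hypothetical. It also treats analogue dates as independent draws, which ignores their serial correlation and therefore tends to make the p-values too small.

```python
import numpy as np


def ks_uniform_statistic(dates, start, end):
    """Kolmogorov-Smirnov distance between the empirical CDF of the dates
    and the uniform distribution on [start, end]."""
    u = np.sort((np.asarray(dates, dtype=float) - start) / (end - start))
    n = u.size
    upper = np.arange(1, n + 1) / n - u   # empirical CDF above the uniform CDF
    lower = u - np.arange(0, n) / n       # empirical CDF below the uniform CDF
    return max(upper.max(), lower.max())


def uniformity_pvalue(dates, start, end, n_sim=2000, seed=0):
    """Monte Carlo p-value of the KS statistic under independent uniform dates."""
    rng = np.random.default_rng(seed)
    observed = ks_uniform_statistic(dates, start, end)
    n = len(dates)
    simulated = np.array([
        ks_uniform_statistic(rng.uniform(start, end, n), start, end)
        for _ in range(n_sim)
    ])
    # Add-one correction so that the p-value is never exactly zero.
    p_value = (1 + np.sum(simulated >= observed)) / (n_sim + 1)
    return observed, p_value
```

A small p-value would indicate that the best analogues cluster in part of the reference period, which is the kind of signal expected if the attractor changes shape.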

The important methodological step is to detect trends or persisting outliers in the dates and scores of analogues when the system is subject to time-varying forcings. This will be done heuristically from idealized models and full-size climate models in which the forcings are known. We will devise a test for analogue trend detection, by bootstrapping the data. With such a test, we can provide p-values for this attractor deformation by assessing the extent to which the observed field belongs to the distribution of its analogues. This statistical development is new but essential to assess how good the analogues are, or whether trajectories of a dynamical system indeed shadow [Ghil et al., 2002] the underlying attractor.
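As an illustration of what a bootstrap trend test on analogue scores might look like, the sketch below compares the linear slope of a score series with the slopes obtained after moving-block resampling, which destroys any long-term trend while keeping short-range dependence. The choice of a block bootstrap, the block length and the function names are assumptions made for this example; the test to be developed in the workpackage may differ.

```python
import numpy as np


def trend_slope(series):
    """Least-squares linear slope of a series against its time index."""
    t = np.arange(series.size)
    return np.polyfit(t, series, 1)[0]


def block_bootstrap_trend_test(series, block_len=10, n_boot=2000, seed=0):
    """Two-sided p-value for the slope of `series` under a no-trend null
    built by moving-block resampling of the series itself."""
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    n = series.size
    observed = trend_slope(series)
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    null_slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Draw random block start positions and concatenate the blocks.
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        null_slopes[b] = trend_slope(series[idx])
    # Add-one correction so that the p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(null_slopes) >= abs(observed))) / (n_boot + 1)
    return observed, p_value
```

Applied to the time series of the best analogue score (or of the analogue dates), a small p-value would point to a systematic drift rather than sampling variability.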

Meteorological events are often the results of sequences of synoptic atmospheric conditions. It can hence be useful to consider analogues of sequences of atmospheric variables. For instance, it can be interesting to look for analogues of five consecutive days. Of course, the length of this “windowing” can reduce the scores of the analogues, because one has to find sequences of days that have globally similar patterns, rather than single days. But considering such windows also constrains the dynamical features of the field, especially its time derivative, because using windows of more than one day gives a “direction” to the atmospheric field. We will hence make numerical tests on the optimal choice of the window size of analogues, to achieve a trade-off between good analogue scores and the dynamical smoothness of the computed analogues.
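Analogues of sequences can be obtained by averaging daily squared distances along matching consecutive days. The following sketch assumes Euclidean distance, daily fields stored as arrays of shape (time, grid points), and a hypothetical function name `windowed_analogues`; it is meant to show the mechanics of the windowing, not to fix the method.

```python
import numpy as np


def windowed_analogues(reference, target, window=5, n_best=3):
    """Find analogues of `window` consecutive days.

    Returns, for each target sequence start, the start indices in the
    reference and the root-mean-square distance over the window.
    """
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    # Daily pairwise squared distances, shape (n_tgt, n_ref).
    d2 = ((target[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    n_t = target.shape[0] - window + 1   # number of target sequences
    n_r = reference.shape[0] - window + 1  # number of reference sequences
    # Sum distances along diagonals: day k of the target sequence is
    # compared with day k of the reference sequence.
    seq = np.zeros((n_t, n_r))
    for k in range(window):
        seq += d2[k:k + n_t, k:k + n_r]
    cost = np.sqrt(seq / window)
    order = np.argsort(cost, axis=1)[:, :n_best]
    return order, np.take_along_axis(cost, order, axis=1)
```

Repeating the search for several values of `window` and comparing the resulting scores is one simple way to explore the trade-off between analogue quality and dynamical consistency.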

The reason for the formalism of the reference R and the target T sets is that they can stem from different sources. For instance, the reference R can be a long control simulation (e.g. 1000 years) from a climate model for which all the slow components have smoothed out, while the target T can cover a climate projection from an IPCC scenario, with a different model. This flexibility offers a wide range of analogue analysis combinations, with different applications that will be explored in the second part of the project.