Climate Dynamics, Volume 40, Issue 7, pp 1749–1766

The impact of the MJO on clusters of wintertime circulation anomalies over the North American region


  • E. E. Riddle
    • Climate Prediction Center, NCEP/NWS/NOAA
    • Wyle Science Technology and Engineering
  • Marshall B. Stoner
    • Climate Prediction Center, NCEP/NWS/NOAA
  • Nathaniel C. Johnson
    • International Pacific Research Center, School of Ocean Earth Science and Technology, University of Hawaii
  • Michelle L. L’Heureux
    • Climate Prediction Center, NCEP/NWS/NOAA
  • Dan C. Collins
    • Climate Prediction Center, NCEP/NWS/NOAA
  • Steven B. Feldstein
    • Department of Meteorology, The Pennsylvania State University

DOI: 10.1007/s00382-012-1493-y

Cite this article as:
Riddle, E.E., Stoner, M.B., Johnson, N.C. et al. Clim Dyn (2013) 40: 1749. doi:10.1007/s00382-012-1493-y


Recent studies have shown that the Madden–Julian Oscillation (MJO) impacts the leading modes of intraseasonal variability in the northern hemisphere extratropics, providing a possible source of predictive skill over North America at intraseasonal timescales. We find that a k-means cluster analysis of mid-level geopotential height anomalies over the North American region identifies several wintertime cluster patterns whose probabilities are strongly modulated during and after MJO events, particularly during certain phases of the El Niño-Southern Oscillation (ENSO). We use a simple new optimization method for determining the number of clusters, k, and show that it results in a set of clusters which are robust to changes in the domain or time period examined. Several of the resulting cluster patterns resemble linear combinations of the Arctic Oscillation (AO) and the Pacific/North American (PNA) teleconnection pattern, but show even stronger responses to the MJO and ENSO than clusters based on the AO and PNA alone. A cluster resembling the positive (negative) PNA has elevated probabilities approximately 8–14 days following phase 6 (phase 3) of the MJO, while a negative AO-like cluster has elevated probabilities 10–20 days following phase 7 of the MJO. The observed relationships are relatively well reproduced in the 11-year daily reforecast dataset from the National Centers for Environmental Prediction (NCEP) Climate Forecast System version 2 (CFSv2). This study statistically links MJO activity in the tropics to common intraseasonal circulation anomalies over the North American sector, establishing a framework that may be useful for improving extended range forecasts over this region.


The Madden–Julian Oscillation (MJO); Tropical–extratropical connections; Intraseasonal climate variability; Extended range prediction; Cluster analysis; Model hindcasts; The Arctic Oscillation (AO); The Pacific/North America pattern (PNA)

1 Introduction

The Madden–Julian Oscillation (MJO) is a large-scale, eastward propagating pattern of tropical convection and atmospheric circulation anomalies. The circulation anomalies circumnavigate the globe in approximately 30–60 days, with the strongest convective signal occurring over the warm waters of the Indian and Pacific Oceans. The MJO is the primary source of variability at intraseasonal timescales in these tropical regions (Zhang 2005). By modulating tropical convection, the MJO can also initiate poleward propagating Rossby waves that impact extratropical weather patterns and influence the leading modes of low-frequency northern hemisphere variability, including the Arctic Oscillation (AO), the North Atlantic Oscillation (NAO) and the Pacific/North American (PNA) teleconnection pattern (e.g., Cassou 2008; Higgins and Mo 1997; Johnson and Feldstein 2010; Lin et al. 2009; L’Heureux and Higgins 2008; Seo and Son 2012). Through these mechanisms, the MJO may provide some degree of enhanced predictability for precipitation and temperature in the northern hemisphere extratropics, especially during the winter months at extended range timescales (~10–30 days) (e.g., Cassou 2008; Jones et al. 2011; Lin et al. 2010a, b; Vitart and Molteni 2010; Zhou et al. 2011).

The purpose of this study is (1) to explore when and for how long tropical MJO activity impacts common intraseasonal climate patterns over North America and the surrounding oceans, (2) to examine how these impacts change during different phases of the El Niño-Southern Oscillation (ENSO), and (3) to assess how well the National Centers for Environmental Prediction (NCEP) Climate Forecast System model version 2 (CFSv2) captures the observed relationships.

In Sect. 2, we provide some further background on the relationship between the MJO and extratropical climate anomalies. In Sect. 3, we describe our data and methods and introduce a novel approach for optimizing the number of clusters in a k-means cluster analysis. The results of our cluster analysis are presented in Sect. 4, and the relationship between the resulting clusters and the AO/NAO and PNA is examined. In Sect. 5, we present how the occurrence probabilities of three of the clusters are modulated during and after MJO episodes. Only three of the seven clusters are shown because the MJO influence on the remaining clusters is weak. The results are also compared with results for similar clusters that are based exclusively on the AO and PNA indices. The impact of ENSO on the results is examined in Sect. 6. In Sect. 7, we present surface temperature and precipitation signatures for the three clusters over the continental United States, and, in Sect. 8, we evaluate how well CFSv2 hindcasts are able to reproduce the observed modulations in cluster occurrence. Finally, we summarize our conclusions in Sect. 9.

2 Background

Large-scale tropical convection has long been known to influence extratropical climate patterns through Rossby wave propagation and storm track modifications. The mechanisms leading to these relationships have been studied extensively on seasonal timescales (see Trenberth et al. 1998 for a review) as well as on intraseasonal timescales, such as those associated with the MJO (e.g., Ferranti et al. 1990; Higgins and Mo 1997; Johnson and Feldstein 2010; Matthews et al. 2004; Mori and Watanabe 2008; Seo and Son 2012). At both of these timescales, increased (decreased) convection over the equatorial central Pacific is associated, to first order, with an enhancement (reduction) in upper level divergence, an extension (retraction) of the Pacific subtropical jet, and associated modification of mid-latitude storm tracks. If the Rossby wave source, i.e., the sum of vorticity advection by the divergent wind and upper tropospheric horizontal divergence (Sardeshmukh and Hoskins 1988) reaches the subtropical regions of the background westerlies, vorticity anomalies can initiate a poleward propagating Rossby wave train that mediates teleconnections downstream in regions far from the tropical Pacific.

The Rossby wave response to enhanced heating over the tropical Pacific resembles the positive phase of the PNA pattern, the second leading mode of northern hemisphere variability at intraseasonal and interannual timescales. To first order, this PNA-like response can be predicted with simple linearized barotropic models forced with upper tropospheric divergence in the tropical Pacific (e.g., Branstator 1985; Hoskins and Karoly 1981; Seo and Son 2012). Consistent with this basic mechanism, observational studies have found that the positive (negative) phase of the PNA is more common during and after MJO-related enhanced (suppressed) convection over the western and central tropical Pacific (e.g., Ferranti et al. 1990; Higgins and Mo 1997; Johnson and Feldstein 2010; Knutson and Weickmann 1987; Mori and Watanabe 2008). However, non-barotropic mechanisms are also needed to explain the timing, location and amplitude of the PNA response (e.g., Higgins and Mo 1997; Hsu 1996; Trenberth et al. 1998). For example, mid-latitude eddy/mean flow interactions associated with breaking waves have been found to be very important to the amplification and maintenance of the PNA pattern after the initial Rossby wave train is established (e.g., Franzke et al. 2011; Moore et al. 2010).

Recent research has suggested that MJO-related convection can also excite certain phases of the AO and the closely-related North Atlantic Oscillation (NAO), the leading modes of variability in the Northern Hemisphere and the North Atlantic sector, respectively. Studies have shown that the MJO significantly impacts the sign of the AO/NAO several weeks after Rossby wave trains are initiated in the Pacific sector (e.g., Cassou 2008; Lin et al. 2009; L’Heureux and Higgins 2008; Roundy et al. 2010; Zhou and Miller 2005). In contrast, no conclusive impact on the AO has been found from tropical convection anomalies associated with ENSO (L’Heureux and Thompson 2006). The mechanism by which the MJO affects the AO is not completely understood, but is possibly due to interactions between MJO-driven Rossby waves and wave breaking events downstream that impact the subtropical jet strength and position over the North Atlantic (Benedict et al. 2004; Cassou 2008).

A few studies have examined how MJO-related teleconnections change during different phases of ENSO. Moon et al. (2010) and Roundy et al. (2010) both demonstrate that the extratropical response to the MJO is enhanced when MJO-related convection is in phase with heating and convection anomalies associated with ENSO. However, both studies also note that the difference between the El Niño and La Niña teleconnections cannot be explained entirely by a linear superposition of the expected ENSO and MJO signals. Moon et al. (2010) show that the structure of the Rossby wave response is not only weakened when the MJO and ENSO convective signals are out of phase, but also compressed spatially.

There is much interest in determining whether these relationships with the MJO might be used to improve extratropical prediction at lead times up to several weeks. Preliminary studies (e.g., Cassou 2008; Jones et al. 2011; Lin et al. 2010b; Roundy et al. 2010; Vitart and Molteni 2010; Yao et al. 2011) suggest that some predictive skill outside of the tropics may be derived from tropical MJO activity, but these relationships have yet to be fully exploited operationally. Our study contributes to these previous efforts by developing a comprehensive framework with which to examine the impact of the MJO on northern hemisphere tropospheric geopotential height fields. We use a relatively large geographic domain intended to capture the broad spatial extent of northern hemisphere teleconnection patterns, but, by using a cluster analysis, we do not limit ourselves to a particular linear mode of variability. Instead, we examine cluster patterns which represent common combinations of several modes and which are able to represent non-linear structures in the geopotential height data. Since extended range forecasts of temperature and precipitation are closely tied to the predicted geopotential height field, we believe this framework is relevant to the problem of extended range prediction.

3 Methods

3.1 Datasets and indices

To examine how the MJO affects the extratropics, we must first identify episodes when the MJO is active, and summarize the spatial location and propagation of the MJO during these periods. To do this, we use the Wheeler–Hendon multivariate MJO index (Wheeler and Hendon 2004) as provided by the Australian Bureau of Meteorology. The index is derived from the leading two principal components (PCs) in an empirical orthogonal function (EOF) analysis performed on three combined fields: tropical outgoing longwave radiation (OLR), equatorial zonal wind at 850 hPa, and equatorial zonal wind at 200 hPa. Based on the values of these leading PCs, the Wheeler–Hendon (WH) index traces through eight phases as the MJO signature propagates eastward. Between phase 2 and phase 6, for example, a convectively active region propagates from the western Indian Ocean across the maritime continent and into the western Pacific. The OLR and zonal wind composites associated with each phase of the WH index can be found in a number of previously published papers (e.g., Cassou 2008; Johnson and Feldstein 2010; Wheeler and Hendon 2004) and are also available on the NOAA Climate Prediction Center (CPC) website.

Following L’Heureux and Higgins (2008), we identify active MJO events based on a pentad-averaged version of the WH index. An MJO episode is identified when the following guidelines are met for at least six consecutive pentads: (1) The index amplitude remains primarily above 1.0, though some temporary dips below this threshold may be allowed and (2) The index phase progresses in a counter-clockwise direction without reversing direction or stalling in a particular phase for more than four pentads. Some subjectivity is involved in these determinations.

We perform a k-means cluster analysis on the 500-hPa geopotential height anomalies to identify commonly occurring intraseasonal climate patterns over the North American region. We use the NCEP/NCAR (National Center for Atmospheric Research) reanalysis dataset at 2.5° × 2.5° horizontal resolution (Kalnay et al. 1996). The cluster analysis is performed on 3,962 wintertime days (Dec–Mar) over the years ranging from January 1979 to March 2011. This time period is chosen to match those dates when satellite data is available for assimilation. The domain for the cluster analysis ranges from 20°N to 87.5°N and from 157.5°E to 2.5°W, covering North America and the surrounding ocean basins. This was chosen because it encompasses regions with the strongest MJO response, and because our focus is on prediction over North America. However, we note that a larger domain encompassing the full northern hemisphere extratropics (poleward of 20°N) yields very similar cluster patterns. Anomalies in the 500-hPa geopotential height data are calculated with respect to the daily 1981–2010 reference climatology used in CPC forecasts at the time of publication. Finally, the daily data are smoothed with a 7-day running mean to ensure that the cluster analysis focuses on lower frequency features in the geopotential height field and to match the averaging timescale of NOAA CPC’s extended range 8–14 day climate outlook. For convenience, in the remainder of this paper, the “day” associated with a particular cluster occurrence will always refer to the central day of the 7-day running mean.
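The smoothing step can be illustrated with a minimal sketch of a centered 7-day running mean, in which each smoothed value is indexed by the central day of its window, as in the convention adopted above. The function name and the toy anomaly series are hypothetical, and leaving the first and last three days undefined is an assumption on our part; the treatment of the ends of each season is not specified here.

```python
def centered_running_mean(values, window=7):
    """Centered running mean; days without a full window are left as None."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that a central day exists")
    half = window // 2
    smoothed = [None] * len(values)
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        smoothed[center] = sum(segment) / window
    return smoothed


# Toy daily 500-hPa height anomalies (m) at one grid point (illustrative values only).
daily_anomalies = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
smoothed = centered_running_mean(daily_anomalies)
print(smoothed)
```

In this toy series only days 3 through 5 have a complete 7-day window, so only those three entries are defined.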

At several points in the analysis we look at the correspondence between our cluster patterns and the AO and PNA indices. Daily AO and PNA index values are taken directly from the CPC website. The CPC AO index is calculated as the daily projection of the 1,000-hPa geopotential height pattern onto the leading mode in an EOF analysis of monthly mean 1,000-hPa geopotential height poleward of 20°N. The CPC PNA index is calculated with a rotated principal component analysis (RPCA) of the monthly-mean 500-hPa geopotential height (Barnston and Livesey 1987) in the same domain. To examine the effect of ENSO on MJO-related teleconnections, we need to define El Niño, La Niña and neutral episodes. As indicated on the CPC website, an El Niño (La Niña) episode is identified as taking place when the 3 month running mean of the Niño 3.4 sea surface temperature (SST) anomaly remains above 0.5° (below −0.5°) Celsius for at least five consecutive overlapping seasons.
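As a rough illustration of the ENSO episode definition, the sketch below labels each overlapping 3-month season as El Niño, La Niña or neutral, given a series of 3-month running-mean Niño 3.4 SST anomalies. The function name, the label strings and the toy anomaly values are hypothetical; the strict inequalities follow the wording above and may differ in detail from the operational CPC procedure.

```python
def classify_enso(seasonal_anomalies, threshold=0.5, min_run=5):
    """Label each overlapping season 'el_nino', 'la_nina' or 'neutral'.

    A season belongs to an episode only if it is part of a run of at least
    `min_run` consecutive seasons beyond the threshold.
    """
    labels = ["neutral"] * len(seasonal_anomalies)
    for name, sign in (("el_nino", 1.0), ("la_nina", -1.0)):
        run_start = None
        # A trailing sentinel of 0.0 closes any run that reaches the end of the record.
        for idx, value in enumerate(list(seasonal_anomalies) + [0.0]):
            exceeds = sign * value > threshold
            if exceeds and run_start is None:
                run_start = idx
            elif not exceeds and run_start is not None:
                if idx - run_start >= min_run:
                    labels[run_start:idx] = [name] * (idx - run_start)
                run_start = None
    return labels


# Toy Niño 3.4 running-mean anomalies in degrees Celsius (illustrative values only).
anomalies = [0.2, 0.6, 0.7, 0.9, 1.1, 0.8, 0.3,
             -0.6, -0.7, -0.2, -0.9, -1.0, -0.8, -0.7, -0.6]
enso_labels = classify_enso(anomalies)
print(enso_labels)
```

The two-season cold excursion in the middle of the toy record is too short to qualify and is therefore labeled neutral.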

We also examine cluster composites of surface temperature and precipitation over the United States to make a direct link between the large-scale circulation and the surface. Surface temperature and precipitation composites associated with the clusters are calculated, respectively, based on the gridded daily cooperative dataset of Janowiak et al. (1999), and a new, high resolution analysis of daily rain-gauge precipitation estimates (Xie et al., in preparation).

Finally, a set of 45-day retrospective forecast simulations (hindcasts) from version 2 of NCEP’s Climate Forecast System model (CFSv2) is used to assess how well this model, which was newly operational in 2011, is able to capture the observed relationships with the MJO. The CFSv2 model (Saha et al. in preparation) consists of the NCEP Global Forecast System (GFS) atmospheric model run at T126 (~0.937°) horizontal resolution fully coupled with ocean, sea-ice, and land surface models. The ocean model is the Geophysical Fluid Dynamics Laboratory Modular Ocean Model version 4.0 at 0.25°–0.5° grid spacing. The land surface model (LSM) is the four-layer Noah LSM and the sea ice model is a simple two-layer model. Retrospective forecasts are started at 6-h intervals from 1999 through 2010 and run out for 45 days. Ensemble means are created from the four runs initialized during each 24-h period, and then the output is smoothed with a 7-day running mean to match the smoothed geopotential height fields used in our cluster analysis. The mean model bias with respect to the NCEP/NCAR reanalysis is removed at each lead time by subtracting the difference between the lead-dependent model climatology and the reanalysis climatology. Finally, as before, geopotential height anomalies are calculated relative to a 1981–2010 reference climatology in order to facilitate comparison with the cluster analysis results.

3.2 K-means cluster analysis: a new approach for choosing k

A k-means cluster analysis using Euclidean distance is used to identify commonly occurring patterns of 500-hPa geopotential height. The iterative algorithm seeks to find an optimal partition of the data into k clusters, where members within each cluster are similar to each other, but separated as much as possible from members of other clusters. The number of clusters, k, must be prescribed a priori. The “optimal” solution is the one which minimizes S, the sum of the squared distances between the cluster members and their respective cluster centroids. Wilks (2011) and Michelangeli et al. (1995) among others provide detailed descriptions of the k-means clustering algorithm. The final partition can be sensitive to the algorithm initialization, which requires a first guess for the cluster centroids. Thus, the algorithm is repeated 50 times, each time with different initial seeds, resulting in 50 different partitions of the data. Of these, the partition that minimizes S, as defined above, is retained.
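The restart strategy can be sketched as follows. This is a minimal standard-library illustration of k-means with Euclidean distance and 50 random initializations, not the code used in this study; the function names, the choice of random data points as initial centroids, and the two-dimensional toy data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random initialization; returns (S, centroids, labels)."""
    centroids = rng.sample(points, k)  # first guess: k distinct data points (assumption)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    total = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return total, centroids, labels


def kmeans_best_of(points, k, n_restarts=50, seed=0):
    """Repeat k-means n_restarts times and retain the partition with the smallest S."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_restarts)),
               key=lambda result: result[0])


# Toy data: three well-separated groups of three points each (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
best_S, best_centroids, best_labels = kmeans_best_of(points, k=3)
print(best_S, best_labels)
```

Individual runs on this toy dataset can become trapped in a poorer local solution, for example by splitting one group in two and merging the other two, which is why the partition with the smallest S over all restarts is the one retained.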

To reduce computational time and focus the analysis on large-scale modes of variability, an EOF analysis is performed in preparation for the cluster analysis. The first 50 EOFs are retained, which together account for more than 98 % of the total variance in the 500 hPa geopotential height field. The k-means cluster analysis is performed in the sub-space spanned by these 50 EOFs, resulting in a partition of the data into k clusters. As previous authors have also found (e.g., Michelangeli et al. 1995), the results of the k-means algorithm are not very sensitive to the number of EOFs retained.

Cluster composites (centroids) are then calculated from the original geopotential height fields by averaging over all members assigned to each cluster. Instead of performing a separate cluster analysis on the CFSv2 hindcasts, each 7-day period in the hindcast dataset is classified into one of the k previously determined clusters. This classification is done by finding the nearest cluster centroid based on Euclidean distance.
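A minimal sketch of the nearest-centroid classification is given below. The three-element "fields" stand in for full gridded height anomaly maps, and the function name, the zero-based cluster indices and all numerical values are hypothetical.

```python
import math


def nearest_centroid(field, centroids):
    """Return the index of the centroid closest to `field` in Euclidean distance."""
    distances = [math.dist(field, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Toy centroids and hindcast fields, each flattened to three grid points (illustrative only).
centroids = [(50.0, -30.0, 10.0), (-40.0, 20.0, 0.0), (0.0, 0.0, 60.0)]
hindcast_fields = [(45.0, -20.0, 5.0), (-35.0, 25.0, 10.0), (5.0, -5.0, 40.0)]
assigned = [nearest_centroid(field, centroids) for field in hindcast_fields]
print(assigned)
```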

One of the challenges in a k-means cluster analysis is deciding on the optimal number of clusters to use. For datasets with a clear number of distinct, well-separated regimes, the optimal number of clusters may be clearly defined by the data (Christiansen 2007; Michelangeli et al. 1995). For other datasets, the data may be better represented by a continuum of states (see Johnson and Feldstein 2010), and the optimal number of clusters may be less obvious. Our 500-hPa geopotential height data, like many datasets, falls somewhere in between these extremes. It is relatively smooth, but includes high-density pockets which cannot be easily represented in a linear analysis, making the cluster analysis a useful tool. In choosing a value for k, we would like to identify a relatively small number of clusters that efficiently capture the most important organizational structures in the dataset. Here, we present a new simple and computationally efficient methodology to meet this goal.

The method is described as follows: First, the k-means algorithm is run for several consecutively increasing choices of \( k \), starting with \( k = 1 \). Next, for each choice of \( k \), and each cluster \( j \) (\( 1 \le j \le k \)), the 90th percentile distance-to-centroid is determined. We call this number the “cluster radius” (\( R_{j,k} \)). Figure 1 shows a 2-dimensional illustration of the process (each circle represents a cluster radius \( R_{j,k} \)).
Fig. 1

Two-dimensional example of the clustering methodology described in Sect. 3.2 for k = 1 through k = 9 clusters. The radii of the circles are such that they contain 90 % of the points in each cluster. Using this algorithm, the optimal clustering occurs at k = 8, since this is where the points are most efficiently covered with little empty space or overlap

Then, for each choice of \( k \) we compute the volume ratio index:
$$ \sigma_{k} = \sum\limits_{j = 1}^{k} {\left( {\frac{{R_{j,k} }}{{R_{1,1} }}} \right)^{n} } $$
where \( n \) is the dimensionality of the data set. For each value of k, this index calculates how efficiently the k hyperspheres (e.g., circles in Fig. 1) can cover important structures in the dataset. A value of \( \sigma_{k} \) < 1 implies that k clusters is more efficient than a single cluster at covering the data.

In the 2-dimensional case shown in Fig. 1, \( \sigma_{k} \) is proportional to the sum of the areas of the covering circles. In this example, we can see that a single cluster is not very efficient at covering the dataset, since a lot of empty (low-density) space is included inside the circle. As k increases beyond one cluster, the circles are better focused around high-density regions of the data and their summed area decreases. When the value of k gets too large, however, adjacent circles begin to overlap. At some intermediate point, the circles provide good coverage of the data with minimal wasted area. Therefore, we propose that a value of \( k \) corresponding to the first local minimum of \( \sigma_{k} \) (k = 8 in Fig. 1) represents a point where important structures in the data are efficiently represented by k clusters, minimizing empty space and overlap between clusters.
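The calculation of the volume ratio index and the selection of its first local minimum can be sketched as follows. The function names are hypothetical, the nearest-rank definition of the 90th percentile is an assumption (the percentile convention is not specified above), and the cluster radii in the example are invented purely to illustrate the arithmetic.

```python
import math


def cluster_radius(members, centroid, percentile=90):
    """Distance to the centroid below which `percentile` % of members fall (nearest rank)."""
    distances = sorted(math.dist(member, centroid) for member in members)
    rank = max(1, math.ceil(percentile * len(distances) / 100))
    return distances[rank - 1]


def volume_ratio_index(radii_by_k, n_dims):
    """sigma_k = sum over j of (R_{j,k} / R_{1,1}) ** n, for every k in radii_by_k."""
    reference = radii_by_k[1][0]  # R_{1,1}: radius of the single cluster at k = 1
    return {k: sum((radius / reference) ** n_dims for radius in radii)
            for k, radii in radii_by_k.items()}


def first_local_minimum(sigma):
    """Smallest k whose sigma_k is below its predecessor and not above its successor."""
    ks = sorted(sigma)
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if sigma[k] < sigma[prev_k] and sigma[k] <= sigma[next_k]:
            return k
    return None


# Radius of one toy cluster: ten points at distances 1 to 10 from the centroid.
toy_members = [(float(i), 0.0) for i in range(1, 11)]
toy_radius = cluster_radius(toy_members, (0.0, 0.0))

# Invented radii R_{j,k} for k = 1..5 in a 2-dimensional example (n = 2).
radii_by_k = {
    1: [10.0],
    2: [6.0, 6.0],
    3: [4.0, 4.0, 4.0],
    4: [4.0, 4.0, 4.0, 3.0],
    5: [4.0, 4.0, 3.0, 3.0, 3.0],
}
sigma = volume_ratio_index(radii_by_k, n_dims=2)
chosen_k = first_local_minimum(sigma)
print(sigma, chosen_k)
```

With these invented radii the index falls from 1 at k = 1 to 0.48 at k = 3 and then rises again at k = 4, so k = 3 would be selected.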

Some caveats with this methodology are:
  1. \( \sigma_{k} \) will eventually approach zero as \( k \) is increased to the point where individual clusters contain a very small number of points. Therefore, the first local minimum of \( \sigma_{k} \) should be used, instead of the absolute minimum.

  2. If the dimensionality of the dataset is very large, the optimal number of clusters can be unstable, particularly with respect to changes in the number of EOFs retained. Therefore, it may be necessary to reduce the dimensionality of the dataset to minimize this instability, as in the EOF strategy used here. Current work is focused on reducing the sensitivity of the algorithm to the addition of dimensions with very small variance.

  3. The method presented above is not always stable and should be repeated multiple times to ensure robustness.


3.3 Cluster occurrence analysis and significance testing

To examine how the frequency of cluster occurrence is modulated in the days and weeks following an MJO event and during different phases of ENSO, we roughly follow the methodology of Cassou (2008). Like Cassou (2008), we examine how the conditional frequency of a cluster occurring under a particular condition X (e.g., 7 days after the MJO is active in phase 1) is elevated or suppressed with respect to the cluster’s climatological occurrence frequency over all 3,962 December–March days. In our case, X always refers to the state of the MJO and/or ENSO. The percent change in frequency, C, is a function of the MJO/ENSO state, X, and the cluster number i:
$$ C(i,X) = 100*\left[ {\frac{{\left( {\frac{{N_{i,X} }}{{N_{X} }} - \frac{{N_{i} }}{{N_{T} }}} \right)}}{{\frac{{N_{i} }}{{N_{T} }}}}} \right], $$
where \( N_{T} \) is 3,962, the total number of days in the study, \( N_{i} \) is the number of times in the study that cluster i occurs, \( N_{X} \) is the total number of days in the study when the MJO/ENSO is in state X, and \( N_{i,X} \) is the number of times that cluster i occurs in state X. C(i, X) is equal to 100 if cluster i occurs twice as frequently under conditions X as it does in the full record, and is equal to −100 if the cluster never occurs under conditions X. C is calculated for a range of states X, including each of the 8 phases of the active MJO at leads ranging from zero to 40 days, and for these same states during active La Niña and El Niño periods only. In all of these cases, the full reference December–March climatology is used for comparison.
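A small worked example may help to fix the definition of C(i, X). The sketch below uses a hypothetical 20-day record in place of the 3,962 days of the study; the function name and the toy cluster sequence are invented for illustration.

```python
def percent_change_in_frequency(cluster_by_day, in_state, cluster):
    """C(i, X): percent change of the conditional frequency relative to climatology."""
    n_total = len(cluster_by_day)                                    # N_T
    n_cluster = sum(1 for c in cluster_by_day if c == cluster)       # N_i
    n_state = sum(1 for flag in in_state if flag)                    # N_X
    n_both = sum(1 for c, flag in zip(cluster_by_day, in_state)
                 if flag and c == cluster)                           # N_{i,X}
    climatological = n_cluster / n_total
    conditional = n_both / n_state
    return 100.0 * (conditional - climatological) / climatological


# Toy record: 20 days of cluster numbers, with state X holding on the first 4 days.
cluster_by_day = [4, 4, 1, 2] + [4, 4, 4] + [1, 2, 3] * 4 + [1]
in_state = [True] * 4 + [False] * 16

c_cluster4 = percent_change_in_frequency(cluster_by_day, in_state, cluster=4)
c_cluster3 = percent_change_in_frequency(cluster_by_day, in_state, cluster=3)
print(c_cluster4, c_cluster3)
```

In this toy record Cluster 4 occurs on 25 % of all days but on 50 % of the days in state X, giving C = 100, while Cluster 3 never occurs in state X, giving C = −100.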
To test the significance of the results, we perform a Monte-Carlo simulation which is used to calculate the null distribution of C. To create these synthetic datasets we first generate a cluster transition probability matrix (e.g. Table 1), based on transition frequencies between clusters. For example, in Table 1 there is a 2.9 % conditional probability that Cluster 3 will occur on day n, given membership in Cluster 1 on day n − 1. Using this matrix, 10,000 synthetic partitions are created, each using the following steps: (1) Cluster numbers on 01 December of each year are assigned at random. (2) The remaining days of the year are assigned progressively based on a Markov Chain (a seven-state Markov Chain is used in the example in Table 1 since there are seven clusters). For example, the cluster number on 02 December is assigned according to the probabilities in Table 1, conditional on the cluster number on 01 December. The simulation generates 10,000 synthetic partitions of the 3,962 days into k clusters with the observed autocorrelation structure but no underlying relationship with either the MJO or ENSO. C(i, X) is then calculated for each of the synthetic cluster partitions and the significance of the observations is assessed with respect to this null distribution. A similar approach is applied to assess the significance for the shorter time period of the CFSv2 hindcasts.
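The generation of synthetic partitions can be sketched as follows. This is a simplified illustration rather than the code used in this study: the function names and the toy sequence are hypothetical, clusters are numbered from zero, the first day of each season is drawn uniformly at random (an assumption), and transitions are counted along a single continuous sequence, whereas a real application would exclude transitions across the gap between one winter and the next.

```python
import random


def transition_matrix(sequence, n_clusters):
    """Estimate P(cluster on day n | cluster on day n - 1) from an observed sequence."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for today, tomorrow in zip(sequence, sequence[1:]):
        counts[today][tomorrow] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a cluster is never left in the observed record.
        matrix.append([c / total if total else 1.0 / n_clusters for c in row])
    return matrix


def simulate_season(matrix, n_days, rng):
    """One synthetic season: random first day, then a first-order Markov chain."""
    n_clusters = len(matrix)
    state = rng.randrange(n_clusters)
    season = [state]
    for _ in range(n_days - 1):
        state = rng.choices(range(n_clusters), weights=matrix[state])[0]
        season.append(state)
    return season


rng = random.Random(42)
observed = [0, 0, 1, 1, 1, 2, 2, 0, 0, 1, 2, 2, 2, 0]  # toy three-cluster sequence
matrix = transition_matrix(observed, n_clusters=3)
# 1,000 synthetic seasons are used here for brevity; the study uses 10,000 partitions.
synthetic = [simulate_season(matrix, n_days=121, rng=rng) for _ in range(1000)]
print(matrix)
```

The statistic C(i, X) would then be evaluated on each synthetic sequence to build the null distribution against which the observed value is compared.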
Table 1

Cluster transition probabilities (%). The seven entries in row i show the probabilities of Clusters 1–7 occurring on day n, given residence in cluster i on day n − 1

Even in the null case of no underlying physical relationship between cluster occurrence and the state of the MJO or ENSO, a number of tests of C(i, X) will return nominally significant results because of the large number of tests performed. Several approaches have been proposed in the literature for assessing the “global significance” of the multiple hypothesis tests. The most common method (Livezey and Chen 1983) involves counting the number of locally significant results obtained, but has been shown to be overly permissive for correlated tests (Wilks 2006). An alternative approach, using the false discovery rate (FDR; Wilks 2006) has several advantages. First, it is relatively insensitive to correlations among the local tests, providing modestly conservative results in the case of correlated tests. Second, it realistically identifies significant local tests, using a threshold p value which ensures that only a small number of the local tests identified will represent false rejections of the null hypothesis.

The approach is summarized briefly here, and the reader is referred to Benjamini and Hochberg (1995) and Wilks (2006) for further details. First, p values are calculated for each of the N local tests. Second, the desired level of global (field) significance (q) is chosen, usually to be 0.05. Third, a threshold p value (Pthreshold) is calculated based on the equation:
$$ P_{\text{threshold}} = \mathop {\max }\limits_{j = 1, \ldots ,N} \left[ {p_{j} :p_{j} \le q\left( \frac{j}{N} \right)} \right] $$
where q is the desired global significance level, N is the number of local tests, and \( p_{j} \) is the jth smallest of the N local p values. Fourth, only local tests with \( p \le P_{\text{threshold}} \) are retained as rejections of the null hypothesis. Since \( P_{\text{threshold}} \) is generally much smaller than q, this test is much more stringent than the typical approach of rejecting all local tests with p < 0.05. If any local tests are deemed significant using this methodology, one can assume field significance at the level q (Wilks 2006). In addition, this methodology ensures that, in expectation, only a small fraction (no larger than q) of the local tests identified will represent mistaken rejections of the null hypothesis. Such stringency is uncommon in the atmospheric science literature, but it is important in cases, such as this, where a very large number of correlated tests are performed.
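The threshold calculation can be illustrated with a short sketch of the procedure of Benjamini and Hochberg (1995). The function name and the ten p values are invented for the example and do not come from this study.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest sorted p_j satisfying p_j <= q * j / N, or None if no test qualifies."""
    n = len(p_values)
    passing = [p for j, p in enumerate(sorted(p_values), start=1) if p <= q * j / n]
    return max(passing) if passing else None


# Ten invented local p values; five of them fall below the nominal 0.05 level.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
threshold = fdr_threshold(p_values, q=0.05)
significant = [p for p in p_values if threshold is not None and p <= threshold]
print(threshold, significant)
```

Only the two smallest p values survive in this example, although five of the ten tests would be called significant under the usual local criterion of p < 0.05.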

4 Cluster analysis

The volume ratio index described in Sect. 3.2 is plotted in Fig. 2 for a k-means cluster analysis of 500-hPa geopotential height anomalies on the domain described in Sect. 3.1. Values of k between 2 and 11 are tested, and the index is found to be minimized for k = 7. The resulting seven cluster composites (centroids) are shown in Fig. 3, and their frequencies of occurrence are shown in Table 2. The seven clusters occur with roughly equal frequencies in the overall climatology, with Clusters 2 and 7 being the most common. Cluster 2 is most common during El Niño, while Cluster 7 is most common during La Niña. The clusters are more evenly distributed during neutral ENSO periods. Further discussion of the effects of ENSO on the cluster occurrence is deferred until Sect. 6.
Fig. 2

Volume ratio index derived in Sect. 3.2, plotted as a function of the value of k. By minimizing this index, the algorithm determines that k = 7 gives the most efficient coverage of the multidimensional space
Fig. 3

K-means cluster centroid patterns of 500-hPa geopotential height anomalies (m) over the extended North American region. A 7-day smoothing is performed on the geopotential height anomalies before the cluster analysis

Table 2

Cluster occurrences in the overall climatology, and during El Niño, La Niña and neutral periods

Cluster # | All days: # of days, % of all days | El Niño: # of days, % of El Niño days | La Niña: # of days, % of La Niña days | Neutral: # of days, % of neutral days
We have performed several tests to check that the seven cluster centroids are robust and emerge repeatedly across a range of domains, time periods, values of k, smoothing algorithms and PCs retained. Results from these tests are described here briefly, but not shown. First, we performed the analysis separately on even and odd years and found almost identical cluster centroid patterns for both subsets of the data. Second, we performed the analysis using a 5-day smoothing instead of a 7-day smoothing, with no effect on either the optimal value for k or the resulting centroids. Third, we examined the centroids that resulted from k = 8 through k = 12, and found that the same seven cluster centroids shown in Fig. 3 are present among other clusters in most of these partitions. Fourth, we tried varying the number of EOFs retained, testing values between 3 (36.3 % of the variance) and 100 (99.7 % of the variance). While we found that the optimal value for k was dependent on the number of EOFs, the centroid patterns themselves were again very robust when k = 7 was prescribed.

Two final tests were performed. First, we tried performing the same cluster analysis on a full northern hemispheric domain poleward of 20°N. We found that the volume ratio index for this domain was minimized at k = 8, and that among the resulting 8 clusters, six closely resembled the seven clusters from the smaller North American-sector domain, including Clusters 4, 6 and 7 which will be the primary focus of the paper. This suggests that variability in our study domain dominates the hemispheric variability. Finally, we performed the cluster analysis on the full year rather than only the winter months. We found an optimal value of k = 12, with 6 of the 7 wintertime clusters repeated in the full year analysis. Together, these tests make us confident that the seven clusters described in this analysis represent a meaningful partition of the data.

Cluster averages of the AO and PNA indices described in Sect. 3.1 are plotted in Fig. 4, demonstrating the relationship between the seven clusters and the northern hemisphere teleconnection indices. All clusters are combinations of multiple modes; however, Cluster 4 very strongly resembles a canonical negative AO pattern, and Clusters 6 and 7 respectively resemble the positive and negative phases of the PNA (combined also with opposing phases of the AO). The remainder of this paper will focus particularly on these three clusters (4, 6 and 7) because, unlike the other clusters, their probabilities are significantly influenced by MJO activity, as will be discussed in Sect. 5.
Fig. 4

Mean AO and PNA indices of days in each of the seven cluster centroids
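
The cluster averages plotted in Fig. 4 can be thought of as a grouped mean of a daily index over cluster membership. A minimal sketch, assuming one cluster label and one index value per day; the function name and the toy values are hypothetical:

```python
from collections import defaultdict

def cluster_mean_index(cluster_labels, index_values):
    """Average a daily index (e.g., AO or PNA) over the days in each cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(cluster_labels, index_values):
        totals[label] += value
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Toy example: six days with their cluster labels and AO index values.
labels = [4, 4, 6, 7, 6, 4]
ao = [-2.0, -1.0, -0.5, 1.0, -0.5, -3.0]
mean_ao = cluster_mean_index(labels, ao)
```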

One advantage of the cluster analysis approach is that it does not assume symmetry between positive and negative polarities of a given spatial pattern. For example, there is no pattern that is the polar opposite of Cluster 4 as would be expected if there were symmetry between the positive and negative phases of the AO. Instead, the positive phase of the NAO in the Atlantic tends to occur in conjunction with other 500-hPa wave patterns over North America and the Pacific as seen in Clusters 1, 3 and 7. Each of these clusters projects positively onto the Climate Prediction Center’s AO loading pattern. We briefly tested whether the asymmetry between the positive and negative AO patterns occurs for values of k other than seven, and found that, while a canonical negative AO similar to Cluster 4 emerges consistently for k = 3 through k = 12 with little variation, an opposite pattern emerges only for k = 12.

These results suggest that there may be more variability in the geopotential height patterns during the positive phase of the AO than during the negative phase. Such results may be explained by the results of Feldstein (2003), who found that the positive NAO (which is highly correlated to the AO) is preceded by an upstream wave train located over North America and the northeast Pacific, whereas the negative NAO exhibits in situ development. Because of its more complex temporal evolution, it is plausible that there would be more variability associated with the positive NAO than with the negative NAO. More work is needed to understand these results fully.

5 Lagged relationships between Clusters 4, 6 and 7 and the MJO

In Fig. 5, we present the anomalous change in occurrence (C in Eq. 2) of Clusters 4, 6 and 7 at lags of 0–40 days after an active MJO episode occurs in the tropics. Anomalous occurrences of Clusters 1, 2, 3, and 5 are much weaker and so are not shown here. Figure 5 is very similar to Fig. 3 in Cassou (2008), but extends through 40 instead of 14 days after a particular MJO phase, in order to investigate any longer range effects of the MJO on the cluster probabilities. The “days” indicated on the x-axis refer to the lag after the MJO phase in terms of the central day of the 7-day running mean. For example, Lag 0 refers to the cluster that spans the period from 3 days before the MJO until 3 days after the MJO. While previous studies (e.g., Lin et al. 2009) have found interesting results for negative lags, we focus here on positive lags only because this period is most relevant to extended range prediction over North America.
Fig. 5

Percent change in the occurrence frequency of Cluster 4, Cluster 6 and Cluster 7 during and in the 40 days after MJO events in phases 1–8. Light red (light blue) shading represents enhanced (suppressed) probabilities that are locally significant at a 95 % level based on a Monte Carlo simulation. Dark red shading represents enhanced probabilities that are locally significant at a 98.5 % level which is the level needed to control the false discovery rate at 15 %
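
The quantity plotted in Fig. 5 can be sketched as follows. The sketch assumes that Eq. 2 has the form C = 100 × (f − f_clim) / f_clim, where f is the cluster frequency among days at a given lag after a given MJO phase and f_clim is the climatological frequency (N_i / N_T); this form and all names below are assumptions for illustration and should be checked against Eq. 2.

```python
def occurrence_change(cluster_by_day, mjo_phase_by_day, cluster, phase, lag):
    """Percent change in the frequency of `cluster` `lag` days after MJO `phase`.

    cluster_by_day[t]   : cluster assigned to the 7-day window centered on day t
    mjo_phase_by_day[t] : active MJO phase on day t (1-8), or None if inactive
    """
    n_total = len(cluster_by_day)
    # Climatological frequency over the whole record.
    clim_freq = sum(1 for c in cluster_by_day if c == cluster) / n_total

    # Days that fall `lag` days after an active MJO day in the requested phase.
    lagged_days = [t + lag for t, p in enumerate(mjo_phase_by_day)
                   if p == phase and t + lag < n_total]
    if not lagged_days or clim_freq == 0:
        return None
    cond_freq = (sum(1 for t in lagged_days if cluster_by_day[t] == cluster)
                 / len(lagged_days))
    return 100.0 * (cond_freq - clim_freq) / clim_freq

# Toy record of 8 days (hypothetical values).
clusters = [1, 1, 4, 1, 4, 1, 1, 1]
phases = [7, None, 7, None, None, None, None, None]
change = occurrence_change(clusters, phases, cluster=4, phase=7, lag=2)
```

In this toy record Cluster 4 occurs on 2 of 8 days overall but on both days that fall 2 days after a phase 7 day, so the frequency rises from 25 % to 100 %.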

Results in Fig. 5 that are locally significant at the 95 % level based on the Monte Carlo test are shown with light red and light blue shading. As described in Sect. 3.3, we tested the global significance by controlling the FDR using Eq. 3. The number of tests performed (N in Eq. 3) is taken to be 6,888, since we performed separate tests for each of the 7 clusters, 8 MJO phases, and 41 different lead times (7 × 8 × 41 = 2,296 tests), and then repeated all of these tests for El Niño and La Niña days only (Figs. 7, 8). Because of this large number of tests, we found that very stringent threshold p values were needed to control the FDR. Without considering the effect of correlations between the tests, we found that our results were globally significant at the 89 % level, suggesting an 11 % chance that the results could have been obtained by chance under the null hypothesis. However, these results are somewhat conservative since we did not account for correlations between the tests (Wilks 2006). Enhanced occurrence frequencies that remain significant after controlling the FDR at 15 % (q = 0.15) are shown in Fig. 5 with darker red shading. Darker blue shading is not needed for this plot, but is used in subsequent figures to indicate significantly suppressed occurrence frequencies. The threshold p value associated with this value of q is 0.015.
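
The FDR threshold can be illustrated with a short sketch. It assumes that Eq. 3 is the standard false discovery rate criterion described by Wilks (2006), in which the threshold is the largest sorted p value p_(j) satisfying p_(j) ≤ q · j / N; the function name and the toy p values are hypothetical.

```python
def fdr_threshold(p_values, q):
    """Largest sorted p-value p_(j) with p_(j) <= q * j / N, or None if none."""
    n = len(p_values)
    threshold = None
    for j, p in enumerate(sorted(p_values), start=1):
        if p <= q * j / n:
            threshold = p  # keep the largest p-value that meets the criterion
    return threshold

# Number of tests quoted in the text: 7 clusters x 8 MJO phases x 41 lags,
# for all days, El Nino days and La Nina days.
n_tests = 7 * 8 * 41 * 3

# Toy set of local p-values (hypothetical).
p_vals = [0.001, 0.008, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9]
p_fdr = fdr_threshold(p_vals, q=0.15)
significant = [p for p in p_vals if p_fdr is not None and p <= p_fdr]
```

Every test whose local p value is at or below the returned threshold is declared significant, which is how a single threshold p value can be associated with a chosen q.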

Figure 5a shows the anomalous change in occurrence of Cluster 4 during and after MJO events. The occurrence frequencies of Cluster 4 are significantly elevated with respect to climatology following active MJO episodes in phases 6 and 7, which represent suppressed convection over the eastern Indian Ocean, and enhanced convection over the eastern maritime continent (phase 6 only) and South Pacific Convergence Zone (SPCZ). The largest positive anomalies in these frequencies occur approximately 10–20 days after phase 7 of the MJO and approximately 20–25 days after phase 6 of the MJO. Under these conditions Cluster 4 is between 2 and 2.5 times as likely as it is in the overall climatology. For example, Cluster 4 occurs 27 % of the time at a lag of 23 days after phase 6 of the MJO, compared with 12 % of the time in the overall climatology. Though weaker, nominally significant anomalies are observed as far out as 40 days after an occurrence of the MJO in phase 5. The most significant suppression of Cluster 4 frequencies occurs approximately 13–20 days after phase 3 of the MJO and 24–28 days after phase 2. This phasing of the observed responses is expected and consistent with an MJO signal propagating eastward from phase 1 through phase 8.

The results shown in Fig. 5a are consistent with findings from L’Heureux and Higgins (2008), Cassou (2008), Lin et al. (2009), Roundy et al. (2010), and others, which have noted an increase in negative AO events approximately 10–20 days after the occurrence of an MJO event in phases 6 and 7. Since the mid-latitude mid-tropospheric signal is unlikely to persist beyond a few weeks, any signal at lead times beyond 15–20 days may be related to the predictability of the MJO convective signal in the tropics prior to the excitation of a poleward propagating Rossby wave train.

Figure 5b shows the anomalous change in occurrence frequency for Cluster 6, which resembles a positive PNA pattern combined with a weakly negative AO pattern. The figure shows that the frequencies of Cluster 6 are elevated significantly with respect to climatology immediately following active MJO episodes in phases 7 and 8, with even larger anomalies approximately 2 weeks following phase 6. As mentioned above, phase 6 is associated with enhanced convection over the eastern maritime continent, western Pacific and SPCZ. It is also associated with large-scale upper-tropospheric divergence over much of the Pacific basin as indicated by 200 hPa velocity potential composites, as seen on the CPC website.

These results are consistent with previous studies that link upper level divergence over the Pacific with an extension of the East Asian jet and the development of a cyclonic circulation anomaly over the northeast Pacific (e.g., Hoskins and Karoly 1981; Matthews et al. 2004; Seo and Son 2012). Higgins and Mo (1997) and others show that positive anomalies in the PNA index occur approximately 10 days following phase 6 of the MJO, in line with the results presented here.

Finally, Fig. 5c shows the occurrence frequency of Cluster 7, which resembles a negative PNA combined with a weakly positive AO. Cluster 7 frequencies are elevated significantly with respect to climatology following active MJO episodes in phases 1–5. The largest changes in frequency occur simultaneously with phase 5 of the MJO, 3–8 days after phase 4, 8–12 days after phase 3, and 12–16 days after phase 2. This timing is consistent with the results of Lin et al. (2009), who present 500 hPa geopotential height composites that resemble Cluster 7 at lags of 5 days after phase 3 of the MJO. MJO episodes in phase 3 are characterized by enhanced convection over the eastern Indian Ocean and western Maritime Continent, and reduced convection over the SPCZ near the dateline. Positive 200 hPa velocity potential anomalies, indicating broad anomalous upper-tropospheric convergence, are observed over much of the Pacific basin during phases 2 and 3 of the MJO.

Figure 6 is similar to Fig. 5, but for similarly sized clusters defined solely by the AO and PNA indices. For example, Fig. 6a plots the anomalous change in occurrence (C) of a cluster consisting of the 264 most positive AO events. The anomalies for the most negative AO days (Fig. 6b) are of similar magnitude to the anomalies seen in Cluster 4 (Fig. 5a). This suggests that the cluster partition presented here can capture the MJO/negative AO teleconnection relationships as well as those methods that rely on the AO index alone. Comparing Fig. 5b, c (Clusters 6 and 7) with Fig. 6c, d (positive and negative PNA), we find that the MJO has a stronger impact on the cluster patterns than on the PNA. This is particularly true for Cluster 7 (Fig. 5c), which shows a much stronger response to the MJO than does the negative PNA (Fig. 6d). These results suggest that the MJO may not excite a pure PNA pattern per se, but rather a PNA-like response with its own unique signature.
Fig. 6

Percent change in the occurrence frequency of a positive AO clusters, b negative AO clusters, c positive PNA clusters and d negative PNA clusters during and in the 40 days after MJO events in phases 1–8. Light red (light blue) shading represents enhanced (suppressed) probabilities that are locally significant at a 95 % level based on a Monte Carlo simulation. Dark red (dark blue) shading represents enhanced (suppressed) probabilities that are locally significant at a 98.5 % level which is the level needed to control the false discovery rate at 15 %
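
The index-based comparison clusters used in Fig. 6 can be sketched as a simple ranking of days by index value. This is a minimal illustration, assuming one index value per day and a prescribed cluster size; the function name and the toy values are hypothetical.

```python
def index_based_cluster(index_values, n_days, sign=+1):
    """Day indices of the n_days most positive (sign=+1) or most negative
    (sign=-1) values of a teleconnection index."""
    order = sorted(range(len(index_values)),
                   key=lambda t: sign * index_values[t], reverse=True)
    return sorted(order[:n_days])

# Toy PNA index for six days (hypothetical values).
pna = [0.3, -1.2, 2.1, 0.8, -2.5, 1.7]
positive_pna_days = index_based_cluster(pna, n_days=2, sign=+1)
negative_pna_days = index_based_cluster(pna, n_days=2, sign=-1)
```

Setting `n_days` to the size of the matching cluster keeps the two samples the same size, so that differences in the MJO response are not an artifact of sample size.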

One difference between our results and those of Cassou (2008), Lin et al. (2009), and others is that we do not see significant modulation in the frequencies of the most positive AO days by the MJO (Fig. 6a). Several differences between our analysis and that of Cassou (2008) could account for this. One difference is that our analysis uses the hemispheric AO index as opposed to Cassou’s analysis which focuses on the Atlantic sector only. Another difference is that we examined only the most positive AO anomalies (12 % of the days) as opposed to Cassou (2008) whose positive NAO cluster included 30 % of the days in his study. We note that in the results presented by Cassou (2008), the modulations for the positive NAO cluster are somewhat weaker than those for the negative phase (Fig. 3 of Cassou 2008).

6 Modulation by ENSO

We demonstrated in Sect. 4 (Table 2) that ENSO has a relatively large impact on the occurrence probabilities of the seven clusters. Clusters 2 and 6 are more likely to occur during El Niño periods than during La Niña or neutral periods. Cluster 7 is more likely to occur during La Niña periods than during El Niño or neutral periods. Given these shifts, we would expect the anomalous changes in occurrence associated with the MJO (Fig. 5) to be different during El Niño and La Niña periods. For example, enhanced probabilities might be expected when convection and upper level divergence anomalies associated with ENSO constructively interfere with those associated with the MJO.

Figure 7 is similar to Fig. 5, except that only cluster occurrences during El Niño are considered in the frequency of occurrence calculation. For example, \( N_{i,X} \) in Eq. 2 might be the number of times Cluster 4 occurs during an El Niño, 10 days after a phase 3 MJO event. The entire climatological record is still used as the reference in (2). Figure 8 is the same, except that only La Niña periods are considered.
Fig. 7

Same as Fig. 5, except that only El Niño days are considered. The maximum value on the y-axis is 300 % here, as opposed to 200 % in Figs. 5 and 6
Fig. 8

Same as Fig. 5, except that only La Niña days are considered. The maximum value on the y-axis is 300 % here, as opposed to 200 % in Figs. 5 and 6

Figures 7 and 8 demonstrate that the enhanced/suppressed frequencies presented in Sect. 5 are significantly altered depending on the phase of ENSO. Enhanced probabilities of Cluster 4 occurring 10–20 days after an active MJO event in phase 7 and 20–25 days after an active MJO in phase 6 can be seen in both the El Niño and La Niña cases, but are statistically significant only during El Niño. The responses of Clusters 6 and 7 to the MJO are even more altered by ENSO. In fact, the enhanced probabilities of Cluster 6 (Cluster 7) following phase 6 (phase 3) are completely absent during unfavorable ENSO conditions and are nearly triple the climatology during favorable ENSO conditions.

In general, the strongest responses occur when upper-level convergence/divergence patterns associated with ENSO constructively interfere with those associated with the MJO over the Pacific Basin. For example, Cluster 6 probabilities are particularly high during El Niño following phase 6 of the MJO when both ENSO and the MJO contribute towards anomalous upper-tropospheric divergence over the central Pacific region. Conversely, Cluster 7 probabilities are particularly high during La Niña after phases 2 and 3 of the MJO, when both contribute towards anomalous upper-tropospheric convergence over the central Pacific.

Previous studies have found that the Rossby wave response to a combined ENSO and MJO event is not simply the linear combination of the separate responses to the MJO and ENSO (Roundy et al. 2010; Moon et al. 2010; Schrage et al. 1999). With our analysis, however, we do not see a large non-linear effect, at least for Clusters 6 and 7. That is, when Cluster 6 and 7 occurrences are normalized with respect to the ENSO background state (i.e., \( N_{i}/N_{T} \) in Eq. 2 is calculated for the El Niño or La Niña periods only, instead of for the full climatology), the MJO modulation of the clusters is quite similar in magnitude to that for the entire time period (Fig. 5). Cluster 7, for example, occurs only 8 % of the time during El Niño according to Table 2, but these probabilities increase to approximately 25 % a few weeks following phases 2 and 3 of the MJO. This represents a threefold increase over the 8 % El Niño baseline, but only a moderate increase over the climatological baseline for Cluster 7 (18 %).
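
The Cluster 7 example can be made explicit. Assuming that C in Eq. 2 is the percent change of a conditional frequency \( f \) relative to a reference frequency \( f_{\mathrm{ref}} \) (our notation, to be checked against Eq. 2), the approximate values quoted above give:

\[
C = 100 \times \frac{f - f_{\mathrm{ref}}}{f_{\mathrm{ref}}}, \qquad
C_{\text{El Niño baseline}} \approx 100 \times \frac{0.25 - 0.08}{0.08} \approx 213\,\%, \qquad
C_{\text{climatological baseline}} \approx 100 \times \frac{0.25 - 0.18}{0.18} \approx 39\,\%.
\]

The first value corresponds to a frequency ratio of about 0.25/0.08 ≈ 3.1, the threefold increase noted above, while the second corresponds to a ratio of only about 1.4.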

Cluster 4, on the other hand, is equally common during the El Niño and La Niña periods (Table 2). Thus, the enhanced signal seen during El Niño after phase 7 of the MJO (Fig. 7) cannot be explained by a combination of the MJO and ENSO signals. These results are in contrast with those of Roundy et al. (2010) who note an amplified relationship between the MJO and the NAO during La Niña periods compared with El Niño periods. Further work is needed to better understand how interactions between the MJO and ENSO affect Clusters 4, 6 and 7.

7 Surface signatures of Clusters 4, 6 and 7

Since an underlying motivation for this study is to improve extended range forecasts over the United States, we are interested in the temperature and precipitation signatures of Clusters 4, 6 and 7 at the surface. Surface temperature and precipitation composites over the continental US associated with these clusters are presented in Figs. 9, 10 and 11, respectively. These composites are computed following the same approach as that for the 500-hPa geopotential height composites shown in Fig. 3.
Fig. 9

a Composite of precipitation anomalies (mm/day) over the United States for all 487 days in Cluster 4. b Same as a except for a cluster of the 487 days with the most negative AO values. c Composite of surface temperature anomalies (°C) associated with all 487 days in Cluster 4. d Same as c except for a cluster of the 487 days with the most negative AO
Fig. 10

Same as Fig. 9 except for the 545 days in Cluster 6 (a, c) and for the 545 days with the most positive PNA index values (b, d)
Fig. 11

Same as Figs. 9 and 10 except for the 728 days in Cluster 7 (a, c) and for the 728 days with the most negative PNA index values (b, d)
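
A composite of this kind can be sketched as an average of daily anomaly fields over the days assigned to a cluster. The sketch assumes a simple unweighted mean over member days, with each field stored as a flat list of grid-point anomalies; the function name and the toy values are hypothetical, and the compositing details in the paper may differ.

```python
def composite(anomaly_fields, cluster_by_day, cluster):
    """Average the daily anomaly fields over all days assigned to `cluster`.

    anomaly_fields[t] is a flat list of grid-point anomalies for day t.
    """
    members = [f for f, c in zip(anomaly_fields, cluster_by_day) if c == cluster]
    if not members:
        raise ValueError("no days assigned to this cluster")
    n = len(members)
    # zip(*members) groups the values of all member days at each grid point.
    return [sum(values) / n for values in zip(*members)]

# Toy temperature anomalies (deg C) at two grid points for three days.
temps = [[-2.0, 1.0], [0.5, 0.5], [-4.0, 3.0]]
labels = [4, 6, 4]
cluster4_composite = composite(temps, labels, cluster=4)
```

The same function applied to the days selected by index rank would give the AO and PNA comparison composites in the right-hand panels.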

The left panels of Fig. 9 show composites of precipitation and temperature anomalies over the continental US for all 487 7-day periods classified in Cluster 4. For comparison, the right panels of Fig. 9 show the corresponding composite anomalies for the 487 7-day periods with the most negative AO indices. Both Cluster 4 and the negative AO index days are associated with substantial cold anomalies across the eastern and north central United States associated with a weakening and southward shift of the midlatitude jet (not shown). Both are also associated with a southward shift in precipitation over the eastern and mid-western United States. In general, the Cluster 4 temperature and precipitation anomalies are slightly weaker than anomalies associated with strong negative AO events.

Figure 10 is similar to Fig. 9, except that it shows composites of the 542 7-day periods in Cluster 6 in the left panels, and the 542 7-day periods with the most positive PNA indices in the right panels. Rainfall anomalies over the continental United States for both the Cluster 6 and the positive PNA composites show a wet signal along the eastern coastal US and an extensive dry signal over the interior southeastern states and the Midwest, similar to the composite signature associated with El Niño. Except for a wet region in northern California, dry anomalies are also observed over most of the west coast and western mountain states. In Fig. 10c, d, both composites show warm anomalies in the northwestern U.S. and cold anomalies in the southeastern U.S. associated with the PNA ridge-trough pattern. Negative temperature anomalies extend further into the northeastern U.S. in the Cluster 6 composite (Fig. 10c) due to the influence of a negative AO/NAO pattern over the north Atlantic. With the exception of the very strong positive temperature anomalies over the western mountain states, the Cluster 6 temperature and precipitation anomalies are generally of comparable magnitude to those associated with the PNA composite.

Figure 11 is the same as Figs. 9 and 10, except that it shows the 747 7-day mean periods in Cluster 7, and the 747 7-day periods with the most negative PNA index values. Precipitation composites for Cluster 7 and the negative PNA periods show dry anomalies along the southeastern coast and wet anomalies in the interior southeastern, mid-western and Pacific northwestern states in a pattern similar to La Niña composites. The temperature signatures look quite different from each other, however, with the Cluster 7 composites associated primarily with warm anomalies over the eastern half of the continental United States, while the negative PNA composites are mostly associated with cooling over the northern, mid-western, and western states. These cold anomalies are explained by a deeper trough over western Canada and the US in the 500-hPa PNA height field when compared with Cluster 7 (not shown). The warm anomalies over the eastern US are possibly linked to the positive AO/NAO pattern associated with Cluster 7.

We conclude that Clusters 4, 6 and 7 are associated with surface anomalies that are generally comparable in magnitude to those associated with days with the most positive and negative AO and PNA indices. Thus, cluster occurrence forecasts may be as useful to North American extended-range outlooks as the prediction of strong AO, NAO and PNA events.

8 CFSv2 hindcasts

CFSv2 is used operationally to forecast MJO activity and propagation in the tropics. In this section, we examine how well it simulates the observed tropical–extratropical linkages identified in Sect. 5. Before focusing on the MJO, however, we first examine how well the model simulates the overall December–March distributions of 500-hPa geopotential height patterns. As described in Sect. 3.1, each CFSv2 7-day hindcast is assigned to one of the seven clusters shown in Fig. 3 by finding the nearest cluster centroid. The cluster distributions in the week-2, week-3, week-4, and week-5 hindcasts are then compared to the reanalysis in Fig. 12. Figure 12 resembles the data presented in Table 2, except for the shorter hindcast period (1999–2010).
Fig. 12

Occurrence frequency for each of the seven clusters during a all days, b El Niño days, and c La Niña days. Black bars show the reanalysis frequencies and grey and white bars show frequencies for the CFSv2 week-2, week-3, week-4 and week-5 forecasts from left to right
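
The assignment of each hindcast to a cluster can be sketched as a nearest-centroid search. The sketch assumes Euclidean distance between a 7-day mean anomaly field and each centroid, both stored as flat lists in the same space (for example, the retained PCs); the distance metric, function name and toy values are assumptions for illustration.

```python
import math

def nearest_cluster(field, centroids):
    """Return the 1-based number of the centroid closest to `field`
    in Euclidean distance."""
    distances = [math.dist(field, c) for c in centroids]
    return distances.index(min(distances)) + 1

# Toy example: three centroids in a two-dimensional space (hypothetical values).
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hindcast = [0.2, 0.9]
assigned = nearest_cluster(hindcast, centroids)
```

Because every hindcast is forced into one of the fixed reanalysis clusters, differences between the model and reanalysis frequencies in Fig. 12 reflect how often the model visits each pattern rather than differences in the patterns themselves.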

Figure 12a shows the relative frequencies of Clusters 1–7 for the hindcast years (1999–2010). At a 2-week lead, the CFSv2 cluster distributions are quite similar to the reanalysis distributions. In both cases, Cluster 2 and Cluster 7 are the most common clusters and Cluster 3 is the least common cluster. At longer leads, however, variability in the model is reduced and it tends to converge to certain “preferred” patterns. The model is increasingly biased towards Clusters 6 and 7 at the expense of Clusters 1–4. The anomalies in Clusters 6 and 7 are nearly opposite of each other so the bias favoring these clusters is not corrected by the mean bias correction.

Figure 12b, c are similar to Fig. 12a, except that they show cluster frequencies for El Niño and La Niña years only. As with the overall climatology, the week-2 forecasts capture the relative frequencies of the clusters quite accurately, with the probabilities of Cluster 2 and Cluster 7 elevated during La Niña, and the probability of Cluster 6 elevated during El Niño. The week-4 and week-5 forecasts show even stronger enhancements of Cluster 6 (Cluster 7) during El Niño (La Niña) than do the observations. In these forecasts, Cluster 7 has more than a 50 % chance of occurring during La Niña episodes. This indicates that the long lead forecasts have lower variability than they should, favoring the most likely cluster more often than is climatologically appropriate. A correction to the model accounting for this could be applied to produce more realistic baseline distributions for the week-4 and week-5 forecasts.

Despite these biases, we can still examine how the baseline distributions shown in Fig. 12 are impacted by the MJO. Figure 13 shows anomalous cluster occurrences (C in Eq. 2) for the CFSv2 hindcasts. For comparison, similar plots are also included for the reanalysis for the same years (1999–2010). The reanalysis plots (Fig. 13a, c, e) are essentially identical to Fig. 5 except for the different record lengths. Enhanced/suppressed probabilities in this shorter record generally occur at similar times to those in the full record, though with some discrepancies in exact timing. For the model (Fig. 13b, d, f), the x-axis represents the model lead in addition to the number of days lag after the MJO. Model runs are initialized on day 0 during an MJO event and run forward for 40 days. Reference cluster frequencies (the denominator in Eq. 2) are also lead-dependent.
Fig. 13

a Anomalous frequency of occurrence for Cluster 4 based on the NCEP reanalysis 1999–2010. b Same as a except for the CFSv2 hindcasts. c, d Same as a, b, except for Cluster 6. e, f Same as a, b except for Cluster 7. CFSv2 hindcasts are for 0–40 days lead after the model initialization on day 0 during an active MJO event. For example, day 11 refers to the week-2 forecast. Changes in occurrence are calculated with respect to the lead-dependent model climatological frequencies (as in Fig. 12). All positive days are shaded light red, while all negative days are shaded light blue. Slanted red (blue) lines approximate maxima in enhanced (suppressed) probabilities for the full reanalysis record (1981–2010; Fig. 5)
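
The lead-dependent reference used for the hindcasts can be sketched as follows. The sketch assumes each hindcast run is stored as a pair of the MJO phase at initialization (or None) and the list of clusters assigned at each lead, and that the change is expressed as a percent departure from the model's own frequency at the same lead; the data layout and names are hypothetical.

```python
def lead_dependent_change(runs, cluster, lead, phase):
    """Percent change in the frequency of `cluster` at a given `lead` for runs
    initialized in MJO `phase`, relative to all runs at the same lead.

    runs: list of (init_phase, clusters_by_lead) tuples.
    """
    # Model climatology at this lead: every run contributes, whatever its phase.
    reference = [clusters[lead] for _, clusters in runs]
    conditional = [clusters[lead] for init_phase, clusters in runs
                   if init_phase == phase]
    ref_freq = reference.count(cluster) / len(reference)
    if not conditional or ref_freq == 0:
        return None
    cond_freq = conditional.count(cluster) / len(conditional)
    return 100.0 * (cond_freq - ref_freq) / ref_freq

# Toy set of four runs with two leads each (hypothetical values).
runs = [(6, [1, 6]), (6, [2, 6]), (None, [6, 7]), (3, [7, 7])]
change_lead1 = lead_dependent_change(runs, cluster=6, lead=1, phase=6)
```

Using the model's own frequency at each lead as the reference means that the growing bias towards Clusters 6 and 7 at long leads does not by itself appear as an MJO-related anomaly.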

The statistical significance was assessed as before, using a Monte Carlo simulation. However, because of the shorter record, only a few statistically significant results were found for the anomalous cluster frequencies shown in Fig. 13, and these are not indicated on the plot. Instead, we chose to highlight the timing of all positive and negative anomalies with contrasting colors. Despite the absence of statistically significant anomalies, Fig. 13 suggests that the model correctly captures the approximate timing of observed enhanced/suppressed probabilities of Clusters 4, 6 and 7. For example, the model shows a near doubling of the occurrence of Cluster 4 several weeks after an active MJO in phase 6. The model response, however, is somewhat later than in the observations, peaking approximately 28–32 days after the MJO phase 6 episode as opposed to 21–24 days after in the reanalysis. A doubling of the occurrence of Cluster 6 also occurs after the MJO is active in phase 6 in both the model and the reanalysis. In this case, both show the largest probability enhancements occurring at approximately 10–12 days after the MJO event, more than a week before the enhanced probabilities of Cluster 4 (negative AO). The timing of enhancements and reductions in the occurrence of Cluster 7 are also similar between the model and the reanalysis, though the anomalies are weaker and persist for longer in the model.
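
The local Monte Carlo assessment mentioned at the start of the paragraph above can be illustrated with a simple resampling sketch. It assumes a null hypothesis in which days are drawn at random from the record without regard to the MJO, and it ignores serial correlation between days; the actual procedure of Sect. 3.3 may differ, and the function name and toy record are hypothetical.

```python
import random

def monte_carlo_p_value(cluster_by_day, n_sample, observed_count, cluster,
                        n_trials=2000, seed=0):
    """One-sided p-value: fraction of random samples of n_sample days in which
    `cluster` occurs at least observed_count times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(cluster_by_day, n_sample)
        if sum(1 for c in sample if c == cluster) >= observed_count:
            hits += 1
    return hits / n_trials

# Toy record: Cluster 4 occupies 12 of 100 days.
record = [4] * 12 + [1] * 88
# Suppose Cluster 4 was observed on 10 of the 20 days following an MJO phase.
p_enhanced = monte_carlo_p_value(record, n_sample=20, observed_count=10, cluster=4)
```

With a record as short as 1999–2010, the number of days behind each lag and phase is small, so the null distribution is wide and few anomalies clear the significance threshold, consistent with the limited number of significant results reported for Fig. 13.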

Though the data do not provide conclusive evidence due to the relatively short record, these results suggest that the CFSv2 model can capture the approximate timing of the tropical/extratropical connections between the MJO and northern hemisphere flow regimes.

9 Summary and conclusions

In this study we identify seven clusters that represent commonly occurring weather regime patterns spanning the Pacific, North American, and Atlantic sectors in the northern hemisphere. We find these patterns to be quite robust to a variety of sensitivity tests, and suggest that they represent a useful way to summarize variability in the 500-hPa geopotential height field. One advantage of the cluster analysis is that it does not assume linearity between the positive and negative phases of the leading modes of variability. For instance, the presence of a cluster that resembles a strong negative AO/NAO (Cluster 4) along with the absence of a cluster which resembles a pure positive AO/NAO suggest that asymmetries exist that cannot be well represented simply by isolating linear modes. While the seven clusters have similarities with the leading modes of northern hemisphere variability, most represent mixtures of the AO and PNA together with higher order patterns.

Tropical convection associated with the MJO is shown to strongly modulate the occurrence probabilities of Clusters 4, 6 and 7. Cluster 4 very strongly resembles a negative AO/NAO pattern. Consistent with previous results (e.g., Cassou 2008; L’Heureux and Higgins 2008; and Lin et al. 2009), Cluster 4 probabilities are elevated approximately 10–20 days after an MJO event in phase 7. While Clusters 6 and 7 resemble opposite phases of the PNA, their occurrence probabilities are much more closely tied to the MJO than similarly sized clusters of positive and negative PNA events. The timing of the relationship between MJO and Clusters 6 and 7 is consistent with a Rossby wave train response to upper-level divergence and convergence anomalies in the central Pacific.

The relationship between the MJO and Clusters 6 and 7 is very different during El Niño and La Niña years. Enhanced probabilities of Cluster 6 occur in El Niño years only, while enhanced probabilities of Cluster 7 occur in La Niña years only. This may be partially due to destructive interference between ENSO and MJO during unfavorable ENSO conditions. During favorable ENSO conditions, when convection and upper-tropospheric circulation patterns associated with ENSO and the MJO interfere constructively, the frequencies of Clusters 6 and 7 can almost triple compared to their climatological frequencies. During La Niña, we find statistically significant modulations in the occurrence frequency of Cluster 7 as far as 35 days after the initial MJO event. This represents a significant probability shift associated with the MJO at longer leads than have previously been reported.

Finally, the CFSv2 model captures the observed relationships quite well. While the CFSv2 December–March cluster distributions are generally biased towards Clusters 6 and 7 at long lead times (Fig. 12), the model shows realistically timed modulations of these climatological probabilities in the days after an MJO event. However, the model anomalies are of slightly smaller magnitude than seen in the observations.

The results presented in this paper may have practical applications for improving extended range forecasts over the North American region. State-of-the-art extended-range forecasts beyond 2 weeks currently show very little skill. The results of this study suggest that some skill may be obtained under certain conditions for forecasts out to 4 weeks or longer, based on knowledge of prior MJO activity and the state of ENSO. We are currently working on a methodology to incorporate these results into probabilistic forecasts over North America for lead times ranging from 1 to 4 weeks.


Support for this work was provided by the NOAA Climate Test Bed and the NOAA Student Career Experience Program (SCEP). We would also like to thank Jon Gottschalck and Peitao Peng at the Climate Prediction Center and two anonymous reviewers for their very helpful editorial comments on the manuscript.

Copyright information

© Springer-Verlag 2012