Skip to main content

Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data

  • Conference paper
  • First Online:
New Frontiers in Mining Complex Patterns (NFMCP 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10312))

Included in the following conference series:

Abstract

The dependence structure of extreme events of multivariate nature plays a special role for risk management applications, in particular in hydrology (flood risk). In a high dimensional context (\(d>50\)), a natural first step is dimension reduction. Analyzing the tails of a dataset requires specific approaches: earlier works have proposed a definition of sparsity adapted for extremes, together with an algorithm detecting such a pattern under strong sparsity assumptions. Given a dataset that exhibits no clear sparsity pattern we propose a clustering algorithm allowing to group together the features that are ‘dependent at extreme level’, i.e.,that are likely to take extreme values simultaneously. To bypass the computational issues that arise when it comes to dealing with possibly \(O(2^d)\) subsets of features, our algorithm exploits the graphical structure stemming from the definition of the clusters, similarly to the Apriori algorithm, which reduces drastically the number of subsets to be screened. Results on simulated and real data show that our method allows a fast recovery of a meaningful summary of the dependence structure of extremes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data. DMKD 11(1), 5–33 (2005)

    MathSciNet  Google Scholar 

  2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)

    Google Scholar 

  3. Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.: Statistics of extremes: theory and applications. Wiley, Hoboken (2006)

    MATH  Google Scholar 

  4. Boldi, M.O., Davison, A.: A mixture model for multivariate extremes. JRSS-B 69(2), 217–229 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  5. Bron, C., Kerbosch, J.: Algorithm 457: finding all cliques of an undirected graph. Commun. ACM 16(9), 575–577 (1973)

    Article  MATH  Google Scholar 

  6. Chautru, E.: Dimension reduction in multivariate extreme value analysis. Electron. J. Stat. 9(1), 383–418 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  7. Clifton, D.A., Hugueny, S., Tarassenko, L.: Novelty detection with multivariate extreme value statistics. J. Sig. Process. Syst. 65(3), 371–389 (2011)

    Article  Google Scholar 

  8. Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. Springer, London (2001)

    Book  MATH  Google Scholar 

  9. Coles, S., Tawn, J.: Modeling extreme multivariate events. JRSS-B 53, 377–392 (1991)

    MATH  Google Scholar 

  10. Cooley, D., Davis, R., Naveau, P.: The pairwise beta distribution: a flexible parametric multivariate model for extremes. JMVA 101(9), 2103–2117 (2010)

    MathSciNet  MATH  Google Scholar 

  11. Einmahl, J.H., Segers, J.: Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution. Ann. Stat. 37, 2953–2989 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  12. Fougeres, A.L., De Haan, L., Mercadier, C., et al.: Bias correction in multivariate extremes. Ann. Stat. 43(2), 903–934 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  13. Fougeres, A.L., Mercadier, C., Nolan, J.P.: Dense classes of multivariate extreme value distributions. J. Multivar. Anal. 116, 109–129 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  14. Giuntoli, I., Renard, B., Vidal, J.P., Bard, A.: Low flows in france and their relationship to large-scale climate indices. J. Hydro. 482, 105–118 (2013)

    Article  Google Scholar 

  15. Goix, N., Sabourin, A., Clémençon, S.: Learning the dependence structure of rare events: a non-asymptotic study. In: Proceedings of the 28th COLT (2015)

    Google Scholar 

  16. Goix, N., Sabourin, A., Clémençon, S.: Sparsity in multivariate extremes with applications to anomaly detection. arXiv preprint arXiv:1507.05899 (2015)

  17. Goix, N., Sabourin, A., Clémençon, S.: Sparse representation of multivariate extremes with applications to anomaly ranking. In: Proceedings of the 19th AISTAT conference, pp. 287–295 (2016)

    Google Scholar 

  18. Guillotte, S., Perron, F., Segers, J.: Non-parametric Bayesian inference on bivariate extremes. JRSS-B 73(3), 377–406 (2011)

    Article  MathSciNet  Google Scholar 

  19. Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Trans. Database Syst. 28(2), 140–174 (2003)

    Article  Google Scholar 

  20. Katz, R.W., Parlange, M.B., Naveau, P.: Statistics of extremes in hydrology. Adv. Water Resour. 25(8), 1287–1304 (2002)

    Article  Google Scholar 

  21. Lee, H.-J., Roberts, S.J.: On-line novelty detection using the Kalman filter and extreme value theory. In: 19th International Conference on Pattern Recognition, ICPR 2008, pp. 1–4. IEEE (2008)

    Google Scholar 

  22. Qi, Y.: Almost sure convergence of the stable tail empirical dependence function in multivariate extreme statistics. Acta Math. Applicatae Sin. (English Ser.) 13(2), 167–175 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  23. Resnick, S.I.: Extreme Values, Regular Variation and Point Processes. Springer, Heidelberg (2013)

    MATH  Google Scholar 

  24. Sabourin, A., Naveau, P.: Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization. CSDA 71, 542–567 (2014)

    MathSciNet  Google Scholar 

  25. Sabourin, A., Naveau, P., Fougeres, A.L.: Bayesian model averaging for multivariate extremes. Extremes 16(3), 325 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  26. Stephenson, A.: Simulating multivariate extreme value distributions of logistic type. Extremes 6(1), 49–59 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  27. Tawn, J.A.: Modelling multivariate extreme value distributions. Biometrika 77(2), 245–253 (1990)

    Article  MathSciNet  MATH  Google Scholar 

  28. Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theoret. Comput. Sci. 363(1), 28–42 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  29. Xie, Y., Philip, S.Y.: Max-clique: a top-down graph-based approach to frequent pattern mining. In: 2010 IEEE International Conference Data Mining, pp. 1139–1144. IEEE (2010)

    Google Scholar 

Download references

Acknowledgments

Part of this work has been funded by the the ‘LabEx Mathématiques Hadamard’ (LMH) project, by the ‘AGREED’ project from the PEPS JCJC program (INS2I, CNRS) and by the chair ‘Machine Learning for Big Data’ from Télécom ParisTech. The authors would like to thank Benjamin Renard for interesting discussions about the hydrological use case and for sharing the data.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maël Chiapino .

Editor information

Editors and Affiliations

A Appendix: Proof of Lemma 1

A Appendix: Proof of Lemma 1

Step 1. As a first step we show that \(\mathcal {M}\subset \mathbb {M}, \text {i.e.,} \mu (\mathcal {C}_\alpha )>0 \Rightarrow \mu (\varGamma _\alpha )>0. \)

Proof

Write \(\mathcal {C}_\alpha = \bigcup _{\epsilon >0,\epsilon \in \mathbb {Q}} R_{\alpha ,\epsilon }\), where \(R_{\alpha ,\epsilon } = \{x \in \mathbb {R}_+^d:\; \Vert x\Vert _{\infty }\ge 1; \quad x_j > \epsilon ~ (j\in \alpha ); \quad x_i = 0 ~ (i\notin \alpha ) \}.\) Assume \(\mu (\mathcal {C}_\alpha )>0\). Since \(\mu (\mathcal {C}_{\alpha } )<\infty \), by the monotonous limit property of the measure \(\mu \), we have \(\mu (\mathcal {C}_\alpha ) = \lim _{\epsilon \rightarrow 0} \mu (R_{\alpha ,\epsilon })\). Also, from the definitions, \(R_{\alpha ,\epsilon }\subset \epsilon \varGamma _{\alpha }\). Thus,

$$\begin{aligned} \mu (\mathcal {C}_\alpha )>0&\Rightarrow \exists \epsilon \in (0,1) : \mu (R_{\alpha ,\epsilon })>0 \qquad \Rightarrow \mu (\epsilon \varGamma _{\alpha })>0 \\&\Rightarrow \rho _\alpha = \mu (\varGamma _{\alpha }) = \epsilon \mu (\epsilon \varGamma _{\alpha })>0. \end{aligned}$$

Step 2. We now prove the reverse inclusion for maximal elements of \(\mathbb {M}\), i.e., 

$$\begin{aligned} \alpha \text { is maximal in } \mathbb {M} \quad \Rightarrow \alpha \in \mathcal {M}. \end{aligned}$$
(13)

Proof

Consider, for \( i \notin \alpha \), the set \( \;\Delta _{i, \epsilon } = \varGamma _{\alpha }\cap \{ x \in \mathbb {R}_+^d : \quad x_i > \epsilon \}, \) so that \( \varGamma _{\alpha } = \big \{ \bigcup _{\begin{array}{c} i \in \{1,\ldots ,d\}\setminus \alpha \\ \epsilon \in \mathbb {Q} \cap (0,1) \end{array}} \Delta _{i, \epsilon }\big \} \cup R_{\alpha , 1}. \) Thus,

$$\begin{aligned} \alpha \in \mathbb {M} ~~\Rightarrow ~~\mu (\varGamma _{\alpha } )>0 ~~\Rightarrow ~~\Big (\exists i, \mu (\Delta _{i,\epsilon })>0 ~~\text { or }~~ \mu ( R_{\alpha ,1} )>0\Big ) \end{aligned}$$
(14)

To prove (13), it is enough to show that

$$\begin{aligned} \alpha \in \mathbb {M}\quad \Rightarrow \quad \text { for } i\notin \alpha , \;\mu (\Delta _{i,\epsilon }) =0 . \end{aligned}$$
(15)

Indeed if (15) is true, and if \(\alpha \in \mathbb {M}\), then (14) implies that \(\mu (R_{\alpha ,1})>0\), and the result follows from the inclusion \(R_{\alpha ,1}\subset \mathcal {C}_\alpha \). We show (15) by contradiction. If \(\mu (\Delta _{i,\epsilon })>0\) for some \(i\notin \alpha \), then

$$\frac{1}{\epsilon } \Delta _{i,\epsilon } = \left( \frac{1}{\epsilon }\,\varGamma _{\alpha }\right) \cap \{x\in \mathbb {R}_+^d : x_i>1 \}\subset \varGamma _{\alpha \cup \{i\}},$$

thus \(\mu (\varGamma _{\alpha \cup \{i\}})>0\), which contradicts the maximality of \(\alpha \) in \(\mathbb {M}\).

Step 3. From (13), if \(\alpha \text { is maximal in } \mathbb {M}\) then \( \alpha \in \mathcal {M}\). Now if \(\alpha \) is maximal in \(\mathbb {M}\) but not in \(\mathcal {M}\), there exists \(\beta \supsetneq \alpha \) in \(\mathcal {M}\). Thus from Step 1, \(\beta \in \mathbb {M}\), a contradiction. Hence \(\alpha \) is also maximal in \(\mathcal {M}\). Conversely, if \(\alpha \) is maximal in \(\mathcal {M}\) then (Step 1) \(\alpha \in \mathbb {M}\). If \(\alpha \) was not maximal in \(\mathbb {M}\), there would exist \(\beta \supsetneq \alpha \) maximal in \(\mathbb {M}\), and from (13), \(\beta \in \mathcal {M}\), contradicting the maximality of \(\alpha \) in \(\mathcal {M}\).

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chiapino, M., Sabourin, A. (2017). Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data. In: Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2016. Lecture Notes in Computer Science(), vol 10312. Springer, Cham. https://doi.org/10.1007/978-3-319-61461-8_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-61461-8_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61460-1

  • Online ISBN: 978-3-319-61461-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics