A Box-Plot and Outliers Detection Proposal for Histogram Data: New Tools for Data Stream Analysis

  • Conference paper
Analysis and Modeling of Complex Data in Behavioral and Social Sciences

Abstract

In this paper, we propose a method for monitoring the evolution of data described by histograms of values. Our proposal consists in defining new order statistics on the quantile functions associated with the empirical distributions represented by the histogram data. We introduce the Median, the First Quartile and the Third Quartile quantile functions, as well as a generalized representation of the box-and-whiskers plot. The proposed representations and indices are useful, for example, for identifying and classifying outliers that arrive over time in a data stream environment.
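To make the idea concrete, here is a minimal sketch of order statistics computed on quantile functions. It is an illustration under stated assumptions, not the construction defined in the paper: each histogram is assumed to be uniform within its bins, the quartile functions are taken pointwise on a grid of probability levels, and outliers are flagged with a Tukey-style 1.5 × IQR fence applied at each level. The function names (`histogram_quantile`, `pointwise_quartiles`, `flag_outliers`) and the parameters `grid_size` and `k` are hypothetical.

```python
import numpy as np

def histogram_quantile(edges, weights, p):
    """Quantile function of one histogram at levels p.

    Assumption: values are uniformly distributed within each bin, so the
    quantile function is piecewise linear between cumulative weights.
    Bin weights are assumed strictly positive.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(weights) / weights.sum()))
    return np.interp(p, cdf, edges)

def pointwise_quartiles(histograms, grid_size=101):
    """Pointwise first quartile, median and third quartile quantile functions.

    `histograms` is a list of (edges, weights) pairs. The pointwise
    definition is an assumption made for this sketch.
    """
    p = np.linspace(0.0, 1.0, grid_size)
    q = np.array([histogram_quantile(e, w, p) for e, w in histograms])
    q1, med, q3 = np.percentile(q, [25, 50, 75], axis=0)
    return p, q, q1, med, q3

def flag_outliers(histograms, k=1.5, grid_size=101):
    """Flag histograms whose quantile function leaves the fences anywhere.

    The fences q1 - k*IQR and q3 + k*IQR are a hypothetical rule borrowed
    from the classical box plot, applied level by level.
    """
    _, q, q1, _, q3 = pointwise_quartiles(histograms, grid_size)
    iqr = q3 - q1
    outside = (q < q1 - k * iqr) | (q > q3 + k * iqr)
    return outside.any(axis=1)
```

Calling `flag_outliers` on a list of `(edges, weights)` pairs returns one boolean per histogram. In a data stream setting, the same check could be repeated on each window of newly arrived histograms.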

Correspondence to Rosanna Verde.

Copyright information

© 2014 Springer International Publishing Switzerland

Cite this paper

Verde, R., Irpino, A., Rivoli, L. (2014). A Box-Plot and Outliers Detection Proposal for Histogram Data: New Tools for Data Stream Analysis. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds) Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-06692-9_30
