Zeroes, Missings, and Outliers

Analyzing Compositional Data with R

Part of the book series: Use R! (USE R)

Abstract

Irregular data, i.e., zeroes, missing values, and outliers, affect plotting, descriptive statistics, parameter estimation, and statistical tests. Because each of these methods depends on every value in the dataset, irregular values cannot be ignored in any of these tasks. Ad hoc procedures are therefore needed that pursue the same aims as the classical methods but can still be computed when irregular data are present. This requires extending basic concepts (e.g., the expected value) so that they remain meaningful for irregular values. The treatment of irregularities in compositional data analysis is far from a closed subject, and substantial improvements can be expected in the near future. The package provides only a limited set of tools to detect, represent, and briefly analyze such irregular values, whether missing values, zeroes, or outliers. This chapter nevertheless provides additional background material so that the reader can treat datasets with irregular data carefully.
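
As a minimal illustration of why a single irregular value matters, the sketch below counts zeroes and missing values per part and shows that a classical summary such as the geometric mean becomes undefined as soon as one value is irregular. It is written in plain Python rather than with the package's own functions; the helper names (`classify`, `irregular_summary`, `geometric_mean`) and the toy data are hypothetical, and the data are assumed to be non-negative.

```python
import math

def classify(value):
    """Label one component as 'missing', 'zero', or 'ok' (assumes value >= 0)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    if value == 0:
        return "zero"
    return "ok"

def irregular_summary(rows):
    """Count zeroes and missings per part for a list of equal-length rows."""
    n_parts = len(rows[0])
    summary = [{"zero": 0, "missing": 0, "ok": 0} for _ in range(n_parts)]
    for row in rows:
        for j, value in enumerate(row):
            summary[j][classify(value)] += 1
    return summary

def geometric_mean(values):
    """Classical geometric mean; returns None if any value is irregular."""
    if any(classify(v) != "ok" for v in values):
        return None  # log(0) and log(NaN) give no usable result
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical three-part compositions (rows) for illustration only.
data = [
    [0.60, 0.30, 0.10],
    [0.70, 0.00, 0.30],          # a zero in the second part
    [0.50, float("nan"), 0.50],  # a missing value in the second part
]

summary = irregular_summary(data)
gm_first = geometric_mean([row[0] for row in data])   # all values regular
gm_second = geometric_mean([row[1] for row in data])  # undefined here
```

Returning `None` here only flags the problem; it does not resolve it. How such values should actually be treated depends on why they are irregular, and the sketch takes no position on that.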

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van den Boogaart, K.G., Tolosana-Delgado, R. (2013). Zeroes, Missings, and Outliers. In: Analyzing Compositional Data with R. Use R!. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36809-7_7
