Exploratory Data Analysis through the Inspection of the Probability Density Function of the Number of Neighbors

  • Conference paper
Advances in Intelligent Data Analysis XII (IDA 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8207)

Abstract

Exploratory data analysis is a fundamental stage in the mining of high-dimensional datasets. Several algorithms have been implemented to give a general idea of the geometry and patterns present in high-dimensional data. Here, we present a methodology based on the distance matrix of the input data. The algorithm relies on the number of points considered to be neighbors of each input vector. A neighborhood is defined as a hypersphere of varying radius, and the probability density function of the number of neighbor vectors is computed from the distance matrix. We show that, as the radius of the hypersphere is systematically increased, a detailed analysis of the probability density function of the number of neighbors reveals relevant aspects of the overall features that describe the high-dimensional data. The algorithm is tested on several datasets, and we show its pertinence as an exploratory data analysis tool.
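
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean metric, the exclusion of each point from its own neighborhood, the use of a normalized histogram as the estimate of the distribution, and the function name neighbor_count_distributions are all assumptions made for illustration.

```python
import numpy as np

def neighbor_count_distributions(X, radii):
    """Return, for each radius, the empirical distribution of neighbor counts.

    Assumptions (not taken from the paper): Euclidean distance, a point is
    not counted as its own neighbor, and the distribution is estimated as
    the relative frequency of each possible count 0..n-1.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise distance matrix of the input vectors.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    distributions = {}
    for r in radii:
        # Points inside the hypersphere of radius r, minus the point itself.
        counts = (D <= r).sum(axis=1) - 1
        # Fraction of input vectors having k neighbors, for k = 0..n-1.
        distributions[r] = np.bincount(counts, minlength=n) / n
    return distributions

# Two well-separated pairs of points: as the radius grows, the mass moves
# from "no neighbors" to "one neighbor" to "all other points are neighbors".
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
dists = neighbor_count_distributions(X, radii=[0.5, 1.5, 20.0])
```

In this toy example, the radius at which the distribution shifts from one count to the next reflects the within-pair and between-pair distances, which is the kind of structure the abstract says the inspection is meant to expose.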

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neme, A., Nido, A. (2013). Exploratory Data Analysis through the Inspection of the Probability Density Function of the Number of Neighbors. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_27

  • DOI: https://doi.org/10.1007/978-3-642-41398-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41397-1

  • Online ISBN: 978-3-642-41398-8

  • eBook Packages: Computer Science, Computer Science (R0)
