Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

The scatter plot is a useful method for visualising clusters and outliers in continuous data. However, it cannot be applied directly to nominal data, because nominal values have no natural ordering and no natural notion of ‘distance’. One solution is to map the multi-dimensional nominal data to a numeric space and then draw a scatter plot of the data points based on the first two principal components of that space. This paper reports a study of how such plots can be generated using three types of mapping: (a) Binary Input Mapping (BImap), (b) Attribute Value Frequency Mapping (AVFmap), and (c) BImap combined with AVFmap. Results show that the combined method draws on the complementary strengths of BImap and AVFmap to generate meaningful scatter plots for visualising categorical outliers, and achieves the highest information gain among the methods tested.
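The pipeline described in the abstract (map nominal records to numbers, then plot each record by its first two principal components) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: it assumes BImap is a one-hot (binary) encoding with one column per attribute value, AVFmap replaces each value by its relative frequency within its attribute (in the spirit of attribute value frequency scoring), and the combined mapping simply concatenates the two sets of columns. The function names `bimap`, `avfmap`, `combined_map` and `pca_2d` are hypothetical, and the principal components are found by power iteration only to keep the example free of third-party dependencies.

```python
import math
import random
from collections import Counter


def bimap(records):
    """Assumed BImap: one 0/1 column per (attribute, value) pair (one-hot encoding)."""
    domains = [sorted({r[j] for r in records}) for j in range(len(records[0]))]
    return [[1.0 if r[j] == v else 0.0 for j, dom in enumerate(domains) for v in dom]
            for r in records]


def avfmap(records):
    """Assumed AVFmap: replace each value by its relative frequency within its attribute."""
    n = len(records)
    counts = [Counter(r[j] for r in records) for j in range(len(records[0]))]
    return [[counts[j][v] / n for j, v in enumerate(r)] for r in records]


def combined_map(records):
    """Assumed combination: concatenate the BImap and AVFmap columns of each record."""
    return [b + a for b, a in zip(bimap(records), avfmap(records))]


def pca_2d(rows, iters=500, seed=0):
    """Project rows onto their first two principal components (needs at least two rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    X = [[r[k] - means[k] for k in range(d)] for r in rows]          # centre each column
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)]  # sample covariance
         for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            for c in comps:  # Gram-Schmidt: stay orthogonal to components already found
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                v = [0.0] * d
                break
            v = [a / norm for a in w]
        comps.append(v)
    # 2-D scatter-plot coordinates: one (PC1, PC2) pair per record
    return [[sum(a * b for a, b in zip(x, c)) for c in comps] for x in X]


if __name__ == "__main__":
    # Toy nominal data: the single ("c", "z") record (index 9) is the rare one.
    records = [("a", "x")] * 6 + [("b", "y")] * 3 + [("c", "z")]
    for name, mapper in [("BImap", bimap), ("AVFmap", avfmap), ("combined", combined_map)]:
        points = pca_2d(mapper(records))
        farthest = max(range(len(points)), key=lambda i: math.hypot(*points[i]))
        print(name, farthest)  # prints the index 9 for each mapping on this toy data
```

Plotting the returned coordinates with any scatter-plot tool gives the kind of view the abstract describes; on this toy data the lone ("c", "z") record lies farthest from the centre of the plot under all three mappings.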

Author information

Corresponding author: Swee Chuan Tan.

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Tan, S.C. (2014). Visualising Outliers in Nominal Data. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_28

  • DOI: https://doi.org/10.1007/978-94-007-7287-8_28

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-7286-1

  • Online ISBN: 978-94-007-7287-8

  • eBook Packages: Computer Science, Computer Science (R0)
