Visualising Outliers in Nominal Data

Tan, Swee Chuan

doi:10.1007/978-94-007-7287-8_28


Part of the book series: Springer Proceedings in Complexity (SPCOM)


Abstract

The scatter plot is a useful method for visualising clusters and outliers in continuous data. However, it cannot be applied directly to nominal data, because nominal values have no natural ordering and no natural notion of ‘distance’. One solution is to map the multi-dimensional nominal data into a numeric space and then draw a scatter plot of the data points on the first two principal components of that space. This paper reports a study of how such plots can be generated using three types of mapping: (a) Binary Input Mapping (BImap), (b) Attribute Value Frequency Mapping (AVFmap), and (c) BImap combined with AVFmap. The results show that the combined method draws on the complementary strengths of BImap and AVFmap: it generates meaningful scatter plots for visualising categorical outliers and achieves the highest information gain among the methods tested.
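The mapping-then-PCA pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the author's implementation. It assumes that BImap is a one-hot encoding (one binary column per attribute value), that AVFmap replaces each value with its relative frequency within its attribute, and that the combined mapping simply concatenates the two feature vectors. PCA is computed with a plain Jacobi eigen-decomposition so the example needs only the Python standard library. The function names (`bimap`, `avfmap`, `combined`, `pca_2d`) and the toy data are invented for illustration.

```python
import math
from collections import Counter


def bimap(rows):
    """Assumed BImap: one binary (one-hot) column per attribute value."""
    vocab = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
    out = []
    for r in rows:
        vec = []
        for j, values in enumerate(vocab):
            vec.extend(1.0 if r[j] == v else 0.0 for v in values)
        out.append(vec)
    return out


def avfmap(rows):
    """Assumed AVFmap: each value becomes its relative frequency in its attribute."""
    n = len(rows)
    counts = [Counter(r[j] for r in rows) for j in range(len(rows[0]))]
    return [[counts[j][v] / n for j, v in enumerate(r)] for r in rows]


def combined(rows):
    """Assumed combination: concatenate the BImap and AVFmap vectors."""
    return [b + a for b, a in zip(bimap(rows), avfmap(rows))]


def jacobi_eigen(a, sweeps=100, tol=1e-12):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J, V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def pca_2d(x):
    """Project the rows of x onto their first two principal components."""
    n, d = len(x), len(x[0])
    means = [sum(r[j] for r in x) / n for j in range(d)]
    xc = [[r[j] - means[j] for j in range(d)] for r in x]
    cov = [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(d)] for i in range(d)]
    vals, vecs = jacobi_eigen(cov)
    top = sorted(range(d), key=lambda i: vals[i], reverse=True)[:2]
    return [[sum(r[k] * vecs[k][i] for k in range(d)) for i in top] for r in xc]


# Toy nominal data (made up): one rare (colour, shape) combination at index 9.
rows = [("red", "circle")] * 6 + [("blue", "square")] * 3 + [("green", "triangle")]
points = pca_2d(combined(rows))
# Squared distance from the centroid, which sits at the origin of the 2-D plot.
dist2 = [px * px + py * py for px, py in points]
print("most outlying row:", max(range(len(rows)), key=dist2.__getitem__))
# prints: most outlying row: 9
```

Because this toy dataset has only three distinct records, the first two principal components preserve every distance from the centroid exactly, so the rare combination lands farthest from the dense cluster in the scatter plot. For real data a library PCA routine would normally replace the hand-rolled eigen-solver.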



Author information

Authors and Affiliations

School of Business, SIM University, 535A Clementi Road, Singapore, Singapore
Swee Chuan Tan


Corresponding author

Correspondence to Swee Chuan Tan.

Editor information

Editors and Affiliations

School of Computing, Staffordshire University, Stafford, United Kingdom
Lorna Uden
College of Management, National University of Kaohsiung, Kaohsiung, Taiwan
Leon S.L. Wang
Department of Computing Science and Control, Faculty of Science, Universidad de Salamanca, Salamanca, Spain
Juan Manuel Corchado Rodríguez
National University of Kaohsiung, Kaohsiung, Taiwan
Hsin-Chang Yang
Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan
I-Hsien Ting


About this paper

Cite this paper

Tan, S.C. (2014). Visualising Outliers in Nominal Data. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_28


DOI: https://doi.org/10.1007/978-94-007-7287-8_28
Published: 06 September 2013
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7286-1
Online ISBN: 978-94-007-7287-8
eBook Packages: Computer Science, Computer Science (R0)
