Discovering and Visualizing Relations in High Dimensional Data

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

A dataset with M items has \(2^M\) subsets, any one of which may be the one we really want. With a good data display, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion but also extract insights from the visual patterns. These are the core reasons for data visualization. With Parallel Coordinates (abbr. ∥-cs) the search for multivariate relations in high-dimensional datasets is transformed into a 2-D pattern recognition problem. Multidimensional exploration is illustrated on real datasets, in the process describing good query design with atomic queries and compound ones built using boolean operations. Then complex datasets are classified with a geometric classification algorithm based on ∥-cs. The algorithm has low computational complexity and provides the classification rule explicitly and visually. The minimal set of variables required to state the rule is found, and the variables are ordered by their predictive value. By means of a new divide-and-conquer technique the classification is extended to previously inaccessible datasets. This new result and others, like the triad adjacency problem, appear for the first time. A visual economic model of a real country is constructed and analyzed to illustrate how multivariate relations can be modeled by means of hypersurfaces. The overview at the end provides the foundational understanding for ∥-cs, examples of exciting recent results like viewing convexity in any dimension and non-orientability (as in the Möbius strip), and a prelude of what is on the way: the discovery and display of relational information in high-dimensional datasets as visual patterns (multidimensional graphs).
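The basic constructions mentioned in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's software: the function names `to_polyline`, `line_to_dual_point`, and `interval_query` are hypothetical, the axes are assumed to be equally spaced vertical lines, and an "atomic query" is assumed here to be a simple interval selection on one axis. The sketch shows how an N-dimensional point becomes a polyline, how a 2-D line with slope m ≠ 1 corresponds to a single point between the axes, and how atomic queries combine through boolean (set) operations.

```python
from typing import List, Sequence, Set, Tuple


def to_polyline(point: Sequence[float], spacing: float = 1.0) -> List[Tuple[float, float]]:
    """Map an N-dimensional point to its parallel-coordinates polyline.

    Axis i is the vertical line x = i * spacing (an assumed layout); the
    polyline vertex on that axis sits at height point[i].
    """
    return [(i * spacing, float(v)) for i, v in enumerate(point)]


def line_to_dual_point(m: float, b: float, spacing: float = 1.0) -> Tuple[float, float]:
    """Return the point representing the line x2 = m*x1 + b.

    Every point on that line maps to a segment between the first two axes,
    and all of those segments pass through (spacing/(1-m), b/(1-m)).
    For m == 1 the segments are parallel, so no finite point exists.
    """
    if m == 1:
        raise ValueError("slope 1 maps to an ideal point (segments are parallel)")
    return (spacing / (1.0 - m), b / (1.0 - m))


def interval_query(data: Sequence[Sequence[float]], axis: int,
                   low: float, high: float) -> Set[int]:
    """Hypothetical atomic query: indices of items with low <= value <= high on one axis."""
    return {i for i, row in enumerate(data) if low <= row[axis] <= high}


if __name__ == "__main__":
    data = [(1, 5), (2, 6), (3, 7), (4, 8)]
    q1 = interval_query(data, axis=0, low=2, high=4)
    q2 = interval_query(data, axis=1, low=5, high=6)
    print(to_polyline((3, 1, 2)))
    print(line_to_dual_point(-1, 2))
    print(sorted(q1 & q2), sorted(q1 | q2), sorted(q1 - q2))  # AND, OR, AND NOT
```

Representing query results as index sets keeps compound queries trivial to express, since intersection, union, and difference play the roles of AND, OR, and AND NOT.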

Notes

  1. The venerable name “Exploratory Data Analysis” (EDA) is used interchangeably with the currently more fashionable “Visual Data Mining”.

  2. MDG’s Ltd proprietary software – All Rights Reserved, is used by permission.

  3. Suggesting that the Landsat Thematic Mapper band 4 filters out water, though this is unknown to me.

  4. My dentist really liked this name!

  5. By \(S_j \subset S_k\) it is meant that the set of points enclosed by the hypersurface \(S_j\) is contained in the set of points enclosed by the hypersurface \(S_k\).

Acknowledgements

I am grateful to David Adjiashvili, who wrote the magnificent interactive software displaying the ∥-cs representation of surfaces seen in Figs. 11.33 through 11.36.

Senior Fellow, San Diego SuperComputing Center & Multidimensional Graphs Ltd, Raanana 43556, Israel

Author information

Correspondence to Alfred Inselberg.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Inselberg, A. (2012). Discovering and Visualizing Relations in High Dimensional Data. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_11
