
Visual Data Mining and Discovery with Binarized Vectors

  • Boris Kovalerchuk
  • Florian Delizy
  • Logan Riggs
  • Evgenii Vityaev
Part of the Intelligent Systems Reference Library book series (ISRL, volume 24)

Abstract

The emerging field of Visual Analytics combines several fields in which Data Mining and Visualization play leading roles. Visual analytics departs fundamentally from other approaches in its extensive use of visual analytical tools to discover patterns, not only to visualize patterns that have already been discovered by traditional data mining methods. High-complexity data mining tasks often require a multi-level top-down approach, in which a qualitative analysis of the complex situation is first conducted at the top levels and top-level patterns are discovered. This paper presents the concept of Monotone Boolean Function Visual Analytics (MBFVA) for such top-level pattern discovery. The approach employs binarization and monotonization of quantitative attributes to obtain a top-level data representation. The top-level discoveries form a foundation for the subsequent, more detailed data mining levels, where the patterns are refined. The approach is illustrated with applications in the medical, law enforcement, and security domains. The medical application is concerned with discovering breast cancer diagnostic rules (i) interactively with a radiologist, (ii) analytically with data mining algorithms, and (iii) visually. The coordinated visualization of these rules makes it possible to reconcile rules from multiple sources and to arrive at rules that are meaningful to the expert in the field and confirmed by the database. Experts and data mining algorithms often operate at very different levels of detail and produce incomparable patterns. The proposed MBFVA approach makes it possible to solve this problem. This paper shows how to represent and visualize binary multivariate data in 2-D and 3-D. This representation preserves the structural relations that exist in multivariate data and creates a new opportunity to guide the visual discovery of unknown patterns in the data.
In particular, the structural representation allows us to convert a complex border between the patterns in multidimensional space into visual 2-D and 3-D forms. This decreases the information overload on the user. The visualization shows not only the border between classes but also the location of the case of interest relative to that border. A user does not need to see the thousands of previous cases that were used to build the border between the patterns. If an abnormal case lies deep inside the abnormal area, far from the border between the “normal” and “abnormal” classes, then the case is very abnormal and needs immediate attention. The paper concludes with an outline of how to scale the algorithm to large data sets and how to extend the approach to non-monotone data.
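
The ideas named in the abstract (binarization, monotone Boolean functions, the border between classes, and the position of a case relative to that border) can be illustrated with a small sketch. It is not the chapter's algorithm. The thresholds, the rule `at_least_two`, and the Hamming-distance `depth` measure are all hypothetical choices made here for illustration, and the brute-force enumeration is only practical for a handful of attributes.

```python
from itertools import product


def binarize(values, thresholds):
    """Map quantitative attributes to a binary vector (1 if value >= threshold).

    Assumes every attribute is oriented so that a larger value is "more abnormal".
    """
    return tuple(int(v >= t) for v, t in zip(values, thresholds))


def leq(a, b):
    """Componentwise partial order on binary vectors: a <= b."""
    return all(x <= y for x, y in zip(a, b))


def is_monotone(f, n):
    """Brute-force check that a <= b implies f(a) <= f(b) on {0,1}^n."""
    cube = list(product((0, 1), repeat=n))
    return all(f(a) <= f(b) for a in cube for b in cube if leq(a, b))


def lower_units(f, n):
    """Minimal vectors with f = 1; for a monotone f they determine the class border."""
    ones = [v for v in product((0, 1), repeat=n) if f(v) == 1]
    return [v for v in ones if not any(u != v and leq(u, v) for u in ones)]


def depth(f, v):
    """Hamming distance from v to the nearest vector of the opposite class.

    A hypothetical measure of how far a case lies from the border.
    """
    other = 1 - f(v)
    dists = [
        sum(x != y for x, y in zip(u, v))
        for u in product((0, 1), repeat=len(v))
        if f(u) == other
    ]
    return min(dists) if dists else None


def at_least_two(x):
    """Hypothetical monotone rule: abnormal (1) if two or more attributes are set."""
    return int(sum(x) >= 2)


def xor2(x):
    """A non-monotone function, included for contrast."""
    return x[0] ^ x[1]


# Hypothetical case with four quantitative attributes and made-up thresholds.
case = binarize((5.2, 0.3, 12.0, 1), (5.0, 0.5, 10.0, 1))
print("binary case:", case)
print("class:", at_least_two(case), "depth:", depth(at_least_two, case))
print("monotone rule?", is_monotone(at_least_two, 4))
print("xor monotone?", is_monotone(xor2, 2))
print("lower units:", lower_units(at_least_two, 4))
```

Under this toy rule, a case with all four attributes set lies three bit flips from the nearest "normal" vector, whereas a case with exactly two attributes set lies only one flip away. This mirrors the distinction the abstract draws between cases deep inside the abnormal area and cases near the border.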

Keywords

Data Mining; Visual discovery; Monotone chains; Multi-level Data Mining; Monotone Boolean Function; Visual Analytics

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Boris Kovalerchuk (1)
  • Florian Delizy (1)
  • Logan Riggs (1)
  • Evgenii Vityaev (2)
  1. Dept. of Computer Science, Central Washington University, Ellensburg, USA
  2. Institute of Mathematics, Russian Academy of Sciences, Novosibirsk, Russia
