Towards Effective Visual Data Mining with Cooperative Approaches

  • Chapter
Visual Data Mining

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4404)

Abstract

The visual data-mining strategy consists in tightly coupling visualization and analytical processes in a single data-mining tool that draws on the strengths of each. We present concrete forms of cooperation between automatic algorithms, interactive algorithms and visualization methods. The first is CIAD, an interactive decision tree algorithm in which the user can call on an automatic algorithm based on a support vector machine (SVM), either to optimize the split drawn interactively in the current tree node or to compute the best split automatically. The second is a visualization algorithm used to explain the results of SVM algorithms; the same visualization method can also help the user tune the input parameters of automatic SVM algorithms. Finally, we present methods that combine automatic and interactive techniques to deal with very large datasets. The results obtained suggest that this combination is a promising way to handle such datasets.
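To make the first kind of cooperation more concrete, here is a minimal sketch of how a linear SVM could propose or refine the split at one tree node. It is an illustration under stated assumptions, not the CIAD implementation: the function names, the `start` parameter (standing in for a split drawn by the user), the hyperparameters and the plain subgradient-descent training loop are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def dot(w: Sequence[float], x: Point) -> float:
    """Inner product of a weight vector and a point."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_linear_svm_split(
    points: List[Point],
    labels: List[int],  # +1 or -1 for each point in the node
    start: Optional[Tuple[List[float], float]] = None,  # hypothetical user-drawn split (w, b)
    lam: float = 0.01,   # assumed regularisation strength
    lr: float = 0.1,     # assumed learning rate
    epochs: int = 200,   # assumed number of full passes
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    Returns (w, b) describing the oblique split w.x + b = 0.
    If `start` is given, optimisation begins from that split
    instead of from zero.
    """
    n = len(points)
    dim = len(points[0])
    if start is None:
        w, b = [0.0] * dim, 0.0
    else:
        w, b = list(start[0]), float(start[1])
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            # Only points inside the margin contribute to the hinge loss.
            if y * (dot(w, x) + b) < 1.0:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def split_node(points: List[Point], w: Sequence[float], b: float) -> Tuple[List[int], List[int]]:
    """Send each point index to the left (w.x + b < 0) or right child."""
    left: List[int] = []
    right: List[int] = []
    for i, x in enumerate(points):
        (right if dot(w, x) + b >= 0 else left).append(i)
    return left, right


if __name__ == "__main__":
    # Toy node: two well-separated classes in the plane.
    pts = [(-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0), (-3.0, -3.0),
           (2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
    lbl = [-1, -1, -1, -1, 1, 1, 1, 1]
    w, b = fit_linear_svm_split(pts, lbl)
    print(split_node(pts, w, b))
```

In automatic mode the split would be fitted from scratch, as in the toy run above. In the assisted mode, a line drawn by the user could be passed as `start` so that the optimizer refines it rather than replacing it. A real system would more likely rely on an existing SVM solver than on this hand-written loop.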

Editor information

Simeon J. Simoff Michael H. Böhlen Arturas Mazeika

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Poulet, F. (2008). Towards Effective Visual Data Mining with Cooperative Approaches. In: Simoff, S.J., Böhlen, M.H., Mazeika, A. (eds) Visual Data Mining. Lecture Notes in Computer Science, vol 4404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71080-6_22

  • DOI: https://doi.org/10.1007/978-3-540-71080-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71079-0

  • Online ISBN: 978-3-540-71080-6

  • eBook Packages: Computer Science, Computer Science (R0)
