Towards Effective Visual Data Mining with Cooperative Approaches

  • François Poulet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4404)

Abstract

The visual data-mining strategy consists in tightly coupling visualization and analytical processes into a single data-mining tool that takes advantage of the strengths of multiple sources. We present concrete forms of cooperation between automatic algorithms, interactive algorithms and visualization methods. The first is an interactive decision tree algorithm, CIAD, in which an automatic algorithm based on a support vector machine (SVM) helps the user, either by optimizing the interactive split performed in the current tree node or by computing the best split in an automatic mode. Another effective form of cooperation is a visualization algorithm used to explain the results of an SVM algorithm. The same visualization method can also help the user tune the input parameters of automatic SVM algorithms. We then present methods that combine automatic and interactive approaches to deal with very large datasets. The results obtained suggest that this cooperation is a promising way to handle such datasets.
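As an illustration of the first kind of cooperation only, the following minimal sketch refines a split line drawn by a user in a two-dimensional tree node by minimizing a regularized hinge loss, the objective of a linear SVM. It is not the CIAD implementation: the function names, the toy data, the learning rate and the plain subgradient descent used in place of a real SVM solver are all assumptions made for this example.

```python
# Hypothetical sketch: refine a user-drawn linear split w.x + b = 0
# with a linear SVM objective (regularized hinge loss).
# Plain full-batch subgradient descent stands in for a real SVM solver.

def hinge_loss(w, b, points, labels, lam=0.01):
    """Average hinge loss plus L2 penalty for the line w.x + b = 0."""
    total = 0.0
    for (x1, x2), y in zip(points, labels):
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        total += max(0.0, 1.0 - margin)
    return total / len(points) + 0.5 * lam * (w[0] ** 2 + w[1] ** 2)


def refine_split(w, b, points, labels, lam=0.01, lr=0.01, steps=2000):
    """Start from the user's split (w, b) and descend the hinge loss."""
    w = list(w)
    n = len(points)
    for _ in range(steps):
        gw = [lam * w[0], lam * w[1]]  # gradient of the L2 penalty
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            # Only points inside the margin or misclassified contribute.
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:
                gw[0] -= y * x1 / n
                gw[1] -= y * x2 / n
                gb -= y / n
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b


def accuracy(w, b, points, labels):
    """Fraction of points lying on the side of the line given by their label."""
    correct = sum(
        1 for (x1, x2), y in zip(points, labels)
        if y * (w[0] * x1 + w[1] * x2 + b) > 0
    )
    return correct / len(points)


# Toy node content (made up): two classes labeled -1 and +1.
points = [(1.0, 1.0), (1.5, 0.5), (2.0, 1.5), (4.0, 4.0), (4.5, 3.5), (5.0, 4.5)]
labels = [-1, -1, -1, 1, 1, 1]

# Split drawn by hand: the vertical line x1 = 1.8, which misplaces one point.
user_w, user_b = [1.0, 0.0], -1.8
new_w, new_b = refine_split(user_w, user_b, points, labels)

print("user split accuracy:   ", accuracy(user_w, user_b, points, labels))
print("refined split accuracy:", accuracy(new_w, new_b, points, labels))
```

Starting the optimization from the user's line rather than from zero mirrors the idea that the automatic algorithm assists the interactive choice instead of replacing it. Calling `refine_split` with an arbitrary starting line would correspond loosely to the fully automatic mode.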

Keywords

Support Vector Machine, Automatic Algorithm, Support Vector Machine Algorithm, Cooperative Approach, Interactive Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • François Poulet (1)
  1. IRISA - Texmex, Rennes cedex, France
