Abstract
The visual data-mining strategy consists of tightly coupling visualization and analytical processes in a single data-mining tool that takes advantage of the strengths of multiple sources. We present concrete forms of cooperation between automatic algorithms, interactive algorithms and visualization methods. The first is an interactive decision tree algorithm, CIAD, in which the user can call on an automatic algorithm based on a support vector machine (SVM) either to optimize the split performed interactively in the current tree node or to compute the best split automatically. A second form of cooperation is a visualization algorithm used to explain the results of SVM algorithms. The same visualization method can also help the user tune the input parameters of automatic SVM algorithms. Finally, we present methods that combine automatic and interactive techniques to deal with very large datasets. The results obtained suggest that this is a promising approach for such datasets.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Poulet, F. (2008). Towards Effective Visual Data Mining with Cooperative Approaches. In: Simoff, S.J., Böhlen, M.H., Mazeika, A. (eds) Visual Data Mining. Lecture Notes in Computer Science, vol 4404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71080-6_22
DOI: https://doi.org/10.1007/978-3-540-71080-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71079-0
Online ISBN: 978-3-540-71080-6
eBook Packages: Computer Science, Computer Science (R0)