Abstract
Data mining is the process of extracting knowledge by discovering new patterns in large datasets. Classification is a data mining task that generalizes known structure so that it can be applied to new data. Medical investigation, including disease prediction and disease categorization, is a prominent area of current research. In this paper, we focus on designing an efficient classifier trained to classify oncogenic data. Machine learning techniques, namely feature selection and classification algorithms, are applied to the Lymphography dataset to train the classifier. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with the class label taking four distinct values. This paper reports the performance of sixteen classification algorithms on the Lymphography dataset for accurate multi-class categorization of medical data. Our research also examines the performance of four feature selection algorithms and their impact on classification accuracy. Our results show that the Random Tree algorithm and Quinlan's C4.5 algorithm achieve 100 percent classification accuracy both with all predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. Moreover, the ReliefF feature selection algorithm improves the results of the Radial Basis Function algorithm, raising its classification accuracy by 1.35%. Finally, C4.5 offers more efficient classification than Random Tree, since the decision tree it generates is smaller.
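The abstract describes feature selection as choosing predictors by information gain, and C4.5 as building smaller trees. As a rough illustration only (not the authors' implementation, and not the Fisher Filtering or ReliefF filters compared in the study), the sketch below scores categorical features by information gain and also computes the gain ratio that C4.5 uses as its split criterion. The feature names (`informative`, `noise`, `record_id`), class labels, and values are hypothetical toy data, not the Lymphography dataset.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the value distribution in a sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in class entropy obtained by partitioning on a categorical feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Class labels of the instances that take this feature value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio: information gain divided by the split information."""
    split_info = entropy(feature)  # entropy of the feature's own value distribution
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_features(columns: dict, labels: Sequence[Hashable], k: int) -> list:
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: four instances, two classes.
labels = ["A", "A", "B", "B"]
columns = {
    "informative": [1, 1, 2, 2],  # perfectly separates the classes
    "noise": [1, 2, 1, 2],        # independent of the class
    "record_id": [1, 2, 3, 4],    # unique per instance, like an ID column
}

for name, col in columns.items():
    print(f"{name}: IG={information_gain(col, labels):.2f}, "
          f"GR={gain_ratio(col, labels):.2f}")
print("top feature:", rank_features(columns, labels, k=1))
```

On this toy data, `record_id` ties with `informative` on information gain (1 bit each), but its split information is 2 bits, so its gain ratio drops to 0.5. Penalizing many-valued attributes in this way is one reason C4.5 prefers gain ratio over raw information gain.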
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jacob, S.G., Geetha Ramani, R., Nancy, P. (2013). Discovery of Knowledge Patterns in Lymphographic Clinical Data through Data Mining Methods and Techniques. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 178. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31600-5_13
DOI: https://doi.org/10.1007/978-3-642-31600-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31599-2
Online ISBN: 978-3-642-31600-5
eBook Packages: Engineering, Engineering (R0)