Discovery of Knowledge Patterns in Lymphographic Clinical Data through Data Mining Methods and Techniques

  • Conference paper
Advances in Computing and Information Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 178)

Abstract

Data mining is the process of extracting knowledge by discovering new patterns in large datasets. Classification is a data mining task that generalizes an established, proven structure so that it can be applied to new data. A dominant area of modern research is medical investigation, including disease prediction and malady categorization. In this paper, our focus is to design an efficient classifier that is trained to classify oncogenic data. The classifier is trained on the Lymphography dataset using feature selection and classification algorithms. Feature selection is a supervised method that attempts to select a subset of the predictor features based on information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with a class label that takes four distinct values. This paper reports the performance of sixteen classification algorithms on the Lymphography dataset for multi-class categorization of medical data. It also examines the performance of four feature selection algorithms and their impact on classification accuracy. Our results show that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy, both with all the predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. Moreover, the ReliefF feature selection algorithm improves the accuracy of the Radial Basis Function classifier by 1.35%. We also note that the C4.5 algorithm offers more efficient classification, since the decision tree it generates is smaller than the one generated by Random Tree.
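As an illustration of feature selection based on information gain, the following minimal Python sketch ranks categorical predictor attributes by how much each one reduces the entropy of the class label. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_features`) and the four-row toy dataset are assumptions made purely for this example, and no Lymphography data is used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    on each distinct value of one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest information gain."""
    n_features = len(rows[0])
    gains = [
        (information_gain([row[j] for row in rows], labels), j)
        for j in range(n_features)
    ]
    return [j for _, j in sorted(gains, key=lambda t: (-t[0], t[1]))]


# Hypothetical toy data: feature 0 determines the class, feature 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]

ranking = rank_features(rows, labels)
print(ranking)
```

A filter method of this kind would keep the top-ranked attributes and pass only those to the classifier; how many to keep is a separate design choice that the sketch leaves open.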

Author information

Corresponding author

Correspondence to Shomona Gracia Jacob.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jacob, S.G., Geetha Ramani, R., Nancy, P. (2013). Discovery of Knowledge Patterns in Lymphographic Clinical Data through Data Mining Methods and Techniques. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 178. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31600-5_13

  • DOI: https://doi.org/10.1007/978-3-642-31600-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31599-2

  • Online ISBN: 978-3-642-31600-5

  • eBook Packages: Engineering, Engineering (R0)
