Evaluating Frequent-Set Mining Approaches in Machine-Learning Problems with Several Attributes: A Case Study in Healthcare

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Abstract

Datasets often involve thousands of attributes, and it is important to discover the features that are relevant to machine-learning (ML) algorithms. In such cases, approaches that reduce or select features may become difficult to apply, and feature discovery may instead be performed using frequent-set mining approaches. In this paper, we use the Apriori frequent-set mining approach to discover the most frequently occurring features from among thousands of features in datasets of patients who consume pain medications. We use these frequently occurring features, along with other demographic and clinical features, in specific ML algorithms and compare the algorithms' accuracies in classifying the type and frequency of consumption of pain medications. Results revealed that using Apriori for feature discovery improved the performance of a large majority of the ML algorithms and that the decision tree performed better than many of the other ML algorithms. The main implication of our analyses is in helping the machine-learning community solve problems involving thousands of attributes.
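As an illustration of the general idea only, the following is a minimal sketch of Apriori-style frequent-set mining used to pick out frequently occurring attributes and encode them as binary features. It is not the implementation used in the paper: the attribute names (`dx_A`, `dx_B`, ...), the toy patient records, and the support threshold of 0.6 are all hypothetical, and the function `apriori` is written here from scratch with the standard library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy records: each set lists the attributes present for one patient.
patients = [
    {"dx_A", "dx_B", "dx_C"},
    {"dx_A", "dx_B"},
    {"dx_A", "dx_C"},
    {"dx_B", "dx_D"},
    {"dx_A", "dx_B", "dx_D"},
]

frequent = apriori(patients, min_support=0.6)

# Keep only attributes that appear in at least one frequent itemset,
# then encode each patient as a binary feature vector over them.
selected = sorted({item for itemset in frequent for item in itemset})
X = [[int(attr in p) for attr in selected] for p in patients]
```

In this toy example only `dx_A` and `dx_B` meet the threshold, so each patient is reduced to two binary features. In a setting like the one the abstract describes, such features would then be combined with demographic and clinical features before being passed to a classifier.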

Notes

  1. Due to a non-disclosure agreement, we have anonymized the actual names of these medications.

Acknowledgement

The project was supported by grants (awards #IITM/CONS/PPLP/VD/03 and #IITM/CONS/RxDSI/VD/16) to Varun Dutt.

Corresponding authors

Correspondence to Shruti Kaushik, Abhinav Choudhury, or Varun Dutt.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Kaushik, S., Choudhury, A., Dasgupta, N., Natarajan, S., Pickett, L.A., Dutt, V. (2018). Evaluating Frequent-Set Mining Approaches in Machine-Learning Problems with Several Attributes: A Case Study in Healthcare. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_20

  • DOI: https://doi.org/10.1007/978-3-319-96136-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96135-4

  • Online ISBN: 978-3-319-96136-1

  • eBook Packages: Computer Science, Computer Science (R0)
