Evaluating Frequent-Set Mining Approaches in Machine-Learning Problems with Several Attributes: A Case Study in Healthcare

Kaushik, Shruti; Choudhury, Abhinav; Dasgupta, Nataraj; Natarajan, Sayee; Pickett, Larry A.; Dutt, Varun

doi:10.1007/978-3-319-96136-1_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Included in the following conference series:

International Conference on Machine Learning and Data Mining in Pattern Recognition

Abstract

Datasets often involve thousands of attributes, and it is important to discover the relevant features for machine-learning (ML) algorithms. At this scale, approaches that reduce or select features may become difficult to apply, and feature discovery may instead be performed using frequent-set mining approaches. In this paper, we use the Apriori frequent-set mining approach to discover the most frequently occurring features from among thousands of features in datasets where patients consume pain medications. We use these frequently occurring features, along with other demographic and clinical features, in specific ML algorithms and compare the algorithms' accuracies for classifying the type and frequency of consumption of pain medications. Results revealed that using Apriori for feature discovery improved the performance of a large majority of the ML algorithms, and that the decision tree performed better than many of the other ML algorithms. The main implication of our analyses is in helping the machine-learning community solve problems involving thousands of attributes.
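The feature-discovery step described in the abstract can be illustrated with a minimal Apriori sketch. This is not the authors' implementation, whose details are not given in this preview. The patient records, the attribute names (`dx_A`, `proc_X`, and so on), and the 0.6 support threshold are all hypothetical, chosen only to show how frequent itemsets could be turned into a reduced feature list.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1: candidate itemsets are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its subsets of
        # size k-1 were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: each set holds the attributes present for one patient.
patients = [
    {"dx_A", "dx_B", "proc_X"},
    {"dx_A", "dx_B"},
    {"dx_A", "proc_X"},
    {"dx_B", "dx_C"},
    {"dx_A", "dx_B", "dx_C"},
]

frequent = apriori(patients, min_support=0.6)
# Attributes that appear in any frequent itemset become candidate ML features.
selected_features = sorted({item for itemset in frequent for item in itemset})
print(selected_features)
```

In a pipeline of the kind the abstract describes, the attributes in `selected_features` would then be combined with demographic and clinical features before training a classifier. How the selected attributes are encoded for the classifier is a separate design choice that this sketch does not cover.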

Notes

1. Due to a non-disclosure agreement, we have anonymized the actual names of these medications.

Acknowledgement

The project was supported by grants (awards #IITM/CONS/PPLP/VD/03 and #IITM/CONS/RxDSI/VD/16) to Varun Dutt.

Author information

Authors and Affiliations

Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Mandi, 175005, Himachal Pradesh, India
Shruti Kaushik, Abhinav Choudhury & Varun Dutt
RxDataScience, Inc., New York, 27709, USA
Nataraj Dasgupta, Sayee Natarajan & Larry A. Pickett

Corresponding authors

Correspondence to Shruti Kaushik, Abhinav Choudhury or Varun Dutt.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Germany
Petra Perner

Cite this paper

Kaushik, S., Choudhury, A., Dasgupta, N., Natarajan, S., Pickett, L.A., Dutt, V. (2018). Evaluating Frequent-Set Mining Approaches in Machine-Learning Problems with Several Attributes: A Case Study in Healthcare. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_20

DOI: https://doi.org/10.1007/978-3-319-96136-1_20
Published: 08 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96135-4
Online ISBN: 978-3-319-96136-1
eBook Packages: Computer Science, Computer Science (R0)
