Abstract
Health information search (HIS) is the process by which health professionals and consumers seek health awareness information on the Internet. Two tasks in HIS are challenging: identifying whether a retrieved text is relevant to the consumer's query, and identifying whether it supports, opposes, or is neutral to the claim made by the query. In this paper, we present our methodology for addressing these two tasks with supervised learning approaches, through feature engineering and characterization of classifiers. We have used seven variations, including an ensembling approach and hierarchical boosting, applying statistical feature selection to different sets of features, and have determined the best solutions to the two tasks. We have evaluated our methods using the CHIS@FIRE2016 data set. We have obtained accuracies of 82.4% for the first task using hierarchical boosting and 61.48% for the second using an ensembling method. These results are promising when compared with those of other systems.
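As a rough illustration of two ingredients named in the abstract, statistical feature selection and ensembling, the following minimal sketch may help. It is not the authors' implementation: the abstract does not say which statistic or which voting scheme was used, so the chi-square score and simple majority voting shown here are assumptions, and the function names `chi_square`, `select_features`, and `majority_vote` are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms most associated with `target` (assumed chi-square ranking).

    docs is a list of token lists; labels holds one class label per document.
    """
    vocab = {term for tokens in docs for term in tokens}
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for tokens, label in zip(docs, labels):
            has_term = term in tokens
            in_class = label == target
            if has_term and in_class:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scores[term] = chi_square(n11, n10, n01, n00)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers by simple majority."""
    return Counter(predictions).most_common(1)[0][0]
```

In such a setup, each base classifier would be trained on the selected features, and `majority_vote` would be applied to their per-sentence predictions (for example, `support`, `oppose`, or `neutral`).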
Acknowledgments
We thank the management of SSN Institutions for funding the High Performance Computing (HPC) lab where this work is being carried out. We also thank the CHIS organizers for the data sets and the anonymous reviewers for their constructive comments and suggestions.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Thenmozhi, D., Mirunalini, P., Aravindan, C. (2018). Feature Engineering and Characterization of Classifiers for Consumer Health Information Search. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_14
DOI: https://doi.org/10.1007/978-3-319-73606-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science, Computer Science (R0)