Feature Engineering and Characterization of Classifiers for Consumer Health Information Search

  • Conference paper
Text Processing (FIRE 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10478)

Abstract

Health information search (HIS) is the process by which health professionals and consumers seek health awareness information on the Internet. Two challenging tasks in HIS are identifying whether a retrieved text is relevant to a consumer's query, and identifying whether it supports, opposes, or is neutral toward the claim made by the query. In this paper, we present our methodology for addressing these two tasks with supervised learning, through feature engineering and characterization of classifiers. We used seven variations, including an ensembling approach and hierarchical boosting, applying statistical feature selection to different sets of features, and determined the best solution for each task. We evaluated our methods on the CHIS@FIRE2016 data set, obtaining accuracies of 82.4% for the first task using hierarchical boosting and 61.48% for the second using the ensembling method. These results are promising when compared with those of other systems.
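As a rough illustration of what statistical feature selection can look like for the relevance task, the sketch below scores terms with a chi-square statistic and keeps the top-scoring ones. The abstract does not say which statistic the authors used, so the chi-square test, the function names, and the toy data are all assumptions made for illustration, not the method of the paper.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square value over all classes.

    docs   -- list of token lists (one per retrieved sentence)
    labels -- list of class labels, aligned with docs
    NOTE: chi-square is an assumed choice of statistic, not taken from the paper.
    """
    n = len(docs)
    class_counts = Counter(labels)
    term_doc = Counter()    # term -> number of documents containing it
    term_class = Counter()  # (term, label) -> documents of that label containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_doc[term] += 1
            term_class[(term, label)] += 1

    scores = {}
    for term, df in term_doc.items():
        best = 0.0
        for label, n_c in class_counts.items():
            a = term_class[(term, label)]  # term present, class is label
            b = df - a                     # term present, other classes
            c = n_c - a                    # term absent, class is label
            d = n - df - c                 # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy data: two relevant and two irrelevant sentences.
docs = [
    ["vaccine", "safe"],
    ["vaccine", "harmful"],
    ["weather", "sunny"],
    ["weather", "rain"],
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
top_terms = select_top_k(docs, labels, 2)
```

On this toy data the terms that separate the two classes perfectly ("vaccine" and "weather") receive the highest scores, while terms that occur in only one sentence score lower. A classifier would then be trained on the selected terms only.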

Notes

  1. https://sites.google.com/site/multiperspectivehealthqa/home

  2. http://nlp.stanford.edu/software/tagger.shtml

  3. http://www.java2s.com/Code/Jar/w/Downloadwekajar.htm

Acknowledgments

We thank the management of SSN Institutions for funding the High Performance Computing (HPC) lab where this work is being carried out. We also thank the CHIS organizers for the data sets and the anonymous reviewers for their constructive comments and suggestions.

Author information

Corresponding author

Correspondence to D. Thenmozhi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Thenmozhi, D., Mirunalini, P., Aravindan, C. (2018). Feature Engineering and Characterization of Classifiers for Consumer Health Information Search. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_14

  • DOI: https://doi.org/10.1007/978-3-319-73606-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73605-1

  • Online ISBN: 978-3-319-73606-8

  • eBook Packages: Computer Science, Computer Science (R0)
