Abstract
The ever-increasing volume of online health information, coupled with its uneven reliability and quality, may have considerable implications for citizens. To address this issue, we propose to use, within a general or specialised search engine, standards for identifying the reliability of online documents. The standards used are those related to the ethics and trustworthiness of websites. In this research, they are detected from the URL names of Web pages by applying machine learning algorithms. Depending on the algorithm and the principle, this straightforward approach reaches up to 93% precision and 91% recall. A few principles, however, remain difficult to recognize.
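As a rough illustration of the idea of recognizing a principle from a page name alone, the sketch below tokenizes the path of a URL and feeds the tokens to a small multinomial naive Bayes classifier, then computes precision and recall for one label. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the abstract does not say which algorithms or features were used, and the labels (`privacy`, `funding`), the `example.org` URLs, and every function and class name here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def url_tokens(url):
    """Split the path and query of a URL into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    return re.findall(r"[a-z]+", (parsed.path + " " + parsed.query).lower())


class UrlNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over URL tokens.

    Assumption: chosen only for illustration; the abstract does not name
    the learning algorithms that were actually applied.
    """

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(url_tokens(url))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, url):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for tok in url_tokens(url):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(gold, predicted, target):
    """Precision and recall of `target` given gold and predicted labels."""
    tp = sum(g == target and p == target for g, p in zip(gold, predicted))
    fp = sum(g != target and p == target for g, p in zip(gold, predicted))
    fn = sum(g == target and p != target for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training data: hypothetical URLs and labels, not taken from the paper.
train_urls = [
    "http://example.org/privacy-policy.html",
    "http://example.org/legal/privacy.htm",
    "http://example.org/about/sponsors.html",
    "http://example.org/funding/sponsorship.html",
]
train_labels = ["privacy", "privacy", "funding", "funding"]

model = UrlNaiveBayes().fit(train_urls, train_labels)
guess = model.predict("http://example.net/site/privacy_statement.html")
```

Only the path and query are tokenized, so the host name does not leak site identity into the features. A realistic setup would need far more training pages per principle and a held-out test set before precision and recall figures mean anything.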
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gaudinat, A., Grabar, N., Boyer, C. (2007). Automatic Retrieval of Web Pages with Standards of Ethics and Trustworthiness Within a Medical Portal: What a Page Name Tells Us. In: Bellazzi, R., Abu-Hanna, A., Hunter, J. (eds) Artificial Intelligence in Medicine. AIME 2007. Lecture Notes in Computer Science, vol 4594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73599-1_24
DOI: https://doi.org/10.1007/978-3-540-73599-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73598-4
Online ISBN: 978-3-540-73599-1
eBook Packages: Computer Science, Computer Science (R0)