Identifying Disease Diagnosis Factors by Proximity-Based Mining of Medical Texts

Liu, Rey-Long; Tung, Shu-Yu; Lu, Yun-Ling

doi:10.1007/978-3-642-20042-7_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6592)

Abstract

Diagnosing diseases requires a large number of discriminating diagnosis factors, including the risk factors, symptoms, and signs of the diseases, as well as the examinations and tests used to detect those signs. The relationships between individual diseases and their discriminating diagnosis factors may thus form a diagnosis knowledge map, which may evolve as new medical findings are produced. However, constructing and maintaining such a map manually is both costly and difficult, and state-of-the-art text mining techniques have difficulty identifying diagnosis factors in medical texts. In this paper, we present a novel text mining technique, PDFI (Proximity-based Diagnosis Factors Identifier), which improves various kinds of identification techniques by encoding term proximity contexts into them. Empirical evaluation is conducted on a broad range of diseases whose symptoms and diagnosis are described in texts from MedlinePlus, a resource that aims to provide reliable and up-to-date healthcare information on diseases. The results show that PDFI significantly improves a state-of-the-art identifier in ranking candidate diagnosis factors for the diseases. The contribution is of practical significance for developing an intelligent system that provides disease diagnosis support to healthcare consumers and professionals.
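The abstract does not give PDFI's scoring formula, so the following is only a minimal sketch of the general idea of encoding term proximity into an existing ranker, not the authors' method. Everything in it is an assumption made for illustration: the function names `proximity_scores` and `rerank`, the `1 / (1 + d)` distance weighting, the `window` cutoff, the mixing weight `alpha`, and the use of a small set of anchor terms (such as "symptoms") as the proximity context.

```python
import re
from collections import defaultdict


def proximity_scores(text, anchor_terms, candidates, window=10):
    """Hypothetical proximity evidence for candidate terms.

    Each occurrence of a candidate contributes 1 / (1 + d), where d is the
    token distance to the nearest anchor term, provided d <= window.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    anchor_positions = [i for i, tok in enumerate(tokens) if tok in anchor_terms]
    scores = defaultdict(float)
    if not anchor_positions:
        return {}
    for i, tok in enumerate(tokens):
        if tok in candidates:
            d = min(abs(i - p) for p in anchor_positions)
            if d <= window:
                scores[tok] += 1.0 / (1 + d)
    return dict(scores)


def rerank(base_scores, prox, alpha=0.5):
    """Boost a base identifier's scores by proximity evidence (assumed form)."""
    combined = {
        term: score * (1 + alpha * prox.get(term, 0.0))
        for term, score in base_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


text = (
    "Symptoms of anemia include fatigue and pallor. "
    "The clinic is open daily and offers parking."
)
prox = proximity_scores(text, {"symptoms"}, {"fatigue", "parking"})
ranking = rerank({"parking": 1.0, "fatigue": 1.0}, prox)
print(ranking)
```

In this toy input, "fatigue" sits close to the anchor term and receives a boost, while "parking" falls outside the window and keeps only its base score. A real system would take the base scores from an existing identifier, such as a feature selection measure, rather than from constants.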

Author information

Authors and Affiliations

Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
Rey-Long Liu
Winbond Electronics Corporation, HsinChu, Taiwan
Shu-Yu Tung & Yun-Ling Lu

Editor information

Editors and Affiliations

Wroclaw University of Technology, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen
Department of Computer Engineering, Yeungnam University, Dae-Dong, 712-749, Gyeungsan, Korea
Chong-Gun Kim
Institute of Informatics, Automation and Robotics, Wroclaw University of Technology, 50-370, Wrocław, Poland
Adam Janiak

Cite this paper

Liu, RL., Tung, SY., Lu, YL. (2011). Identifying Disease Diagnosis Factors by Proximity-Based Mining of Medical Texts. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20042-7_18

DOI: https://doi.org/10.1007/978-3-642-20042-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20041-0
Online ISBN: 978-3-642-20042-7
eBook Packages: Computer Science, Computer Science (R0)
