Improving Text Retrieval in Medical Collections Through Automatic Categorization

  • Rodrigo F. Vale
  • Berthier A. Ribeiro-Neto
  • Luciano R. S. de Lima
  • Alberto H. F. Laender
  • Hermes R. F. Junior
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2857)

Abstract

A current and important research issue is the retrieval of relevant medical information. While medical knowledge is expanding at a rate never observed before, its diffusion is slow. One of the main reasons is the difficulty of locating relevant information in today's large medical text collections. In this work, we introduce a framework, based on Bayesian networks, that combines information derived from the text of medical documents with information on the diseases related to these documents (obtained from an automatic categorization method). This leads to a new ranking formula, which we evaluate using a medical reference collection, the OHSUMED collection. Our results indicate that this combination of evidence might yield considerable gains in retrieval performance. When the queries are strongly related to diseases, these gains might be as high as 84%. This shows that information generated by an automatic categorization procedure can be used effectively to improve the quality of the answers provided by an information retrieval (IR) system specialized in the medical domain.
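
The abstract does not state the ranking formula itself, so the following is only a minimal sketch of the general idea of merging two sources of evidence into one ranking score. It assumes a disjunctive (noisy-OR) combination, which is one common choice in belief network models and is not claimed to be the formula used in the paper. The function names, document ids, and scores are all hypothetical.

```python
def combine_noisy_or(text_score: float, category_score: float) -> float:
    """Disjunctive (noisy-OR) combination of two evidence scores in [0, 1].

    Assumption: both scores behave like probabilities and are treated as
    independent. This is an illustrative choice, not the paper's formula.
    """
    for s in (text_score, category_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - (1.0 - text_score) * (1.0 - category_score)


def rank(docs: dict[str, tuple[float, float]]) -> list[str]:
    """Return document ids sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combine_noisy_or(*docs[d]), reverse=True)


# Hypothetical scores: (text similarity, disease-category match). Made up.
example = {
    "doc_a": (0.50, 0.00),  # matches the query text only
    "doc_b": (0.40, 0.50),  # weaker text match, but shares a disease category
    "doc_c": (0.10, 0.20),
}
ranking = rank(example)
```

Under this assumed combination, a document with a slightly weaker text match can still be ranked above a text-only match when the category evidence supports it, which is the kind of effect the abstract attributes to adding categorization evidence.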

Keywords

  • Information Retrieval
  • Bayesian Network
  • Retrieval Performance
  • Automatic Categorization
  • Text Retrieval
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Rodrigo F. Vale (1)
  • Berthier A. Ribeiro-Neto (2)
  • Luciano R. S. de Lima (3)
  • Alberto H. F. Laender (2)
  • Hermes R. F. Junior (1)

  1. Akwan Information Technologies, Belo Horizonte, Brazil
  2. Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, Brazil
  3. Medical Informatics Group, Sarah Hospital Network, Belo Horizonte, Brazil