Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

  • Conference paper
Natural Language and Information Systems (NLDB 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5039)

Abstract

Abbreviations are common in both written and spoken language. However, they are not always explicitly defined, and in many cases they are ambiguous. In this research, we present a process that attempts to resolve abbreviation ambiguity. Various features have been explored, including context-related and statistical features. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Several variants of the one sense per discourse hypothesis, obtained by varying the scope of discourse, have been implemented. A number of common machine learning methods have been tested to find a successful integration of these variants. The best results were achieved by SVM, with 96.09% accuracy.
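The abstract does not spell out how the variants are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each variant assigns an ambiguous abbreviation the majority sense among the other labelled occurrences within a chosen scope (sentence, paragraph, or whole document). The function name `majority_sense`, the record fields, the abbreviation "AB", and the senses "alpha" and "beta" are all hypothetical.

```python
from collections import Counter

# Hypothetical scopes; the paper's actual discourse scopes may differ.
SCOPES = ("sentence", "paragraph", "document")


def majority_sense(occurrences, target_index, scope):
    """Return the most frequent labelled sense of the target's abbreviation
    among the other occurrences inside `scope`, or None if there are none."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    target = occurrences[target_index]
    votes = Counter()
    for i, occ in enumerate(occurrences):
        # Skip the target itself, other abbreviations, and unlabelled items.
        if i == target_index or occ["abbr"] != target["abbr"]:
            continue
        if occ["sense"] is None:
            continue
        # Restrict the voters to the chosen discourse scope.
        if scope == "sentence" and occ["sentence"] != target["sentence"]:
            continue
        if scope == "paragraph" and occ["paragraph"] != target["paragraph"]:
            continue
        votes[occ["sense"]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy document: sentence ids are unique across the whole document.
occurrences = [
    {"abbr": "AB", "paragraph": 0, "sentence": 0, "sense": "alpha"},
    {"abbr": "AB", "paragraph": 0, "sentence": 0, "sense": "alpha"},
    {"abbr": "AB", "paragraph": 0, "sentence": 1, "sense": "beta"},
    {"abbr": "AB", "paragraph": 0, "sentence": 1, "sense": None},  # target
    {"abbr": "AB", "paragraph": 1, "sentence": 2, "sense": "beta"},
    {"abbr": "AB", "paragraph": 1, "sentence": 3, "sense": "beta"},
    {"abbr": "AB", "paragraph": 1, "sentence": 3, "sense": "beta"},
]

# One prediction per scope; these could serve as features for a classifier.
predictions = {scope: majority_sense(occurrences, 3, scope) for scope in SCOPES}
print(predictions)
```

In this toy data the three scopes disagree with one another, which is one reason a learner such as an SVM might be used to combine them rather than trusting any single scope.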



Editor information

Epaminondas Kapetanios, Vijayan Sugumaran, Myra Spiliopoulou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

HaCohen-Kerner, Y., Kass, A., Peretz, A. (2008). Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_5

  • DOI: https://doi.org/10.1007/978-3-540-69858-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69857-9

  • Online ISBN: 978-3-540-69858-6

  • eBook Packages: Computer Science, Computer Science (R0)
