Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

HaCohen-Kerner, Yaakov; Kass, Ariel; Peretz, Ariel

doi:10.1007/978-3-540-69858-6_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5039)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Abbreviations are common in both written and spoken language. However, they are not always explicitly defined, and in many cases they are ambiguous. In this research, we present a process that attempts to resolve abbreviation ambiguity. We explored a range of features, including context-related and statistical methods. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. We implemented several variants of the one sense per discourse hypothesis, obtained by varying the scope of the discourse, and tested several common machine learning methods to find a successful integration of these variants. The best results were achieved by SVM, with 96.09% accuracy.
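To make the idea of varying the scope of discourse concrete, here is a minimal Python sketch. It is not the authors' implementation, which the abstract does not describe in detail. It assumes that some occurrences of an abbreviation already carry a known sense, and it fills in the remaining ones with the most frequent known sense of the same abbreviation inside the chosen scope. The `Occurrence` type, the three scopes (`paragraph`, `document`, `corpus`), the `propagate` function, and the placeholder data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Occurrence(NamedTuple):
    abbrev: str           # surface form of the abbreviation (placeholder)
    doc: str              # hypothetical document id
    paragraph: int        # hypothetical paragraph index within the document
    sense: Optional[str]  # known sense, or None if still unresolved


# Each scope maps an occurrence to the "discourse" it belongs to.
# These three scopes are assumptions, not the variants used in the paper.
SCOPES = {
    "paragraph": lambda o: (o.doc, o.paragraph),
    "document": lambda o: (o.doc,),
    "corpus": lambda o: (),
}


def propagate(occurrences, scope):
    """Give each unresolved occurrence the most frequent known sense of the
    same abbreviation within the chosen scope; leave it unresolved if the
    scope contains no known sense."""
    key = SCOPES[scope]
    votes = defaultdict(Counter)
    for o in occurrences:
        if o.sense is not None:
            votes[(o.abbrev, key(o))][o.sense] += 1

    resolved = []
    for o in occurrences:
        if o.sense is None:
            counts = votes.get((o.abbrev, key(o)))
            if counts:
                o = o._replace(sense=counts.most_common(1)[0][0])
        resolved.append(o)
    return resolved


# Placeholder data: one abbreviation "AB" with two made-up senses.
sample = [
    Occurrence("AB", "d1", 0, "s1"),
    Occurrence("AB", "d1", 0, None),
    Occurrence("AB", "d1", 1, None),
    Occurrence("AB", "d2", 0, "s2"),
    Occurrence("AB", "d2", 0, "s2"),
    Occurrence("AB", "d3", 0, None),
]

for scope in SCOPES:
    print(scope, [o.sense for o in propagate(sample, scope)])
```

A narrow scope resolves fewer occurrences but relies only on nearby evidence, while a wide scope resolves more at the risk of overriding a locally correct sense. One plausible way to integrate the variants, in the spirit of the abstract, would be to pass each scope's output as a separate feature to a learner such as an SVM, though the abstract does not state how the integration was done.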


Author information

Authors and Affiliations

Department of Computer Science, Jerusalem College of Technology (Machon Lev), 21 Havaad Haleumi St., P.O.B. 16031, 91160, Jerusalem, Israel
Yaakov HaCohen-Kerner, Ariel Kass & Ariel Peretz

Editor information

Epaminondas Kapetanios, Vijayan Sugumaran, Myra Spiliopoulou

Cite this paper

HaCohen-Kerner, Y., Kass, A., Peretz, A. (2008). Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_5

DOI: https://doi.org/10.1007/978-3-540-69858-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)
