Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Abstract

Semantic similarity is an important element in many applications, such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. For languages such as Malay, however, text similarity has proved difficult because similar resources are unavailable. This paper describes our approach to text similarity for the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate word-to-word semantic similarity. The word-to-word measure is then used to compute semantic sentence similarity, using a modified version of an approach developed for English. The experimental results are encouraging and indicate the potential of semantic similarity measures for Malay sentences.
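
As a rough illustration of the two-stage idea in the abstract (word-to-word similarity first, sentence similarity built on top of it), the following Python sketch may help. It is not the authors' method: it swaps in a plain Jaccard overlap between dictionary gloss words for the word-level measure and a symmetric best-match average for the sentence-level measure. The function names, the `glosses` mapping and the toy Malay entries are all hypothetical and invented for this example.

```python
def gloss_overlap(word_a, word_b, glosses):
    """Word-to-word similarity as Jaccard overlap of gloss word sets.

    `glosses` is a hypothetical mapping from a word to the list of
    (already stemmed) words in its dictionary definition.
    """
    if word_a == word_b:
        return 1.0
    a = set(glosses.get(word_a, ()))
    b = set(glosses.get(word_b, ()))
    if not a or not b:
        return 0.0  # no definition available for one of the words
    return len(a & b) / len(a | b)


def sentence_similarity(sent_1, sent_2, glosses):
    """Sentence similarity as a symmetric average of best word matches.

    Each sentence is a list of tokens. For every word in one sentence we
    take its best-scoring partner in the other sentence, average those
    scores, and then average the two directions.
    """
    def directed(src, dst):
        if not src or not dst:
            return 0.0
        best = [max(gloss_overlap(w, v, glosses) for v in dst) for w in src]
        return sum(best) / len(src)

    return (directed(sent_1, sent_2) + directed(sent_2, sent_1)) / 2


# Toy glosses, invented for illustration only (not from a real dictionary).
glosses = {
    "kereta": ["kenderaan", "roda", "enjin"],
    "motosikal": ["kenderaan", "roda", "dua"],
    "buku": ["kertas", "tulisan"],
}

word_score = gloss_overlap("kereta", "motosikal", glosses)
sentence_score = sentence_similarity(
    ["kereta", "buku"], ["motosikal", "buku"], glosses
)
```

With these toy entries, "kereta" and "motosikal" share two of four distinct gloss words, so the word-level score is 0.5, and the two example sentences score 0.75. A real system would need stemming and stop-word removal before the glosses are compared.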

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Noah, S.A., Amruddin, A.Y., Omar, N. (2007). Semantic Similarity Measures for Malay Sentences. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_19

  • DOI: https://doi.org/10.1007/978-3-540-77094-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77093-0

  • Online ISBN: 978-3-540-77094-7

  • eBook Packages: Computer Science, Computer Science (R0)
