Adaptive Learning for Lemmatization in Morphology Analysis

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 513)

Abstract

Morphological analysis studies the internal structure of words. In an NLP system it reduces the size of the vocabulary while retaining the semantic meaning of the stored knowledge. Most existing algorithms focus on stemming rather than lemmatization, and despite technological advances, none of the available lemmatization algorithms produces fully accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original word was meant to represent, which directly affects the output of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, and therefore improve the semantic knowledge represented and stored in the knowledge base.
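The layered idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the Stanford parser and WordNet layers are replaced by hypothetical in-memory tables (`IRREGULAR`, `LEXICON`), and the WordNet-style layer imitates only a subset of WordNet's `morphy` suffix-detachment rules. The adaptive layer is modelled, as an assumption, as a table of user corrections that is consulted before the other layers so that learned fixes override them. A toy suffix stripper (`naive_stem`, also hypothetical and not the Porter algorithm) shows how blind stemming can yield a non-word.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for layer 1 (the paper uses the Stanford parser API):
# irregular forms keyed by (word, POS), with POS in {"n", "v", "a"}.
IRREGULAR: Dict[Tuple[str, str], str] = {
    ("went", "v"): "go",
    ("mice", "n"): "mouse",
    ("better", "a"): "good",
}

# Hypothetical toy lexicon standing in for the WordNet database.
LEXICON: Dict[str, Set[str]] = {
    "n": {"study", "box", "church", "mouse", "class"},
    "v": {"study", "run", "make", "walk"},
    "a": {"fast", "large"},
}

# Subset of suffix-detachment rules in the style of WordNet's morphy:
# (suffix to remove, replacement), tried in order for each POS.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "n": [("ses", "s"), ("xes", "x"), ("ches", "ch"), ("shes", "sh"),
          ("ies", "y"), ("s", "")],
    "v": [("ies", "y"), ("es", "e"), ("es", ""), ("ed", "e"), ("ed", ""),
          ("ing", "e"), ("ing", ""), ("s", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def naive_stem(word: str) -> str:
    """Toy suffix stripper with no dictionary check (illustration only)."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def parser_layer(word: str, pos: str) -> Optional[str]:
    """Layer 1 stand-in: look up irregular forms."""
    return IRREGULAR.get((word, pos))


def wordnet_layer(word: str, pos: str) -> Optional[str]:
    """Layer 2 stand-in: accept a candidate only if it is in the lexicon."""
    words = LEXICON.get(pos, set())
    if word in words:
        return word
    for suffix, repl in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + repl
            if candidate in words:
                return candidate
    return None


class AdaptiveLemmatizer:
    """Layer 3 stand-in: remembers corrections and reuses them."""

    def __init__(self) -> None:
        self.learned: Dict[Tuple[str, str], str] = {}

    def learn(self, word: str, pos: str, lemma: str) -> None:
        """Record a correction, e.g. supplied by a user."""
        self.learned[(word.lower(), pos)] = lemma

    def lemmatize(self, word: str, pos: str) -> str:
        w = word.lower()
        # Assumed ordering: learned corrections override the other layers.
        if (w, pos) in self.learned:
            return self.learned[(w, pos)]
        for layer in (parser_layer, wordnet_layer):
            lemma = layer(w, pos)
            if lemma is not None:
                return lemma
        # No layer could decide: return the word unchanged.
        return w


if __name__ == "__main__":
    lem = AdaptiveLemmatizer()
    print(naive_stem("studies"))          # stud (not a word)
    print(lem.lemmatize("studies", "n"))  # study
    print(lem.lemmatize("went", "v"))     # go
    print(lem.lemmatize("running", "v"))  # running (unknown so far)
    lem.learn("running", "v", "run")
    print(lem.lemmatize("running", "v"))  # run
```

The dictionary check in the WordNet-style layer is what separates lemmatization from plain stemming here: a candidate base form is returned only if it is a known word, whereas `naive_stem` returns whatever remains after stripping a suffix.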

Acknowledgments

The work presented in this paper has been supported by the Long Term Research Grant Scheme (LRGS) project funded by the Ministry of Higher Education (MoHE), Malaysia, under Grant No. LRGS/TD/2011/UiTM/ICT/03.

Author information

Corresponding author

Correspondence to Mary Ting.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ting, M., Kadir, R.A., Sembok, T.M.T., Ahmad, F., Azman, A. (2015). Adaptive Learning for Lemmatization in Morphology Analysis. In: Fujita, H., Selamat, A. (eds) Intelligent Software Methodologies, Tools and Techniques. SoMeT 2014. Communications in Computer and Information Science, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-17530-0_24

  • DOI: https://doi.org/10.1007/978-3-319-17530-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17529-4

  • Online ISBN: 978-3-319-17530-0

  • eBook Packages: Computer Science, Computer Science (R0)