Adaptive Learning for Lemmatization in Morphology Analysis

Ting, Mary; Kadir, Rabiah Abdul; Sembok, Tengku Mohd Tengku; Ahmad, Fatimah; Azman, Azreen

doi:10.1007/978-3-319-17530-0_24

Conference paper
First Online: 01 January 2015

Part of the book series: Communications in Computer and Information Science (CCIS, volume 513)

Abstract

Morphological analysis is used to study the internal structure of words, reducing the size of the vocabulary used while retaining the semantic meaning of the knowledge in an NLP system. Most existing algorithms focus on stemming rather than lemmatization. Even with technological advances, none of the available lemmatization algorithms produces 100% accurate results. The base words produced by current algorithms may be unusable because they alter the meaning they are meant to represent, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, and thus improve the semantic knowledge represented and stored in the knowledge base.
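
The abstract does not spell out how the three layers interact, so the following is only a minimal sketch of one possible layered arrangement, not the authors' implementation. It is self-contained Python with no external libraries: the suffix rules are a hypothetical stand-in for the lemma a parser would supply, the small `lexicon` and `exceptions` collections are hypothetical stand-ins for WordNet lookups, and the `teach` method is an assumed form of adaptive learning in which a correction is stored and reused. The class name, method names, and the order in which the layers are consulted are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


class CascadeLemmatizer:
    """Illustrative three-layer lemmatizer (all names are hypothetical)."""

    # ASSUMPTION: a few regular English suffix rules stand in for the
    # candidate lemma that a real parser would provide.
    _RULES = (
        ("ies", "y"), ("es", ""), ("s", ""),
        ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"),
    )

    def __init__(self, lexicon: Iterable[str], exceptions: Dict[str, str]):
        # ASSUMPTION: an in-memory set and dict stand in for WordNet's
        # word list and irregular-form exception list.
        self.lexicon = {w.lower() for w in lexicon}
        self.exceptions = {k.lower(): v.lower() for k, v in exceptions.items()}
        # Adaptive layer: corrections learned at run time.
        self.learned: Dict[str, str] = {}

    def _candidates(self, word: str) -> Iterator[str]:
        """Yield rule-based candidate lemmas for a word."""
        for suffix, replacement in self._RULES:
            if word.endswith(suffix):
                yield word[: -len(suffix)] + replacement

    def lemmatize(self, word: str) -> str:
        word = word.lower()
        # Learned corrections take priority once they exist.
        if word in self.learned:
            return self.learned[word]
        # Irregular forms and words that are already base forms.
        if word in self.exceptions:
            return self.exceptions[word]
        if word in self.lexicon:
            return word
        # Accept a rule-based candidate only if the lexicon confirms it,
        # so the result is always a real word rather than a bare stem.
        for candidate in self._candidates(word):
            if candidate in self.lexicon:
                return candidate
        # Unresolved: return the word unchanged so it can be corrected later.
        return word

    def teach(self, word: str, lemma: str) -> None:
        """Store a correction so the same word is resolved next time."""
        self.learned[word.lower()] = lemma.lower()
        self.lexicon.add(lemma.lower())


if __name__ == "__main__":
    lemmatizer = CascadeLemmatizer(
        lexicon={"study", "make", "run", "walk"},
        exceptions={"ran": "run"},
    )
    for token in ("studies", "making", "ran", "walked", "mice"):
        print(token, "->", lemmatizer.lemmatize(token))
    lemmatizer.teach("mice", "mouse")
    print("mice ->", lemmatizer.lemmatize("mice"))
```

Validating each candidate against the lexicon is what separates this from plain stemming in the sketch: "making" resolves to "make" rather than the non-word "mak", and an unknown form such as "mice" is left untouched until a correction is taught.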

Acknowledgments

The work presented in this paper was supported by the Long Term Research Grant Scheme (LRGS) project funded by the Ministry of Higher Education (MoHE), Malaysia, under Grant No. LRGS/TD/2011/UiTM/ICT/03.

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400, Serdang, Selangor, Malaysia
Mary Ting & Azreen Azman
Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Rabiah Abdul Kadir
Department of Computer Science, Faculty of Defence Science and Technology, National Defence University of Malaysia, Kem Sungai Besi, 57000, Kuala Lumpur, Malaysia
Tengku Mohd Tengku Sembok & Fatimah Ahmad

Corresponding author

Correspondence to Mary Ting.

Editor information

Editors and Affiliations

Iwate Prefectural University, Takizawa, Japan
Hamido Fujita
Universiti Teknologi Malaysia, Johor Baharu, Malaysia
Ali Selamat

Cite this paper

Ting, M., Kadir, R.A., Sembok, T.M.T., Ahmad, F., Azman, A. (2015). Adaptive Learning for Lemmatization in Morphology Analysis. In: Fujita, H., Selamat, A. (eds) Intelligent Software Methodologies, Tools and Techniques. SoMeT 2014. Communications in Computer and Information Science, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-17530-0_24

DOI: https://doi.org/10.1007/978-3-319-17530-0_24
Published: 07 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17529-4
Online ISBN: 978-3-319-17530-0
eBook Packages: Computer Science, Computer Science (R0)
