Abstract
Morphological analysis studies the internal structure of words, reducing the vocabulary an NLP system must handle while retaining the semantic meaning of the knowledge it represents. Most existing algorithms focus on stemming rather than lemmatization. Despite advances in technology, none of the available lemmatization algorithms produces 100% accurate results. The base words produced by current algorithms may be unusable because they alter the meaning the original words were meant to represent, which directly affects the outcome of NLP systems. This paper proposes a new method for handling lemmatization during morphological analysis. The method consists of three layers of lemmatization, which incorporate the Stanford parser API, the WordNet database, and an adaptive learning technique. The lemmatized words produced by the proposed method are more accurate, which improves the semantic knowledge represented and stored in the knowledge base.
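The abstract does not describe how the three layers interact, so the following is only a minimal sketch of one way such a layered lemmatizer could be arranged, not the authors' implementation. It assumes the part-of-speech tag has already been supplied by a parser (the first layer), replaces WordNet with a tiny hypothetical exception list, vocabulary, and suffix table (the second layer), and models adaptive learning as a store of user-supplied corrections that overrides the other layers (the third layer). All names (`EXCEPTIONS`, `VOCABULARY`, `SUFFIX_RULES`, `dictionary_lookup`, `AdaptiveLemmatizer`) are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for a WordNet-style list of irregular forms.
EXCEPTIONS: Dict[Tuple[str, str], str] = {
    ("went", "v"): "go",
    ("mice", "n"): "mouse",
    ("better", "a"): "good",
}

# Hypothetical stand-in for the set of base forms known to the dictionary.
VOCABULARY: Set[Tuple[str, str]] = {
    ("go", "v"), ("mouse", "n"), ("good", "a"),
    ("study", "v"), ("word", "n"), ("run", "v"),
}

# Simplified suffix detachment rules per part of speech (illustrative only).
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "n": [("ies", "y"), ("es", ""), ("s", "")],
    "v": [("ies", "y"), ("ied", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")],
    "a": [("est", ""), ("er", "")],
}


def dictionary_lookup(word: str, pos: str) -> Optional[str]:
    """Dictionary layer: irregular forms first, then suffix detachment."""
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    if (word, pos) in VOCABULARY:
        return word
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if the dictionary knows it.
            if (candidate, pos) in VOCABULARY:
                return candidate
    return None


class AdaptiveLemmatizer:
    """Layered lemmatizer whose learned corrections override the dictionary."""

    def __init__(self) -> None:
        self.learned: Dict[Tuple[str, str], str] = {}

    def lemmatize(self, word: str, pos: str) -> str:
        # `pos` is assumed to come from an upstream parser.
        key = (word.lower(), pos)
        if key in self.learned:
            return self.learned[key]
        found = dictionary_lookup(*key)
        # Fall back to the unchanged word when no layer can resolve it.
        return found if found is not None else key[0]

    def learn(self, word: str, pos: str, lemma: str) -> None:
        """Store a correction so the same form resolves correctly next time."""
        self.learned[(word.lower(), pos)] = lemma
```

In this sketch a form that the dictionary layer cannot resolve is returned unchanged until a correction is stored with `learn`, after which the stored lemma takes precedence. Giving learned corrections priority over the dictionary is a design assumption made here for illustration.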
Acknowledgments
The work presented in this paper was supported by the Long Term Research Grant Scheme (LRGS) project funded by the Ministry of Higher Education (MoHE), Malaysia, under Grant No. LRGS/TD/2011/UiTM/ICT/03.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ting, M., Kadir, R.A., Sembok, T.M.T., Ahmad, F., Azman, A. (2015). Adaptive Learning for Lemmatization in Morphology Analysis. In: Fujita, H., Selamat, A. (eds) Intelligent Software Methodologies, Tools and Techniques. SoMeT 2014. Communications in Computer and Information Science, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-319-17530-0_24
DOI: https://doi.org/10.1007/978-3-319-17530-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17529-4
Online ISBN: 978-3-319-17530-0
eBook Packages: Computer Science (R0)