Abstract
This work presents concept-based text classification for organizing traditional Thai medicine recipes. The recipes were translated from Northeastern Thai palm leaf manuscripts, and each one is presented in the ancient Isan language. The proposed method is called ‘concept-based text classification’ because it uses ‘concepts’ as document features, where a concept is a surrogate for a group of words that share the same meaning. The main mechanisms of the method are the k-Nearest Neighbor algorithm and an ancient Isan dictionary called Isan-Thai Markup Language (ITML). The objective of this work is to assign the Thai medicine recipes to five predefined groups: recipes for headache and fever, stomachache and abdomen, skin, abscess, and faint and vertigo. Evaluated by recall, precision, and F-measure, the method returns satisfactory results for automatic text classification.
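The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the concept dictionary below is a hypothetical stand-in for ITML that uses English placeholder words instead of Isan vocabulary, the group labels and training examples are invented for illustration, and cosine similarity over concept counts is an assumed choice of distance for the k-Nearest Neighbor vote. The last function shows how precision, recall, and F-measure are computed for one group.

```python
from collections import Counter
import math

# Hypothetical concept dictionary (stand-in for ITML): several surface
# words with the same meaning map to one concept label.
CONCEPTS = {
    "headache": "HEAD_PAIN", "migraine": "HEAD_PAIN",
    "fever": "FEVER", "chills": "FEVER",
    "rash": "SKIN", "itch": "SKIN",
    "stomachache": "ABDOMEN", "colic": "ABDOMEN",
}

def to_concepts(tokens):
    """Count concept occurrences; words outside the dictionary are ignored."""
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query_tokens, training, k=3):
    """Majority vote among the k training recipes most similar to the query."""
    q = to_concepts(query_tokens)
    ranked = sorted(training,
                    key=lambda ex: cosine(q, to_concepts(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def precision_recall_f1(gold, predicted, label):
    """Per-group precision, recall, and F-measure (harmonic mean)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented toy training set: (tokenized recipe text, group label).
TRAIN = [
    (["migraine", "fever"], "headache_fever"),
    (["headache", "chills"], "headache_fever"),
    (["rash", "itch"], "skin"),
    (["itch"], "skin"),
    (["colic", "stomachache"], "stomach_abdomen"),
]

if __name__ == "__main__":
    print(knn_classify(["headache", "fever"], TRAIN, k=3))
    print(precision_recall_f1(["a", "a", "b"], ["a", "b", "b"], "a"))
```

Because features are concepts rather than raw words, a query containing "headache" matches a training recipe containing "migraine" even though the two share no surface word, which is the effect the concept mapping is meant to achieve.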
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sibunruang, C., Polpinij, J. (2015). Concept-Based Text Classification of Thai Medicine Recipes Represented with Ancient Isan Language. In: Unger, H., Meesad, P., Boonkrong, S. (eds) Recent Advances in Information and Communication Technology 2015. Advances in Intelligent Systems and Computing, vol 361. Springer, Cham. https://doi.org/10.1007/978-3-319-19024-2_12
DOI: https://doi.org/10.1007/978-3-319-19024-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19023-5
Online ISBN: 978-3-319-19024-2
eBook Packages: Engineering, Engineering (R0)