Abstract
We propose a framework that uses sliding-window rule application and extraction filtering to extract semantic frames from Thai textual phrases with unknown boundaries, based on patterns of triggering terms. A supervised rule learning algorithm constructs multi-slot extraction rules from hand-tagged training phrases. A filtering module predicts whether a rule application crosses phrase boundaries, based on instantiation features of the rule's internal wildcards. The framework is applied to text documents in three domains that differ in target-phrase density and average phrase length. The experimental results show that the filtering module improves precision while preserving high recall, yielding extraction performance comparable to frame extraction with manually identified phrase boundaries.
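To make the idea of sliding-window rule application with filtering concrete, the following is a minimal sketch only, not the authors' implementation. The rule format (a list of trigger terms, each filling a slot with the next token), the names `apply_rule` and `extract`, the English placeholder tokens, and the fixed `max_gap` threshold are all assumptions introduced for illustration. In particular, the abstract describes a filtering module based on wildcard instantiation features; the sketch swaps in a simple length threshold on the tokens skipped by each internal wildcard.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (trigger term, name of the slot filled by the next token).
Rule = List[Tuple[str, str]]


def apply_rule(rule: Rule, window: List[str]) -> Optional[Tuple[Dict[str, str], List[int]]]:
    """Match the triggers left to right inside one window.

    Returns the extracted frame and the number of tokens skipped by each
    internal wildcard (the gap before every trigger after the first),
    or None if the rule does not match.
    """
    frame: Dict[str, str] = {}
    gaps: List[int] = []
    pos = 0
    for i, (trigger, slot) in enumerate(rule):
        try:
            hit = window.index(trigger, pos)
        except ValueError:
            return None
        if hit + 1 >= len(window):
            return None  # no token left to fill the slot
        if i > 0:
            gaps.append(hit - pos)  # tokens consumed by the internal wildcard
        frame[slot] = window[hit + 1]
        pos = hit + 2
    return frame, gaps


def extract(tokens: List[str], rule: Rule, window_size: int, max_gap: int) -> List[Dict[str, str]]:
    """Slide a fixed-size window over the tokens and keep matches that pass the filter."""
    frames = []
    for start in range(len(tokens)):
        window = tokens[start:start + window_size]
        # Anchor the first trigger at the window start so each match is reported once.
        if window[0] != rule[0][0]:
            continue
        result = apply_rule(rule, window)
        if result is None:
            continue
        frame, gaps = result
        # Assumed stand-in filter: a long wildcard span suggests the match
        # crossed a phrase boundary, so the frame is discarded.
        if all(gap <= max_gap for gap in gaps):
            frames.append(frame)
    return frames


tokens = ["symptom", "fever", "and", "cause", "virus", "x",
          "symptom", "cough", "a", "b", "c", "d", "cause", "bacteria"]
rule = [("symptom", "SYM"), ("cause", "CAUSE")]
filtered = extract(tokens, rule, window_size=8, max_gap=2)
unfiltered = extract(tokens, rule, window_size=8, max_gap=10)
```

With the strict threshold, the second candidate match is rejected because its wildcard spans four tokens; with the loose threshold both matches are kept, which illustrates how filtering trades a small amount of recall for precision.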
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Intarapaiboon, P., Nantajeewarawat, E., Theeramunkong, T. (2009). Information Extraction from Thai Text with Unknown Phrase Boundaries. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_49
DOI: https://doi.org/10.1007/978-3-642-01307-2_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)