Abstract
The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may subsequently be combined into clauses and thus provide the structure of a complex sentence with regard to the mutual relationships of its individual clauses. The method has been developed for Czech, as a representative of languages with a relatively high degree of word-order freedom. The paper introduces the important terms and describes the segmentation chart, a data structure used to describe the mutual relationships between individual segments and separators. It also contains a simple set of rules applied to the segmentation of a small set of Czech sentences. The segmentation results are evaluated against a small hand-annotated corpus of Czech complex sentences.
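To make the notion of segments and separators more concrete, the following is a minimal Python sketch of the general idea only. It is not the authors' rule set and it does not implement the segmentation chart, since the abstract specifies neither. The separator inventory `SEPARATORS`, the `Unit` class and the function `split_into_segments` are hypothetical names introduced here for illustration; the sketch assumes that the input is already tokenized and that separators are punctuation marks and a handful of Czech conjunctions.

```python
from dataclasses import dataclass
from typing import List

# Assumed separator inventory (punctuation plus a few Czech conjunctions and
# relative words). The paper's actual separator set is not given in the abstract.
SEPARATORS = {",", ";", ":", "a", "ale", "nebo", "že", "který", "když"}


@dataclass
class Unit:
    kind: str          # either "segment" or "separator"
    tokens: List[str]  # the tokens covered by this unit


def split_into_segments(tokens: List[str]) -> List[Unit]:
    """Split a tokenized sentence into alternating segments and separators."""
    units: List[Unit] = []
    current: List[str] = []  # tokens of the segment being collected
    for tok in tokens:
        if tok.lower() in SEPARATORS:
            # Close the segment collected so far, if there is one.
            if current:
                units.append(Unit("segment", current))
                current = []
            # Adjacent separator tokens (e.g. ", že") form one separator unit.
            if units and units[-1].kind == "separator":
                units[-1].tokens.append(tok)
            else:
                units.append(Unit("separator", [tok]))
        else:
            current.append(tok)
    if current:
        units.append(Unit("segment", current))
    return units


if __name__ == "__main__":
    sentence = ["Petr", "řekl", ",", "že", "přijde", ",", "ale", "nepřišel"]
    for unit in split_into_segments(sentence):
        print(unit.kind, unit.tokens)
```

In this sketch the segments are simply the maximal runs of tokens between separators. Combining segments into clauses, which is the purpose of the method described above, would need further rules that the sketch does not attempt.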
This paper is a result of a project supported by grant No. 1ET100300517.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuboň, V., Lopatková, M., Plátek, M., Pognan, P. (2006). Segmentation of Complex Sentences. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_19
DOI: https://doi.org/10.1007/11846406_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)