Automatic Structuring of Written Texts

Veber, Marek; Horák, Aleš; Julinek, Rostislav; Smrž, Pavel

doi:10.1007/3-540-48239-3_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and heuristic rules that are used for automatic or semiautomatic labelling. Inside the detected sentence the algorithm performs a decomposition to clauses and then marks the parts of text which do not form a sentence, i.e. headings, signatures, tables and other structured data. We also pay attention to the processing of matched symbols in the text, especially to the analysis of direct speech notation.
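To make the idea of heuristic sentence boundary labelling concrete, the following is a minimal sketch and not the authors' algorithm. It assumes three illustrative rules that are not taken from the paper: no boundary inside an unclosed pair of matched symbols (brackets or quotation marks, as in direct speech), no boundary after a known abbreviation, and a boundary only when the next character is uppercase or opens a matched pair. The names `ABBREVIATIONS`, `PAIRS` and `split_sentences` are hypothetical.

```python
# Hypothetical abbreviation list (an assumption); a real system would
# rely on a lexicon or a morphological analyzer instead.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}

# Matched symbol pairs tracked while scanning: opening -> closing.
PAIRS = {"(": ")", "[": "]", "\u201c": "\u201d"}


def split_sentences(text):
    """Split text at '.', '!' or '?' using three simple heuristic rules."""
    sentences = []
    stack = []   # closing symbols that are still expected
    start = 0    # index where the current sentence begins
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        elif ch in ".!?":
            # Rule 1: no boundary inside an unclosed matched pair.
            if stack:
                continue
            # Rule 2: no boundary right after a known abbreviation.
            token = text[start:i + 1].split()[-1].lower()
            if token in ABBREVIATIONS:
                continue
            # Rule 3: the following text must start with an uppercase
            # letter or an opening symbol, unless the text ends here.
            rest = text[i + 1:].lstrip()
            if rest and not (rest[0].isupper() or rest[0] in PAIRS):
                continue
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    # Anything left over is kept as a final, unterminated segment.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Calling `split_sentences` on a text that contains a parenthesised remark and a quoted utterance keeps each of them inside its enclosing sentence. Clause decomposition and the marking of headings, signatures and tables, which the abstract also mentions, are outside the scope of this sketch.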

The research is sponsored by the Czech Ministry of Education under the grant VS 97028.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Marek Veber, Aleš Horák, Rostislav Julinek & Pavel Smrž

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Veber, M., Horák, A., Julinek, R., Smrž, P. (1999). Automatic Structuring of Written Texts. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_18

DOI: https://doi.org/10.1007/3-540-48239-3_18
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive