Automatic Structuring of Written Texts

  • Conference paper
Text, Speech and Dialogue (TSD 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Abstract

This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure-tagging algorithm and the heuristic rules used for automatic or semiautomatic labelling. Within each detected sentence, the algorithm performs a decomposition into clauses and then marks the parts of the text that do not form a sentence, i.e. headings, signatures, tables and other structured data. We also pay attention to the processing of matched symbols in the text, especially the analysis of direct speech notation.
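To make the idea of heuristic sentence boundary labelling with matched symbols concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, whose rules are not given in this abstract. The abbreviation list, the set of matched pairs and the function name `split_sentences` are all assumptions chosen for illustration, and the sketch covers neither clause decomposition nor the detection of headings, signatures or tables.

```python
import re

# Hypothetical abbreviation list (an assumption for this sketch); a real
# system would need a much larger, language-specific list.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}

# Matched symbol pairs tracked by the sketch (opening -> closing).
PAIRS = {"(": ")", "[": "]", "\u201c": "\u201d"}


def split_sentences(text):
    """Split text after a token ending in '.', '!' or '?', unless the token
    is a known abbreviation or a matched pair is still open."""
    sentences = []
    stack = []  # closing symbols that are still expected
    start = 0
    for match in re.finditer(r"\S+", text):
        token = match.group()
        # Update the stack of open matched symbols character by character.
        for ch in token:
            if stack and ch == stack[-1]:
                stack.pop()
            elif ch in PAIRS:
                stack.append(PAIRS[ch])
        # Ignore trailing closing symbols when looking for end punctuation.
        stripped = token.rstrip(")]\u201d")
        ends = stripped.endswith((".", "!", "?"))
        if ends and token.lower() not in ABBREVIATIONS and not stack:
            sentences.append(text[start:match.end()].strip())
            start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("Dr. Novak arrived. He said (see e.g. the table) that it works. "
              "She said \u201cStop. Wait here\u201d and left.")
    for sentence in split_sentences(sample):
        print(sentence)
```

In this sketch a full stop inside an open pair of quotation marks or brackets does not end the sentence, so direct speech stays attached to its reporting clause. A full stop after a listed abbreviation is likewise ignored.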

The research is sponsored by the Czech Ministry of Education under the grant VS 97028.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Veber, M., Horák, A., Julinek, R., Smrž, P. (1999). Automatic Structuring of Written Texts. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_18

  • DOI: https://doi.org/10.1007/3-540-48239-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66494-9

  • Online ISBN: 978-3-540-48239-0

  • eBook Packages: Springer Book Archive
