Text Segmentation into Paragraphs Based on Local Text Cohesion

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

The problem of automatic text segmentation divides into two distinct problems: thematic segmentation into rather large, topically self-contained sections, and splitting into paragraphs, i.e., lower-level lexico-grammatical segmentation. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraphs based on a text cohesion measure. Specifically, we propose a method for the quantitative evaluation of text cohesion based on a large linguistic resource, a collocation network. At each step, our algorithm compares word occurrences in the text against a large database of collocations and semantic links between words in the given natural language. The procedure consists of evaluating the cohesion function, smoothing and normalizing it, and comparing the result with a specially constructed threshold.
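The four stages named in the abstract (cohesion evaluation, smoothing, normalization, thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the four-pair `LINKS` set is a hypothetical stand-in for the collocation network, the cohesion score is assumed to be a simple count of repeated or linked word pairs across a sentence boundary, the smoothing is assumed to be a triangular (1, 2, 1) filter, and a constant threshold of 0.4 replaces the paper's specially constructed one. All function names are illustrative.

```python
# Hypothetical toy "collocation network": unordered word pairs assumed linked.
LINKS = {
    frozenset(pair)
    for pair in [("rain", "cloud"), ("cloud", "sky"),
                 ("bank", "loan"), ("loan", "interest")]
}

def cohesion(sentences, links=LINKS):
    """Raw cohesion at each gap between adjacent sentences: the number of
    word pairs spanning the gap that are identical or linked."""
    scores = []
    for left, right in zip(sentences, sentences[1:]):
        count = sum(
            1
            for a in set(left)
            for b in set(right)
            if a == b or frozenset((a, b)) in links
        )
        scores.append(float(count))
    return scores

def smooth(values, weights=(1.0, 2.0, 1.0)):
    """Weighted moving average; the window is truncated and renormalized
    at both ends of the sequence."""
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        total = norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                total += w * values[j]
                norm += w
        out.append(total / norm)
    return out

def normalize(values):
    """Rescale to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def paragraph_breaks(sentences, threshold=0.4):
    """Return indices i such that a break is proposed before sentence i.
    The constant threshold is an assumption made for this sketch."""
    if len(sentences) < 2:
        return []
    scores = normalize(smooth(cohesion(sentences)))
    return [i + 1 for i, s in enumerate(scores) if s < threshold]

# Six pre-tokenized toy sentences: three about weather, three about finance.
sample = [
    ["rain", "cloud"], ["cloud", "sky"], ["rain", "cloud"],
    ["bank", "loan"], ["loan", "interest"], ["bank", "loan"],
]
print(paragraph_breaks(sample))  # prints [3]
```

On this toy input the raw cohesion drops to zero only at the gap between the third and fourth sentences, so the sketch proposes a single break there. Note that smoothing also lowers the scores next to the dip, which is why the choice of threshold matters.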




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolshakov, I.A., Gelbukh, A. (2001). Text Segmentation into Paragraphs Based on Local Text Cohesion. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_20

  • DOI: https://doi.org/10.1007/3-540-44805-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive
