Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task related to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics within a single text. We worked on a corpus of twenty-two French political discourses, trying to find the boundaries between them when they are concatenated, and to find topic boundaries inside them when they are not. We compared the results of our distance-based method to those of the well-known C99 algorithm.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Labadié, A., Prince, V. (2008). Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks? In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science (LNAI), vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_25

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
