Abstract
This paper aims to demonstrate that the usual evaluation methods for text segmentation are not suited to every segmentation task. To do so, we distinguished the task of finding text boundaries in a corpus of concatenated texts from the task of finding topic transitions within a single text. Working on a corpus of twenty-two French political speeches, we tried to find the boundaries between the texts when they are concatenated, and the topic boundaries inside each text when they are not. We compared the results of our distance-based method with those of the well-known C99 algorithm.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Labadié, A., Prince, V. (2008). Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks? In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_25
DOI: https://doi.org/10.1007/978-3-540-85287-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science (R0)