Comparative Discourse Analysis of Parallel Texts

  • P. Van Der Eijk
Part of the Text, Speech and Language Technology book series (TLTB, volume 11)


A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent text elements. These representations have previously been proposed to deal with sub-topic text segmentation. In a parallel corpus, similar representations can be derived for versions of a text in various languages. These can be used for parallel segmentation and as an alternative measure of text-translation similarity.
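As a rough illustration of the idea in the abstract, the sketch below computes a cohesion score at each gap between adjacent sentences and then compares two such score sequences. It is not the chapter's method: the function names `cohesion_profile` and `dtw_distance` and the `window` parameter are hypothetical, cosine similarity over raw word-form counts stands in for whatever cohesion measure the chapter uses, and dynamic time warping is included only because it appears among the keywords listed for this chapter.

```python
import math
import re
from collections import Counter


def cohesion_profile(sentences, window=2):
    """Cosine similarity of word counts on either side of each sentence gap.

    Assumption: lexical cohesion is approximated by shared word forms;
    low values suggest a possible sub-topic boundary.
    """
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    profile = []
    for gap in range(1, len(bags)):
        # Pool up to `window` sentences before and after the gap.
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        dot = sum(left[w] * right[w] for w in left)
        norm = math.sqrt(sum(v * v for v in left.values())) * math.sqrt(
            sum(v * v for v in right.values()))
        profile.append(dot / norm if norm else 0.0)
    return profile


def dtw_distance(a, b):
    """Standard dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]
```

Under these assumptions, one profile would be computed per language version of a text, and a small warping cost between the two profiles would indicate that the versions have a similar discourse shape even when their sentence counts differ.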


Word Form; Dynamic Time Warping; Language Version; Discourse Structure; Text Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Brown, P., Lai, J. and Mercer, R. 1991. Aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 169–176.
  2. Digital Equipment Corporation. 1993. Digital Extended Math Library for DEC OSF/1 AXP, August 1993. Maynard, Massachusetts.
  3. Gale, W. and Church, K. 1993. A program for aligning sentences in bilingual corpora. Computational Linguistics, 19 (1): 75–102.
  4. Gale, W., Church, K. and Yarowsky, D. 1992. Using bilingual materials to develop word sense disambiguation methods. In Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 101–112, Montréal.
  5. Hearst, M. 1993. TextTiling: a quantitative approach to discourse segmentation. Technical Report 93/24, Project Sequoia, University of California, Berkeley.
  6. Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17 (1): 21–48.
  7. O’Shaughnessy, D. 1987. Speech Communication. Addison-Wesley.
  8. Salton, G. and McGill, M. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
  9. Alphen, P. 1992. HMM-based continuous-speech recognition. Ph.D. thesis, Universiteit van Amsterdam.
  10. Eijk, P. 1993. Automating the acquisition of bilingual terminology. In Sixth Conference of the European Chapter of the Association for Computational Linguistics, pp. 113–119.
  11. Yarowsky, D. 1992. Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING), pp. 454–460.

Copyright information

© Springer Science+Business Media Dordrecht 1999

