Comparative Discourse Analysis of Parallel Texts
A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent text elements. Such representations have previously been proposed for sub-topic text segmentation. In a parallel corpus, similar representations can be derived for each language version of a text. These can be used for parallel segmentation and as an alternative measure of the similarity between a text and its translation.
Keywords: Word Form, Dynamic Time Warping, Language Version, Discourse Structure, Text Segmentation
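The idea in the abstract can be illustrated with a minimal sketch. It assumes that lexical cohesion between adjacent text units is approximated by the cosine similarity of their word-count vectors, and that the resulting profiles of two language versions are compared by dynamic time warping. Both choices are illustrative assumptions, not necessarily the measures used in the paper, and the names `cohesion_profile` and `dtw_distance` are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def cohesion_profile(units):
    """One score per pair of adjacent text units.

    Assumption: cohesion is approximated by word overlap (cosine similarity
    of lowercase word counts); the paper may use a different measure.
    """
    bags = [Counter(re.findall(r"\w+", u.lower())) for u in units]
    return [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def dtw_distance(x, y):
    """Accumulated dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(x)][len(y)]


if __name__ == "__main__":
    # Invented toy data: two "language versions" of a three-unit text.
    version_a = ["The cat sat on the mat.",
                 "The cat slept on the mat.",
                 "Stock prices fell sharply."]
    version_b = ["Le chat est assis sur le tapis.",
                 "Le chat dort sur le tapis.",
                 "Les cours de la bourse ont chuté."]
    profile_a = cohesion_profile(version_a)
    profile_b = cohesion_profile(version_b)
    print(profile_a, profile_b, dtw_distance(profile_a, profile_b))
```

Because the profiles are sequences of numbers rather than words, they can be compared across languages without a bilingual lexicon. A low warping cost suggests that the two versions share a similar discourse structure, and dips in either profile are candidate sub-topic boundaries.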