Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications

  • Conference paper
Intercultural Collaboration (IWIC 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4568)

Abstract

Parallel corpora encode extremely valuable linguistic knowledge, and recent advances in multilingual corpus linguistics make that knowledge easier to uncover. The linguistic decisions that human translators make in order to convey the meaning of the source text faithfully can be traced and used as evidence of linguistic facts that, in a monolingual context, might be unavailable to (or overlooked by) a computer program. Multilingual technologies, which are to a large extent language independent, provide powerful support for systematic and consistent cross-lingual studies and make it easier to build annotated linguistic resources for languages in which such resources are scarce or missing. In this paper we briefly present some of the underlying multilingual technologies and methodologies we developed for exploiting parallel corpora, and we discuss their relevance for cross-linguistic studies and applications.

Editor information

Toru Ishida Susan R. Fussell Piek T. J. M. Vossen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tufiş, D. (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In: Ishida, T., Fussell, S.R., Vossen, P.T.J.M. (eds) Intercultural Collaboration. IWIC 2007. Lecture Notes in Computer Science, vol 4568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74000-1_8

  • DOI: https://doi.org/10.1007/978-3-540-74000-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73999-9

  • Online ISBN: 978-3-540-74000-1

  • eBook Packages: Computer Science, Computer Science (R0)
