
Novelty Extraction from Special and Parallel Corpora

  • Conference paper
Human Language Technology. Challenges of the Information Society (LTC 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

How can corpora assist translators in ways that resources such as translation memories or term databases cannot? Our tests on the English, Polish and Swedish parts of the JRC-Acquis Multilingual Parallel Corpus show that corpora can support term standardization and term variation and, most importantly, the tracing of novel expressions. A corpus tool with an explicit dictionary representation is particularly well suited to the last task. Culler is a tool that allows one to select expressions containing words absent from its dictionary. Even if the extracted material contains some noise, it is of undeniable value to translators and lexicographers. As one would expect, the quality of extraction depends on the dictionary and on text processing, but it also depends on the query.
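The selection criterion described above keeps expressions that contain words absent from a dictionary. It can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not Culler's implementation or query language. The toy `DICTIONARY`, the tokenizer regex and the helper names `novel_words` and `novel_contexts` are all hypothetical. The sketch also skips lemmatization. Real use on an inflected language such as Polish would need it, so that inflected forms of known words are not reported as new.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

# Hypothetical toy lexicon (lowercased word forms); a real dictionary would be
# far larger and would carry morphological information, which is ignored here.
DICTIONARY = {"the", "council", "shall", "adopt", "a", "regulation",
              "on", "of", "and", "data"}

# One token = a run of Unicode letters, optionally joined by hyphens
# (e.g. "geo-blocking"); digits and punctuation are skipped.
TOKEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def novel_words(text: str, dictionary: set) -> Counter:
    """Count lowercased tokens that are absent from the dictionary."""
    tokens = (t.lower() for t in TOKEN.findall(text))
    return Counter(t for t in tokens if t not in dictionary)


def novel_contexts(text: str, dictionary: set,
                   window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (word, snippet) for each unknown word, with up to `window`
    tokens of context on each side, as a crude concordance line."""
    tokens = TOKEN.findall(text)
    for i, tok in enumerate(tokens):
        if tok.lower() not in dictionary:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield tok, " ".join(left + [tok] + right)


if __name__ == "__main__":
    sample = "The Council shall adopt a regulation on geo-blocking and data portability."
    print(novel_words(sample, DICTIONARY))
    for word, snippet in novel_contexts(sample, DICTIONARY):
        print(f"{word}: {snippet}")
```

A filter this naive would also flag misspellings, proper names and unlisted inflected forms. This shows why the size and coverage of the dictionary largely determine how much noise ends up in the extracted material.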

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dura, E., Gawronska, B. (2009). Novelty Extraction from Special and Parallel Corpora. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_25

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)