Abstract
Industrial localisation is changing from the periodic translation of large bodies of content to a long tail of small, heterogeneous translations processed in an agile, demand-driven manner. Software localisation and crowd-sourced translation already practise continuous, fine-grained distribution of translation work. This requires close integration and round-trip interoperability between content creation and localisation processes, while at the same time recording the provenance of translated content to maximise its reuse in future translation tasks and, increasingly, in training Statistical Machine Translation (SMT) engines. This work adopts a Linked Data approach to integrating the content translation round-trip process with the logging of process quality assurance provenance. This integration enables a pull-based interoperability model that supports continuous synchronisation of content and process metadata between the generating organisation and any number of language service providers or translators. We present a platform architecture for sharing, searching and interlinking Linked Localisation and Language Data (termed L3Data) on the web. This is accomplished using a semantic schema for L3Data that is compatible with existing localisation data exchange standards and can be used to support the round-trip sharing of language resources. The paper describes our approach to the development of the L3Data schema and of the data management processes, web-based tools and data sharing infrastructure that use it. An initial proof-of-concept prototype is presented, which implements a web application that segments and machine-translates content for crowd-sourced post-editing and rating.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lewis, D. et al. (2012). Linking Localisation and Language Resources. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds) Linked Data in Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28249-2_5
DOI: https://doi.org/10.1007/978-3-642-28249-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28248-5
Online ISBN: 978-3-642-28249-2
eBook Packages: Computer Science (R0)