‘Garbage Let’s Take Away’: Producing Understandable and Translatable Government Documents: A Case Study from Japan
Government departments increasingly communicate information to citizens digitally via web sites, and in many societies the linguistic diversity of these citizens is also growing. In Japan, a largely monolingual society, municipal governments now routinely address the need to provide practical and legal information to residents with limited Japanese by machine-translating their public service web sites into selected languages. Cost constraints often mean that the translation is left unedited and, as a result, may be unclear, misleading or even incomprehensible. While machine translation from Japanese is particularly challenging because of the language's structural uniqueness, the state of the art in the field generally is such that poor output is a universal problem. The solution we propose draws on recent advances in controlled authoring, document structuring and machine translation evaluation. It is realised as a prototype tool that enables non-professional writers to create documents in which both the individual sentences and the overall flow are clear. The tool is designed to enhance machine-translatability into English without compromising the readability of the Japanese original. Its originality lies in providing an interactive sentence checker that is context-sensitive to the individual functional elements of a document template specialised for the public administration domain. Where natural Japanese sentences give poor translation results, they are pre-processed internally into a form that yields acceptable machine translation output. Evaluation of the tool will target three concerns: its usability by non-professional authors; the acceptability of the Japanese document; and the comprehensibility of the English translation. We suggest that such an authoring framework could facilitate government communication with citizens in many societies beyond Japan.
Keywords: Government communication; Controlled language; Document structure; Authoring tool; Machine translation; DITA
This work was supported by the Research Grant Program of the KDDI Foundation, Japan. The MT system J-SERVER Professional TransGateway V3 was offered by Kodensha Co. Paris’s stay in Japan to work with Miyata, Kageura and Hartley was funded by the Japan Society for the Promotion of Science and CSIRO.