Appendix 6.1. Translation as a Tool in Social Research and the Scope of the Translator/Interpreter Industry
Translation as a Tool in Social Research
These notes refer to aspects of translation that extend beyond those suggested by the title of this chapter, but it is convenient and useful to include these notes here. The pervasive role of translation in social research and the political economy makes it an extremely important tool in these areas.
Linguistic and other anthropologists engage in translation in connection with their field research among indigenous populations and their efforts to revitalize endangered languages. Translation is an important tool in defending minority languages that are not endangered but whose independence is threatened by dominant national languages (e.g., Catalan, Euskara, and Galician in Spain). Moreover, the translation process may be applied in such a way as to use a major international language to serve a national or regional language, such as in translating an important medical or other technical text from English into Galician or Catalan. The globalizing economy requires translations of financial instruments and transactions between nations, and of the associated information technology. A further need for translation as a tool results from the massive international movements of populations, including voluntary economic migrants, refugee populations and populations seeking asylum, and tourist populations.
A few cautions need to be observed in doing translations. It is important to know the culture of the people using the second language and take account of it in the translation into or from the language. In connection with writing a historical work in Galician, Roseman (2014) both composed Galician directly and translated from English into Galician. She argues for the need to take account of subtle variations in the culture between the two languages when translating narrative, historical, and social texts, and presents some examples of such differences between English and Galician. At times it is prudent, however, not to translate specific terms, but rather to retain the terms as they are in the original or conventional language. This may apply to some technical terms or idiomatic expressions for which there is no equivalent in the second language, or it may be done to assure that a technical term that is universally used in some widely spoken language is understood correctly (e.g., a medical or anthropological technical term). Then, a further explanation of the term may be given in the second language. Another tack is to use the second-language word, but then provide further explanation in that language.
On the other hand, it is desirable at times to avoid routine translation into the dominant national language (e.g., Basque into Spanish, Corsican into French). Some militants oppose the routine translation of regional languages into the dominant national language because they see this practice as an expression of defeat of the regional language and a form of domination by the national language. In this spirit some field workers use the indigenous language exclusively among the natives once they have learned it, and record their field notes directly in the indigenous language. As mentioned, Roseman wrote parts of her book about the Galicians directly in Galician without preparing an English version first and then translating it.
Translation/Interpreting Now a Sizable Industry
As suggested above, translation/interpreting has become a sizable industry in the United States and elsewhere. Several factors are contributing to the demand for and use of translators. These include the massive international migration and refugee movements, the frequency and intensity of civil and international conflict and violence, and, especially, increasing globalization represented by the growing communication among countries and their residents, increased international travel, increased flow of commodities, and the formation of regional economic unions and trading areas. The general need for translators has been intensified by the growing linguistic diversity of Western populations resulting from the flow of immigrants and refugees from Africa, Asia, and Latin America.
Translators/interpreters are used widely at international conferences, in war zones, in court proceedings, in voting programs, in airline services, and in telephone interviewing and surveys. They work at the United Nations, the European Union, UNESCO, and the European Parliament, and for various national governments, at international conferences of professional associations, and for many tourist companies. They have, in effect, entered the political arena as they sit at meetings with political officials negotiating the relations of nations involved in commercial, territorial, military, and intellectual-property disputes. We saw them involved in the Serbian-Bosnian dispute, and see them now in Iraq, Afghanistan, and elsewhere in the Middle East.
Translation in Particular Social Domains
Given the wide variety of situations in which translators/interpreters work, it is to be expected that they specialize in interpreting in particular social domains. I list five such domains as illustrations: Conference interpreting, healthcare interpreting, survey interview interpreting, courtroom or legal interpreting, and community liaison interpreting. The different domains of interpreting vary in the demands made on the interpreter and the relation of the interpreter to the client. Conference interpreting calls for highly trained interpreters and there are numerous materials setting forth professional standards and guidelines available for their training. The job is quite demanding and so they usually work in teams. Yet it is a relatively impersonal activity. Healthcare interpreting involves the interaction of three parties, the patient, the health provider, and the interpreter, so it is a far more intimate situation than conference interpreting. The healthcare interpreter plays an important role in this interaction and could affect the health provider’s interpretation of the health problem, the patient’s understanding of it, and the prescribed treatment.
Survey-interview interpreting calls for on-the-spot interpreting, which is inherently at odds with the goal of a standardized survey interview. It is a major challenge to achieve an acceptable degree of standardization in such interviews. In courtroom and legal interpreting, accuracy, completeness, and neutrality are essential, so that a high quality in the interpretation must be achieved. In any case, it is evident that the interpreter can be a factor in the progress of legal proceedings and even affect their outcome. Community-liaison interpreting is the name for interpreting employed in other everyday social encounters, such as with a repair person at home or at a business location, with a bank agent, with a police officer, or with some other public official. Like healthcare interpreting, community interpreting tends to be off-the-cuff interpreting.
Translation Contributing to English’s Role as World lingua franca
Translation and interpreting take on special relevance in multilingual situations where a majority language has a monopoly on public life. Largely as a result of technological developments in communication, English has become a lingua franca in much of the world. Translation into English has contributed to this domination by English as a lingua franca (Taviano 2013; Baker 2014). Choices made in the course of translation appear to contribute to imposing English norms and communication technology on the translations. The role of English as lingua franca generates pressure on translators to use the specialized terminology of English in their translated texts. In translations from English, communicative norms in languages other than English are likely to be adapted to English ones. Moreover, the communicative norms and specialized terminology of English are increasingly likely to be used in the target texts, and the linguistic resources of the non-English speakers become subject to gradual erosion (Cronin 2013).
The primacy of English is enhanced by the process of machine translation (Raley 2003). In machine translation, the technology places particular emphasis on functionality and practical understanding, and such results are only possible when input and output are restricted in terms of style, vocabulary, and content. Inasmuch as English has been the leading linguistic vehicle for the technological developments contributing to the input materials for machine translation, this effect is not surprising.
Appendix 6.2. Machine Translation
Nature, Use, and Effectiveness
Given the multilingual conditions in many countries and in regional unions of countries (e.g., the European Union), the increasing globalization and internationalization, and technological advances, much interest and effort have been directed toward development and enhancement of machine-language translation technology. Machine translation refers to the process of translating text or speech from one language to another by the use of computer software. The following note summarizes these developments.
In the United States this work has been largely sponsored by the U.S. Department of Defense because of its interest in linguistic training and translation skills for the military services. Many private companies and universities have also been actively involved in this effort over many years. Machine translation methods are in various stages of development and no system provides a very high-quality translation of unrestricted text, although many fully automated systems yield products of reasonable quality for particular domains (e.g., legal documents). The quality of the product is greatly improved if the domain is restricted and controlled. Results can also be improved by human intervention, as by identifying clearly which words in the text are proper names.
There are limitations in the general application of machine translation methods to all areas, including an area of specific interest to us, namely national censuses, surveys, and other demographic collection instruments. However, because of the scarcity of bilingual translators and interviewers who are also knowledgeable in survey methodology, it is important to pursue efforts to improve machine-language translation methods in this area, perhaps combining them with one another and with a human translator. Machine technology may provide a first rough translation of a document so that the human translator can spend time in refining the machine translation. In this way, time, money, and effort are saved in producing the final translation.
The goals of machine-program development are to produce (1) a high quality program capable of translation; (2) a system capable of translating any text, that is, a general purpose translation device; and (3) a system requiring no human intervention. At present, no language translation system meets all three criteria at once. A simple approach is to substitute words in one language for words in another, but that would fail to convey the meaning of many phrases, which have to be translated as semantic units. For that reason more sophisticated methods have been developed.
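The weakness of simple word substitution can be illustrated with a minimal sketch in Python. The tiny English-to-Spanish word list, the single idiom entry, and the function names are assumptions made for illustration only; they are not drawn from any actual translation system.

```python
# Hypothetical toy lexicon: single words and one idiomatic phrase.
WORDS = {"kick": "patear", "the": "el", "bucket": "cubo"}
PHRASES = {("kick", "the", "bucket"): "estirar la pata"}  # idiom meaning "to die"


def word_for_word(sentence):
    """Substitute each word independently; unknown words pass through."""
    return " ".join(WORDS.get(w, w) for w in sentence.lower().split())


def phrase_aware(sentence):
    """Prefer the longest known phrase at each position, else fall back to words."""
    tokens = sentence.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Try multi-word phrases starting at position i, longest first.
        for n in range(len(tokens) - i, 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase matched here: translate the single word.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


print(word_for_word("kick the bucket"))  # literal and misleading
print(phrase_aware("kick the bucket"))   # idiom treated as one semantic unit
```

The first function renders the idiom literally, while the second treats it as a single semantic unit, which is the distinction drawn above.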
Four types of language translation systems are currently in use, designated as rule-based (or grammar-based), corpus-based (or example-based), statistics-based, and neural-based (Wikipedia 2016, 2017). Systems may also be combined so as to draw on the strengths and offset the weaknesses of the individual methods. A statistical language model aids in selecting the most effective combination for the possible output.
Rule-based systems use detailed knowledge of the language, including the grammar, and are able to create rather acceptable translations, but require much more time, money, and effort in their development than the other methods. Rule-based methods apply linguistic rules; that is, words are translated in a linguistic way, according to which the most suitable words in the target language replace the ones in the source language. Generally, the rule-based translations create an intermediate symbolic representation from which the text in the target language is created. These intermediate representations may be handled in different ways, either as interlingual machine translation, transfer-based translation, or dictionary-based translation. In the interlingual approach, the source language is converted into a neutral representation of a language that simulates the meaning of the source language but is independent of any real language and can be used repeatedly for translation into different languages. The target-language text is then generated from the “interlingua.” The transfer-based method is similar to the interlingual translation approach but differs in that it depends partly on the pair of languages involved in the translation. Dictionary-based translation uses a method based on dictionary entries.
These methods require much information about the linguistics of the source and target languages, including extensive lexicons with morphological, syntactic, and semantic information and large sets of rules. Lexical selection rules must be written for all possible ambiguities. A skilled linguist is needed to design the grammars and rules that these methods use. The problem with the method is assembling enough appropriate data to support the method. In spite of their complexities the rule-based methods can perform reasonably well for translations between closely related languages.
Corpus-based, or example-based, machine translation systems match texts in large databases of parallel texts, tagging words in parallel texts and translating sentences and phrases based on matching words and phrases with common tags. This method combines a word-for-word dictionary translation, a glossary database for phrase translation, and both general and domain-specific databases for sentences. Examples of domain-specific sentences, phrases, and words are collected to form the corpus that serves as training data for speech recognition for this system (called TONGUES). The corpus that is used is one that contains texts that have already been translated. Given a sentence that is to be translated, sentences from this corpus are selected that contain similar structural components within the sentences. The similar sentences are then used to translate the structural components within sentences into the target language and these phrases are put together to form a complete translation. This device is used for face-to-face communication (e.g., door-to-door interviews). It cannot be used for telephone surveys. Corpus-based systems address a single topic or domain and can be developed rather quickly, but the results are of lesser quality than those of rule-based systems.
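The selection step of an example-based system, in which already-translated sentences similar to the input are retrieved from the corpus, might be sketched as follows. The three-sentence translation memory, the function name, and the use of word overlap (the Jaccard measure) as the similarity score are all assumptions chosen for illustration; the source does not say how TONGUES or any other system measures similarity.

```python
# Hypothetical translation memory of previously translated survey questions.
MEMORY = [
    ("how many people live here", "cuántas personas viven aquí"),
    ("what is your name", "cómo se llama usted"),
    ("where were you born", "dónde nació usted"),
]


def closest_example(sentence, memory):
    """Return the stored (source, target) pair whose source shares the most words."""
    tokens = set(sentence.lower().split())

    def jaccard(pair):
        example = set(pair[0].split())
        # Shared words divided by all distinct words in either sentence.
        return len(tokens & example) / len(tokens | example)

    return max(memory, key=jaccard)


source, target = closest_example("how many children live here", MEMORY)
print(source)  # the most similar stored sentence
print(target)  # its existing translation, to be adapted to the new input
```

Only retrieval is shown. A working system would still have to adapt the retrieved translation, here by replacing the rendering of "people" with that of "children".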
Statistical machine translation uses statistical methods based on bilingual text corpora. Where such corpora are available, good results can be obtained translating similar texts, but such corpora are still quite rare for many pairs of languages. One way these translation systems work is by detecting patterns in hundreds of millions of documents that have been previously translated by humans and making intelligent guesses based on the findings. Generally, the more human documents available in a given language, the more likely it is that the translation will be of good quality. A newer approach in statistical translation is to use a minimal corpus size and focus instead on syntactic structure through pattern recognition. Limitations of this machine-translation method are its dependence on huge amounts of parallel texts, its problem with morphology-rich languages, and its inability to correct single isolated errors.
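The idea of detecting patterns in previously translated text can be conveyed with a deliberately small sketch. It counts how often each source word and each target word appear in the same sentence pair and, for each source word, picks the target word with the highest Dice coefficient. The four-pair corpus and the function name are invented for illustration, and this co-occurrence heuristic is a much-simplified stand-in for the alignment models used in actual statistical systems.

```python
from collections import Counter
from itertools import product

# Hypothetical miniature English-Spanish parallel corpus.
PARALLEL = [
    ("the house", "la casa"),
    ("the table", "la mesa"),
    ("white house", "casa blanca"),
    ("white table", "mesa blanca"),
]


def learn_word_pairs(corpus):
    """Guess a target word for each source word from co-occurrence counts."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Every source word co-occurs with every target word of the same pair.
        pair_count.update(product(s_words, t_words))

    best = {}
    for s in src_count:
        def dice(t):
            # Dice coefficient: 2 * joint count / (source count + target count).
            return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
        best[s] = max(tgt_count, key=dice)
    return best


print(learn_word_pairs(PARALLEL))
```

With so little data the guesses are reliable only because the corpus was constructed to be unambiguous, which illustrates why the method depends on very large bodies of parallel text.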
Hybrid systems combine the strengths of statistical and rule-based translations. There are different approaches to this method. A general distinction between them depends on whether the rule-based method takes precedence and statistics are used to adjust or correct the output from the rule-based method, or the statistics method takes precedence and rules are used to adjust or correct the statistical output.
Another variation is the use of multiparallel corpora. Multiparallel corpora are bodies of text that have been translated into three or more languages. In applying the method, a text which has been translated into two or more languages is compared with the text that has been translated into three or more languages to provide a more accurate translation into the third language as compared with the use of only two corpora.
Google’s Neural System
Over the past several years Google has been developing its own systems of machine translation named Google Translate (Wikipedia 2017). Since 2007 it had been using statistical methods (SMT), with its proprietary in-house GSMT technology. By 2016 it had developed the neural machine translation system (GNMT) and soon after began using Google Translate to translate several languages with GNMT instead of GSMT.
At first, Google used a phrase-based method for its key algorithm, but advances in machine intelligence have improved its capabilities in speech recognition and image recognition (Google 2017). These advances have made possible the development of a sentence-based neural machine translation system. Phrase-based machine translation breaks an input sentence into words and phrases to be translated largely independently, but neural machine translation considers the entire input sentence as a unit for translation. The GNMT system carries out interlingual machine translation by encoding the semantics of the sentence rather than by memorizing phrase-by-phrase translations. It applies an example-based machine translation method in which the system learns from millions of examples; with time the system learns to create better and more natural translations.
According to Google, the current GNMT version has made improvements in handling rare words, can work on large data sets, can translate directly from one language to another without first translating into English, and is sufficiently fast and accurate to produce acceptable translations (Google 2017). Google’s tests show that it offers a substantial improvement over phrase-based translations for translations between English and French, Spanish, and Chinese. The new translation system was first enabled in 2016 for eight languages (to and from English and French, Spanish, German, Portuguese, Chinese, Japanese, Korean, and Turkish) and early in 2017 three more languages will be added (Russian, Hindi, and Vietnamese).
Some Continuing Problems
A number of general difficulties in applying machine translation resist easy solutions. One results from the fact that words can have two or more meanings. Finding an acceptable solution to this problem of word disambiguation has engaged researchers since the middle of the last century. Two kinds of machine solutions have been proposed. One applies statistical methods to the words surrounding the ambiguous word, possibly guessing to resolve an ambiguity. The other is based on a comprehensive knowledge of the ambiguous word, possibly requiring the use of machine software to do research to resolve an ambiguity (Wikipedia 2016). A second problem with machine translation is its inability to translate non-standard language with the same accuracy as standard language. Neither rule-based methods nor statistical-based methods include input from common non-standard sources. As a result, errors are made in translating material both from a vernacular source and into a vernacular language.
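The first approach to word disambiguation, which looks at the words surrounding the ambiguous word, can be sketched in a few lines. The clue words listed for each sense of the English word "bank" are hand-written assumptions; an actual statistical system would estimate such associations from a large corpus rather than from a fixed list.

```python
# Hypothetical clue words for two Spanish renderings of English "bank".
SENSE_CLUES = {
    "bank": {
        "banco": {"money", "loan", "account", "deposit"},   # financial institution
        "orilla": {"river", "water", "fishing", "shore"},   # edge of a river
    }
}


def disambiguate(word, sentence):
    """Pick the translation whose clue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    scores = {
        translation: len(clues & context)
        for translation, clues in SENSE_CLUES[word].items()
    }
    # With no clue present all scores are zero and the first sense is returned,
    # which amounts to the guessing mentioned in the text.
    return max(scores, key=scores.get)


print(disambiguate("bank", "she opened an account at the bank"))
print(disambiguate("bank", "we sat on the bank of the river"))
```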
A third problem is the difficulty machine translation has in handling so-called named entities. Named entities relate to entities such as names of persons, organizations, companies, and places as well as expressions of time, space, and quantity. Names differ from language to language and change over time. The machine translation may erroneously translate them as common nouns or omit them altogether. One way of managing the situation is to use transliteration instead of translation, that is, employing letters in the target language that most closely resemble those in the source language. Transliteration by machine has its problems, among them that some words that should not have been transliterated are transliterated, and others that should have been are not. Another possible solution is to develop a “do-not-translate” list, the names on which can be transliterated separately. Both of these solutions rely on correct identification of named entities. A third approach to handling the named-entity problem is to use a class-based model, that is, to replace them with a token to represent the class to which they belong.
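The class-based approach can be sketched as a three-step routine: replace each identified name with a token for its class, translate the remaining text, and then restore the names. In the sketch below the list of names is supplied by hand, which sidesteps the identification problem noted above, and `toy_translate` is a hypothetical stand-in for a translation engine that, like the systems described, renders names as common nouns.

```python
def mask_entities(text, entities):
    """Replace each known name with a class token such as __PERSON_0__."""
    mapping = {}
    for i, (name, cls) in enumerate(entities.items()):
        if name in text:
            token = f"__{cls}_{i}__"
            text = text.replace(name, token)
            mapping[token] = name
    return text, mapping


def unmask_entities(text, mapping):
    """Put the original names back in place of their tokens."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


def toy_translate(text):
    """Hypothetical engine: word substitution that ignores capitalization."""
    table = {"lives": "vive", "in": "en",
             "rose": "rosa", "baker": "panadero", "reading": "lectura"}
    return " ".join(table.get(w.lower(), w) for w in text.split())


sentence = "Rose Baker lives in Reading"
names = {"Rose Baker": "PERSON", "Reading": "PLACE"}  # assumed to be identified already

print(toy_translate(sentence))  # names wrongly rendered as common nouns

masked, mapping = mask_entities(sentence, names)
print(unmask_entities(toy_translate(masked), mapping))  # names preserved
```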
No translation is perfect, even that made by humans. Even Google’s Neural Machine Translation system makes significant errors that a human translator would not make. It may drop words, mistranslate proper names and rare words, or translate sentences out of context rather than taking account of the context of the paragraph.