Abstract
We introduce a monolingual query method that draws on additional webpage data to improve the quality of statistical machine translation (SMT) output, which is increasingly required for official use. The motivation is that the readability of a translated sentence can be improved in one step by replacing it with the most closely related human-written sentence. Using vector space representations of the translated sentences, we query a search engine for additional reference text. We then rank all translated sentences against the query results and make replacements where necessary. We compare several sentence vector representations, namely TF-IDF, latent semantic indexing, and neural network word embeddings. The experimental results show that the method offers an alternative way to enhance current machine translation, improving performance by about 0.5 BLEU on a French-to-English task and 0.7 BLEU on an English-to-Chinese task.
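The ranking-and-replacement step can be illustrated for a single translated sentence with the simplest of the three representations, TF-IDF. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes whitespace tokenization, a smoothed IDF of the form ln((1 + n)/(1 + df)) + 1, cosine similarity, and a hypothetical replacement threshold of 0.5. The function names `tfidf_vectors`, `cosine`, and `select_replacement` are illustrative, and the candidate list stands in for sentences returned by a search engine.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a list of sentences."""
    tokenized = [s.lower().split() for s in sentences]  # naive whitespace tokenization
    n = len(tokenized)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so shared terms keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_replacement(translation, candidates, threshold=0.5):
    """Return (chosen sentence, best similarity score).

    `candidates` stands in for human-written sentences retrieved by a search
    engine; `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    if not candidates:
        return translation, 0.0
    # IDF is computed over the MT output and its candidates together.
    vectors = tfidf_vectors([translation] + candidates)
    query = vectors[0]
    best_score, best = max(
        (cosine(query, vec), cand) for vec, cand in zip(vectors[1:], candidates)
    )
    # Keep the machine translation unless some candidate is similar enough.
    return (best if best_score >= threshold else translation), best_score
```

For example, `select_replacement("the cat sit on mat", ["the cat sat on the mat", "stock markets fell sharply today"])` replaces the defective output with the first candidate (cosine of about 0.71), while a list of unrelated candidates leaves the translation unchanged. LSI or embedding vectors could in principle take the place of the TF-IDF dictionaries without changing the ranking logic.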
H. Zhao: This paper was partially supported by Cai Yuanpei Program (CSC No. 201304490199 and No. 201304490171), National Natural Science Foundation of China (No. 61170114 and No. 61272248), National Basic Research Program of China (No. 2013CB329401), Major Basic Research Program of Shanghai Science and Technology Committee (No. 15JC1400103), Art and Science Interdisciplinary Funds of Shanghai Jiao Tong University (No. 14JCRZ04), and Key Project of National Social Science Foundation of China (No. 15-ZDA041).
Notes
- 1.
We are aware that there are many other effective methods, such as [36], which uses a parse tree and matrix-vector operations to retain word-order information. However, this work concerns processing machine-translated sentences, so we need a robust and simple strategy that can handle a wide range of possibly defective sentences.
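A common strategy of the simple, robust kind this note argues for, assumed here for illustration (the note does not specify this exact construction), is to average the embeddings of the words in a sentence and skip any word that has no vector, so a defective or partly untranslated sentence still yields a usable representation. The function `average_embedding` and the two-dimensional `TOY_EMBEDDINGS` table below are hypothetical.

```python
# Hypothetical toy embedding table; real systems would load trained vectors.
TOY_EMBEDDINGS = {"cat": [1.0, 0.0], "mat": [0.0, 1.0], "sat": [1.0, 1.0]}


def average_embedding(sentence, embeddings, dim):
    """Mean of the known word vectors in `sentence`.

    Out-of-vocabulary tokens are skipped instead of raising an error, and a
    sentence with no known words maps to the zero vector.
    """
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Because averaging ignores word order, "the cat sat on the mat" and the scrambled "mat sat cat xyzzy" map to the same vector, [2/3, 2/3], under this toy table. That is precisely the word-order information the parse-tree approach of [36] retains and this simpler strategy gives up in exchange for robustness.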
- 2.
References
Huang, S., Chen, H., Dai, X.-Y., Chen, J.: Non-linear learning for statistical machine translation. In: ACL, pp. 825–835 (2015)
Yu, H., Zhu, X.: Recurrent neural network based rule sequence model for statistical machine translation. In: ACL, pp. 132–138 (2015)
Lu, S., Chen, Z., Xu, B.: Learning new semi-supervised deep auto-encoder features for statistical machine translation. In: ACL, pp. 122–132 (2014)
Xiong, D., Zhang, M.: A sense-based translation model for statistical machine translation. In: ACL, pp. 1459–1469 (2014)
Neubig, G., Duh, K.: On the elements of an accurate tree-to-string machine translation system. In: ACL, pp. 143–149 (2014)
Riezler, S., Simianer, P., Haas, C.: Response-based learning for grounded machine translation. In: ACL, pp. 881–891 (2014)
Wang, R., Zhao, H., Lu, B.-L.: Bilingual continuous-space language model growing for statistical machine translation, pp. 1209–1220. IEEE (2015)
Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning local word reorderings for hierarchical phrase-based statistical machine translation. Mach. Transl. 1–18 (2016)
Wang, R., Utiyama, M., Goto, I., Sumita, E., Zhao, H., Lu, B.-L.: Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation. ACM (2016)
Wang, R., Zhao, H., Ploux, S., Lu, B.-L., Utiyama, M.: A Bilingual Graph-Based Semantic Model for Statistical Machine Translation
Zang, S., Zhao, H., Wu, C., Wang, R.: A novel word reordering method for statistical machine translation. In: FSKD, pp. 843–848 (2015)
Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning word reorderings for hierarchical phrase-based statistical machine translation. In: ACL-IJCNLP, pp. 542–548 (2015)
Wang, R., Zhao, H., Lu, B.-L., Utiyama, M., Sumita, E.: Neural network based bilingual language model growing for statistical machine translation. In: EMNLP, pp. 189–195 (2014)
Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning hierarchical translation spans. In: EMNLP, pp. 183–188 (2014)
Wang, R., Utiyama, M., Goto, I., Sumita, E., Zhao, H., Lu, B.-L.: Converting continuous-space language models into N-gram language models for statistical machine translation. In: EMNLP, pp. 845–850 (2013)
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: ACL, pp. 311–318 (2002)
Guzmán, F., Joty, S., Màrquez, L., Nakov, P.: Pairwise neural machine translation evaluation. In: ACL, pp. 805–814 (2015)
Graham, Y.: Improving evaluation of machine translation quality estimation. In: ACL, pp. 1804–1813 (2015)
Miceli-Barone, A.V., Attardi, G.: Non-projective dependency-based pre-reordering with recurrent neural network for machine translation. In: ACL, pp. 846–856 (2015)
Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning word reorderings for hierarchical phrase-based statistical machine translation. In: ACL, pp. 542–548 (2015)
Nakagawa, T.: Efficient top-down BTG parsing for machine translation preordering. In: ACL, pp. 208–218 (2015)
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. JMLR 3, 1137–1155 (2003)
Vaswani, A., Zhao, Y., Fossum, V., Chiang, D.: Decoding with large-scale neural language models improves translation. In: EMNLP, pp. 1387–1392 (2013)
Devlin, J., Zbib, R., Huang, Z., Lamar, T., Schwartz, R., Makhoul, J.: Fast and robust neural network joint models for statistical machine translation. In: ACL, pp. 1370–1380 (2014)
Liu, S., Yang, N., Li, M., Zhou, M.: A recursive recurrent neural network for statistical machine translation. In: ACL, pp. 1491–1500 (2014)
Simard, M., Ueffing, N., Isabelle, P., Kuhn, R.: Rule-based translation with statistical phrase-based post-editing. In: SMT, pp. 203–206 (2007)
Dugast, L., Senellart, J., Koehn, P.: Statistical post-editing on SYSTRAN’s rule-based translation system. In: SMT, pp. 220–223 (2007)
Simard, M., Goutte, C., Isabelle, P.: Statistical phrase-based post-editing. In: NAACL, pp. 508–515 (2007)
Isabelle, P., Goutte, C., Simard, M.: Domain adaptation of MT systems through automatic post-editing. In: MT Summit, pp. 255–261 (2007)
Lagarda, A.-L., Alabau, V., Casacuberta, F., Silva, R., Diaz-de-Liano, E.: Statistical post-editing of a rule-based machine translation system. In: ACL, pp. 217–220 (2009)
Béchara, H., Ma, Y., van Genabith, J.: Statistical post-editing for a statistical MT system. In: MT Summit, pp. 308–315 (2011)
Huang, Z., Devlin, J., Matsoukas, S.: BBN’s Systems for the Chinese-English Sub-task of the NTCIR-10 PatentMT Evaluation
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. JASIS 41, 391–407 (1990)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
Huang, E., Socher, R., Manning, C., Ng, A.: Improving word representations via global context and multiple word prototypes. In: ACL, pp. 873–882 (2012)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)
Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
Eisele, A., Chen, Y.: MultiUN: a multilingual corpus from United Nation documents. In: LREC, pp. 2868–2872 (2010)
Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: LREC (2012)
Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL, pp. 127–133 (2003)
Och, F.J.: Minimum error rate training in statistical machine translation. In: ACL, pp. 701–711 (2003)
Stolcke, A.: SRILM - an extensible language modeling toolkit. In: ICSLP, pp. 901–904 (2002)
Soricut, R., Marcu, D.: Sentence level discourse parsing using syntactic and lexical information. In: ACL, pp. 149–156 (2003)
Zhang, Z., Zhao, H., Qin, L.: Probabilistic graph-based dependency parsing with convolutional neural network. In: ACL, pp. 1382–1392 (2016)
Cai, D., Zhao, H.: Neural word segmentation learning for Chinese. In: ACL, pp. 409–420 (2016)
Tseng, H., Chang, P., Andrew, G., Jurafsky, D., Manning, C.: A conditional random field word segmenter. In: SIGHAN (2005)
Chang, P.-C., Galley, M., Manning, C.: Optimizing Chinese word segmentation for machine translation performance. In: WMT, pp. 224–232 (2008)
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Pang, C., Zhao, H., Li, Z. (2016). I Can Guess What You Mean: A Monolingual Query Enhancement for Machine Translation. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_5
DOI: https://doi.org/10.1007/978-3-319-47674-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)