I Can Guess What You Mean: A Monolingual Query Enhancement for Machine Translation

Pang, Chenxi; Zhao, Hai; Li, Zhongyi

doi:10.1007/978-3-319-47674-2_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10035)

Abstract

We introduce a monolingual query method that uses additional webpage data to improve translation quality, in order to meet the growing demand for statistical machine translation output in official use. The motivation behind this method is that the readability of a sentence can be improved once and for all if the translated sentence is replaced with the most closely related sentence written by a human. Based on vector space representations of the translated sentences, we query a search engine for additional reference text. We then rank all translated sentences and make the necessary replacements from the query results. We experiment with several vector representations for sentences (TF-IDF, latent semantic indexing, and neural network word embeddings), and the results show an alternative way to enhance current machine translation, with an improvement of about 0.5 BLEU on a French-to-English task and 0.7 BLEU on an English-to-Chinese task.
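The rank-and-replace idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the TF-IDF variant, assumes the candidate sentences have already been retrieved from a search engine, and uses a naive whitespace tokenizer that would not suit Chinese. The function names and the `threshold` parameter are hypothetical, introduced here for illustration only.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: term -> weight) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n_docs = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        # Smoothed IDF keeps a positive weight for terms found in every sentence.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_replacement(translation, candidates, threshold=0.6):
    """Pick the candidate closest to the translation.

    Returns (sentence, score). The candidate replaces the translation only
    when its similarity reaches `threshold` (a hypothetical cut-off);
    otherwise the original translation is kept.
    """
    if not candidates:
        return translation, 0.0
    vectors = tfidf_vectors([translation] + candidates)
    query, rest = vectors[0], vectors[1:]
    scores = [cosine(query, vec) for vec in rest]
    best = max(range(len(candidates)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return candidates[best], scores[best]
    return translation, scores[best]
```

Calling `best_replacement(mt_output, retrieved_sentences)` returns either a retrieved human-written sentence or the unchanged machine translation, together with the similarity score that drove the decision. Keeping the original when no candidate is close enough is a design choice of this sketch, made so that an unrelated search result cannot overwrite a translation.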

H. Zhao—This paper was partially supported by Cai Yuanpei Program (CSC No. 201304490199 and No. 201304490171), National Natural Science Foundation of China (No. 61170114 and No. 61272248), National Basic Research Program of China (No. 2013CB329401), Major Basic Research Program of Shanghai Science and Technology Committee (No. 15JC1400103), Art and Science Interdisciplinary Funds of Shanghai Jiao Tong University (No. 14JCRZ04), and Key Project of National Society Science Foundation of China (No. 15-ZDA041).

Notes

1. We are aware that there are many other effective methods, such as [36], which used a parse tree and matrix-vector operations to retain word order information. However, because this work processes machine translation output, we need a simple and robust strategy that can handle the various kinds of defective sentences.
2. A more sophisticated approach is to cut a sentence into several relatively independent parts according to its parse tree [45, 46], which can be regarded as a further improvement over the current simple segmentation strategy.

References

1. Huang, S., Chen, H., Dai, X.-Y., Chen, J.: Non-linear learning for statistical machine translation. In: ACL, pp. 825–835 (2015)
2. Yu, H., Zhu, X.: Recurrent neural network based rule sequence model for statistical machine translation. In: ACL, pp. 132–138 (2015)
3. Lu, S., Chen, Z., Xu, B.: Learning new semi-supervised deep auto-encoder features for statistical machine translation. In: ACL, pp. 122–132 (2014)
4. Xiong, D., Zhang, M.: A sense-based translation model for statistical machine translation. In: ACL, pp. 1459–1469 (2014)
5. Neubig, G., Duh, K.: On the elements of an accurate tree-to-string machine translation system. In: ACL, pp. 143–149 (2014)
6. Riezler, S., Simianer, P., Haas, C.: Response-based learning for grounded machine translation. In: ACL, pp. 881–891 (2014)
7. Wang, R., Zhao, H., Lu, B.-L.: Bilingual continuous-space language model growing for statistical machine translation, pp. 1209–1220. IEEE (2015)
8. Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning local word reorderings for hierarchical phrase-based statistical machine translation. Mach. Transl. 1–18 (2016)
9. Wang, R., Utiyama, M., Goto, I., Sumita, E., Zhao, H., Lu, B.-L.: Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation. ACM (2016)
10. Wang, R., Zhao, H., Ploux, S., Lu, B.-L., Utiyama, M.: A bilingual graph-based semantic model for statistical machine translation
11. Zang, S., Zhao, H., Wu, C., Wang, R.: A novel word reordering method for statistical machine translation. In: FSKD, pp. 843–848 (2015)
12. Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning word reorderings for hierarchical phrase-based statistical machine translation. In: ACL-IJCNLP, pp. 542–548 (2015)
13. Wang, R., Zhao, H., Lu, B.-L., Utiyama, M., Sumita, E.: Neural network based bilingual language model growing for statistical machine translation. In: EMNLP, pp. 189–195 (2014)
14. Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning hierarchical translation spans. In: EMNLP, pp. 183–188 (2014)
15. Wang, R., Utiyama, M., Goto, I., Sumita, E., Zhao, H., Lu, B.-L.: Converting continuous-space language models into N-gram language models for statistical machine translation. In: EMNLP, pp. 845–850 (2013)
16. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: ACL, pp. 311–318 (2002)
17. Guzmán, F., Joty, S., Màrquez, L., Nakov, P.: Pairwise neural machine translation evaluation. In: ACL, pp. 805–814 (2015)
18. Graham, Y.: Improving evaluation of machine translation quality estimation. In: ACL, pp. 1804–1813 (2015)
19. Miceli-Barone, A.V., Attardi, G.: Non-projective dependency-based pre-reordering with recurrent neural network for machine translation. In: ACL, pp. 846–856 (2015)
20. Zhang, J., Utiyama, M., Sumita, E., Zhao, H.: Learning word reorderings for hierarchical phrase-based statistical machine translation. In: ACL, pp. 542–548 (2015)
21. Nakagawa, T.: Efficient top-down BTG parsing for machine translation preordering. In: ACL, pp. 208–218 (2015)
22. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. JMLR 3, 1137–1155 (2003)
23. Vaswani, A., Zhao, Y., Fossum, V., Chiang, D.: Decoding with large-scale neural language models improves translation. In: EMNLP, pp. 1387–1392 (2013)
24. Devlin, J., Zbib, R., Huang, Z., Lamar, T., Schwartz, R., Makhoul, J.: Fast and robust neural network joint models for statistical machine translation. In: ACL, pp. 1370–1380 (2014)
25. Liu, S., Yang, N., Li, M., Zhou, M.: A recursive recurrent neural network for statistical machine translation. In: ACL, pp. 1491–1500 (2014)
26. Simard, M., Ueffing, N., Isabelle, P., Kuhn, R.: Rule-based translation with statistical phrase-based post-editing. In: SMT, pp. 203–206 (2007)
27. Dugast, L., Senellart, J., Koehn, P.: Statistical post-editing on SYSTRAN’s rule-based translation system. In: SMT, pp. 220–223 (2007)
28. Simard, M., Goutte, C., Isabelle, P.: Statistical phrase-based post-editing. In: NAACL, pp. 508–515 (2007)
29. Isabelle, P., Goutte, C., Simard, M.: Domain adaptation of MT systems through automatic post-editing. In: MT Summit, pp. 255–261 (2007)
30. Lagarda, A.-L., Alabau, V., Casacuberta, F., Silva, R., Diaz-de-Liano, E.: Statistical post-editing of a rule-based machine translation system. In: ACL, pp. 217–220 (2009)
31. Béchara, H., Ma, Y., van Genabith, J.: Statistical post-editing for a statistical MT system. In: MT Summit, pp. 308–315 (2011)
32. Huang, Z., Devlin, J., Matsoukas, S.: BBN’s systems for the Chinese-English sub-task of the NTCIR-10 PatentMT evaluation
33. Deerwester, S.S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. JASIS 41, 391–407 (1990)
34. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). arXiv preprint arXiv:1301.3781
35. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
36. Huang, E., Socher, R., Manning, C., Ng, A.: Improving word representations via global context and multiple word prototypes. In: ACL, pp. 873–882 (2012)
37. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. Eprint Arxiv, pp. 1188–1196 (2014)
38. Werbos, P.J.: Backpropagation through time: what it does and how to do it, pp. 1550–1560. IEEE (1990)
39. Eisele, A., Chen, Y.: MultiUN: a multilingual corpus from United Nation documents. In: LREC, pp. 2868–2872 (2010)
40. Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: LREC (2012)
41. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)
42. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL, pp. 127–133 (2003)
43. Och, F.J.: Minimum error rate training in statistical machine translation. In: ACL, pp. 701–711 (2003)
44. Stolcke, A.: SRILM - an extensible language modeling toolkit. In: ICSLP, pp. 901–904 (2002)
45. Soricut, R., Marcu, D.: Sentence level discourse parsing using syntactic and lexical information. In: ACL, pp. 149–156 (2003)
46. Zhang, Z., Zhao, H., Qin, L.: Probabilistic graph-based dependency parsing with convolutional neural network. In: ACL, pp. 1382–1392 (2016)
47. Cai, D., Zhao, H.: Neural word segmentation learning for Chinese. In: ACL, pp. 409–420 (2016)
48. Tseng, H., Chang, P., Andrew, G., Jurafsky, D., Manning, C.: A conditional random field word segmenter. In: SIGHAN (2005)
49. Chang, P.-C., Galley, M., Manning, C.: Optimizing Chinese word segmentation for machine translation performance. In: WMT, pp. 224–232 (2008)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Chenxi Pang, Hai Zhao & Zhongyi Li
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Chenxi Pang, Hai Zhao & Zhongyi Li

Corresponding author

Correspondence to Hai Zhao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Fudan University, Shanghai, China
Xuanjing Huang
Dalian University of Technology, Dalian, China
Hongfei Lin
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

Cite this paper

Pang, C., Zhao, H., Li, Z. (2016). I Can Guess What You Mean: A Monolingual Query Enhancement for Machine Translation. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_5

DOI: https://doi.org/10.1007/978-3-319-47674-2_5
Published: 10 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)
