# Complexity of alignment and decoding problems: restrictions and approximations

- 228 Downloads

## Abstract

We study the computational complexity of the Viterbi alignment and relaxed decoding problems for IBM model 3, focusing on the problem of finding a solution which has significant overlap with an optimal. That is, an approximate solution is considered good if it looks like some optimal solution with a few mistakes, where mistakes can be wrong values (such as a word aligned incorrectly or a wrong word in decoding), as well as insertions and deletions (spurious/missing words in decoding). In this setting, we show that it is computationally hard to find a solution which is correct on more than half (plus an inverse polynomial fraction) of the words. More precisely, if there is a polynomial-time algorithm computing an alignment for IBM model 3 which agrees with some Viterbi alignment on \(l/2+l^\epsilon \) words, where *l* is the length of the English sentence, or producing a decoding with \(l/2+l^\epsilon \) correct words, then P \(=\) NP. We also present a similar structure inapproximability result for phrase-based alignment. As these strong lower bounds are for the general definitions of the Viterbi alignment and decoding problems, we also consider, from a parameterized complexity perspective, which properties of the input make these problems intractable. As a first step in this direction, we show that Viterbi alignment has a fixed-parameter tractable algorithm with respect to limiting the range of words in the target sentence to which a source word can be aligned. We note that by comparison, limiting maximal fertility—even to three—does not affect NP-hardness of the result.

## Keywords

Viterbi alignment Decoding IBM Model 3 Phrase alignment Approximation Lower bounds Edit distance## Notes

### Acknowledgments

We are very grateful to the anonymous referees and the editor of the Machine Translation journal for suggesting a more relevant setting to apply our techniques, and pointing us to the literature. We also want to thank Todd Wareham, Valentine Kabanets and Russell Impagliazzo for numerous discussions and suggestions, and to Venkat Guruswami for telling us about then-unpublished work of Sheldon and Young.

## References

- Berman L, Hartmanis J (1977) On isomorphisms and density of NP and other complete sets. SIAM J Comput 6(2):305–322CrossRefMathSciNetMATHGoogle Scholar
- Birch A, Callison-Burch C, Osborne M, Koehn P (2006) Constraining the phrase-based, joint probability statistical translation model. In: HLT-NAACL 2006: proceedings of the workshop on statistical machine translation, New York, pp 154–157Google Scholar
- Brown PF, Della Pietra VJ, Della Pietra SA, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311Google Scholar
- Buss JF, Islam TM (2008) The complexity of fixed-parameter problems: guest column. SIGACT News 39(1):33–46. doi: 10.1145/1360443.1360454 CrossRefGoogle Scholar
- Cesati M (2006) Compendium of parameterized problems. http://www.sprg.uniroma2.it/home/cesati/research/compendium/compendium.pdf
- DeNero J (2010) Phrase alignment models for statistical machine translation. PhD Thesis, UC Berkeley, CAGoogle Scholar
- DeNero J, Klein D (2008) The complexity of phrase alignment problems. In: ACL-08: HLT. Proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies: short papers, Columbus, pp 25–28Google Scholar
- Downey RG, Fellows MR (1992) Fixed-parameter intractability. In: Proceedings of the seventh annual conference on structure in complexity theory, Victoria, pp 36–49Google Scholar
- Feige U, Langberg M, Nissim K (2000) On the hardness of approximating NP witnesses. In: Approx 2000: approximation algorithms for combinatorial optimization. Proceedings of third international workshop. Lecture notes in computer science 1913. Springer, New York, pp 120–131Google Scholar
- Gal A, Halevi S, Lipton RJ, Petrank E (1999) Computing from partial solutions. In: COCO ’99: proceedings of the fourteenth annual IEEE conference on computational complexity, Atlanta, pp 34–45Google Scholar
- Guruswami V, Rudra A (2008) Soft decoding, dual BCH codes, and better list-decodable \(\varepsilon \)-biased codes. In: CCC 2008: proceedings of the twenty-third annual IEEE conference on computational complexity, College Park, pp 163–174Google Scholar
- Hamilton M, Müller M, van Rooij I, Wareham T (2007) Approximating solution structure. In: Demaine E, Gutin GZ, Marx D, Stege U (eds) Structure theory and FPT algorithmics for graphs, digraphs and hypergraphs, No. 07281 in Dagstuhl seminar proceedings. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI). Schloss Dagstuhl, Germany, DagstuhlGoogle Scholar
- Knight K (1999) Decoding complexity in word-replacement translation models. Comput Linguist 25(4):607–615Google Scholar
- Koehn P (2004) Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In: Machine translation: from real users to research: 6th conference of the Association for Machine Translation in the Americas. Springer, Berlin, pp 115–124Google Scholar
- Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: HLT-NAACL 2003: conference combining human language technology conference series and the North American Chapter of the Association for Computational Linguistics conference series. Proceedings, Edmonton, pp 48–54Google Scholar
- Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2:83–97CrossRefGoogle Scholar
- Kumar R, Sivakumar D (1999) Proofs, codes, and polynomial-time reducibilities. In: COCO ’99: proceedings of the fourteenth annual IEEE conference on computational complexity, Atlanta, pp 46–53Google Scholar
- Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49CrossRefGoogle Scholar
- MacCartney B, Galley M, Manning CD (2008) A phrase-based alignment model for natural language inference. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 802–811Google Scholar
- Marcu D, Wong W (2002) A phrase-based, joint probability model for statistical machine translation. In: EMNLP-2002: proceedings of the 2002 conference on empirical methods in natural language processing, Philadelphia, pp 133–139Google Scholar
- Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51CrossRefMATHGoogle Scholar
- Ravi S, Knight K (2010) Does giza++ make search errors? Comput Linguist 36(3):295–302CrossRefMathSciNetGoogle Scholar
- Sheldon D, Young NE (2013) Hamming approximation of NP witnesses. Theory Comput 9(22):685–702CrossRefMathSciNetMATHGoogle Scholar
- Søgaard A (2009) On the complexity of alignment problems in two synchronous grammar formalisms. In: Proceedings of the third workshop on syntax and structure in statistical translation (SSST-3) at NAACL HLT 2009, Boulder, pp 60–68Google Scholar
- Udupa R, Maji H (2005) Theory of alignment generators and applications to statistical machine translation. In: Proceedings of the 19th international joint conference on artificial intelligence, Edinburgh, pp 1142–1147Google Scholar
- Udupa R, Maji HK (2006) Computational complexity of statistical machine translation. In: EACL-2006: 11th conference of the European chapter of the Association for Computational Linguistics, proceedings of the conference, Trento, pp 25–32Google Scholar
- van Rooij I, Wareham T (2012) Intractability and approximation of optimization theories of cognition. J Math Psychol 56(4):232–247CrossRefMATHGoogle Scholar