Expanding Queries with Term and Phrase Translations in Patent Retrieval

  • Charles Jochim
  • Christina Lioma
  • Hinrich Schütze
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6653)


Patent retrieval is a branch of Information Retrieval (IR) concerned with the challenging task of retrieving highly technical and often complicated patents. Typically, patent-granting bodies translate patents into several major foreign languages so that language boundaries do not hinder their accessibility. Given such multilingual patent collections, we posit that patent translations can be exploited to facilitate patent retrieval.

Specifically, we focus on the translation of patent queries from German and French, the morphology of which poses an extra challenge to retrieval. We compare two translation approaches that expand the query with (i) translated terms and (ii) translated phrases. Experimental evaluation on a standard CLEF-IP European Patent Office dataset reveals a novel finding: phrase translation may be more suited to French, and term translation may be more suited to German. We trace this finding to language morphology, and we conclude that tailoring the query translation per language can lead to improved results in patent retrieval.
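The difference between the two expansion strategies can be illustrated with a minimal sketch. It assumes simple dictionary lookups in place of a trained translation model; the abstract does not describe the actual translation pipeline, so the tables `TERM_TABLE` and `PHRASE_TABLE`, the function names, and the greedy longest-match strategy are all hypothetical and serve only to show how term-level and phrase-level expansion differ.

```python
from typing import Dict, List

# Toy, hand-written translation tables (hypothetical; not taken from the paper).
TERM_TABLE: Dict[str, str] = {
    "pompe": "pump",
    "à": "to",
    "chaleur": "heat",
}
PHRASE_TABLE: Dict[str, str] = {
    "pompe à chaleur": "heat pump",
}


def expand_with_terms(query: str, term_table: Dict[str, str]) -> List[str]:
    """Append the translation of each individual query term, when one is known."""
    tokens = query.lower().split()
    translations = [term_table[t] for t in tokens if t in term_table]
    return tokens + translations


def expand_with_phrases(
    query: str, phrase_table: Dict[str, str], max_len: int = 3
) -> List[str]:
    """Append translations of the longest matching phrases, scanning left to right."""
    tokens = query.lower().split()
    expansion: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_table:
                expansion.extend(phrase_table[candidate].split())
                i += n
                matched = True
                break
        if not matched:
            i += 1  # No phrase starts here; move on to the next token.
    return tokens + expansion


term_expanded = expand_with_terms("pompe à chaleur", TERM_TABLE)
phrase_expanded = expand_with_phrases("pompe à chaleur", PHRASE_TABLE)
```

In this toy case, term-by-term expansion adds the literal words "pump", "to", and "heat", whereas phrase expansion adds the idiomatic "heat pump". This is one plausible reason why a multi-word language such as French could benefit from phrase translation.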


patent retrieval, cross-language information retrieval, query translation, statistical machine translation, relevance feedback, query expansion





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Charles Jochim (1)
  • Christina Lioma (1)
  • Hinrich Schütze (1)
  1. Institute for Natural Language Processing, Computer Science, Stuttgart University, Stuttgart, Germany
