Patent retrieval: a literature review

  • Walid Shalaby
  • Wlodek Zadrozny
Survey Paper

Abstract

With the ever-increasing number of patent applications filed every year, the need for effective and efficient systems to manage such tremendous amounts of data has become increasingly important. Patent retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of information retrieval (IR) concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper, we present a comprehensive review of PR methods and approaches. It is clear that the recent successes and maturity of IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art performance in automatic PR is still only around average in terms of recall. These observations motivate the need for interactive search tools that provide cognitive assistance to patent professionals with minimal effort. These tools must also be developed hand in hand with patent professionals, taking their practices and expectations into account. We also touch on tasks related to PR, such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.

Keywords

Information retrieval; Patent retrieval; Patent mining; Patent prior art search; Survey

Notes

Acknowledgements

This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, University of North Carolina at Charlotte, Charlotte, USA