A Survey of Automated Hierarchical Classification of Patents

Chapter in the book Professional Search in the Modern World

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8830)

Abstract

In this era of “big data”, hundreds or even thousands of patent applications arrive every day at patent offices around the world. One of the first tasks of the professional analysts in patent offices is to assign classification codes to those patents based on their content. Such classification codes are usually organized in hierarchical structures of concepts. Traditionally, the classification task has been done manually by professional experts. However, given the large number of documents, patent professionals are becoming overwhelmed. Moreover, because the hierarchical classification structures are very complex (containing thousands of categories), reliable, fast and scalable methods and algorithms are needed to help the experts in patent classification tasks. This chapter describes, analyzes and reviews systems that, based on the textual content of patents, automatically classify them into a hierarchy of categories. It focuses especially on the patent classification task applied to the International Patent Classification (IPC) hierarchy. The IPC is the most widely used classification structure for organizing patents; it is recognized worldwide, and several other structures use it or are based on it to ensure interoperability between offices.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Gomez, J.C., Moens, MF. (2014). A Survey of Automated Hierarchical Classification of Patents. In: Paltoglou, G., Loizides, F., Hansen, P. (eds) Professional Search in the Modern World. Lecture Notes in Computer Science, vol 8830. Springer, Cham. https://doi.org/10.1007/978-3-319-12511-4_11

  • DOI: https://doi.org/10.1007/978-3-319-12511-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12510-7

  • Online ISBN: 978-3-319-12511-4

  • eBook Packages: Computer Science, Computer Science (R0)
