A Survey of Automated Hierarchical Classification of Patents

Gomez, Juan Carlos; Moens, Marie-Francine

doi:10.1007/978-3-319-12511-4_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8830)

Abstract

In this era of “big data”, hundreds or even thousands of patent applications arrive every day at patent offices around the world. One of the first tasks of the professional analysts in patent offices is to assign classification codes to those patents based on their content. Such classification codes are usually organized in hierarchical structures of concepts. Traditionally, the classification task has been done manually by professional experts. However, given the large number of documents, patent professionals are becoming overwhelmed. Because the hierarchical classification structures are also very complex (containing thousands of categories), reliable, fast and scalable methods and algorithms are needed to help the experts in patent classification tasks. This chapter describes, analyzes and reviews systems that, based on the textual content of patents, automatically classify such patents into a hierarchy of categories. It focuses especially on the patent classification task as applied to the International Patent Classification (IPC) hierarchy. The IPC is the most widely used classification structure for organizing patents; it is recognized worldwide, and several other structures use it or are based on it to ensure interoperability between offices.
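To make the notion of a hierarchy of categories concrete, the following minimal sketch splits a single IPC symbol into its nested levels (section, class, subclass, main group, subgroup). It is an illustration only and not taken from the chapter: the names `IPCPath` and `ipc_levels` are hypothetical, and the regular expression is a simplified assumption about how IPC symbols are written, not an official grammar.

```python
import re
from typing import NamedTuple


class IPCPath(NamedTuple):
    """Hypothetical container: one IPC symbol expanded into its levels."""
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str


# Simplified assumption about the written form of a full IPC symbol,
# e.g. "A01B 1/02": section letter A-H, two-digit class, subclass letter,
# main group number, "/", subgroup number.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(symbol: str) -> IPCPath:
    """Return the chain of categories from section down to subgroup."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised IPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IPCPath(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        # By convention a main group is written with "/00" after the slash.
        main_group=f"{subclass} {group}/00",
        subgroup=f"{subclass} {group}/{subgroup}",
    )
```

A hierarchical classifier can treat each field of the returned tuple as the label at one depth of the tree, so that a prediction at the subgroup level implies a prediction at every level above it.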

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Juan Carlos Gomez & Marie-Francine Moens

Editor information

Editors and Affiliations

School of Technology, University of Wolverhampton, Wulfruna Street, WV1 1LY, Wolverhampton, UK
Georgios Paltoglou
Department of Multimedia and Graphic Arts, Cyprus University of Technology, Limassol, Cyprus
Fernando Loizides
Swedish Institute of Computer Science, Isafjordsgatan 22, SE-164 28, Kista, Sweden
Preben Hansen

About this chapter

Cite this chapter

Gomez, J.C., Moens, MF. (2014). A Survey of Automated Hierarchical Classification of Patents. In: Paltoglou, G., Loizides, F., Hansen, P. (eds) Professional Search in the Modern World. Lecture Notes in Computer Science, vol 8830. Springer, Cham. https://doi.org/10.1007/978-3-319-12511-4_11

DOI: https://doi.org/10.1007/978-3-319-12511-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12510-7
Online ISBN: 978-3-319-12511-4
eBook Packages: Computer Science, Computer Science (R0)
