Abstract
We propose a document signature approach to patent classification. Automatic patent classification is challenging because of the fast-growing number of patent applications filed every year and the complexity, size and nested hierarchical structure of patent taxonomies. In our proposal, a target patent is classified through a k-nearest neighbour search using Hamming distance on signatures generated from patents; the classification labels of the retrieved patents are weighted and combined to produce a patent classification code for the target patent. The method is motivated by the intuition that document signatures are more efficient than previous approaches to this task, which trained classifiers on the whole vocabulary feature set. Our empirical experiments also demonstrate that combining document signatures with k-nearest neighbour search improves classification effectiveness, provided that enough data is used to generate the signatures.
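The pipeline described in the abstract (signature generation, k-nearest neighbour search under Hamming distance, weighted label combination) can be illustrated with a minimal sketch. It assumes random-projection-style binary signatures in the spirit of the TopSig work listed in the references; the hashing scheme, the 64-bit width, the similarity-based weighting and all function names are illustrative assumptions, not the authors' configuration.

```python
import hashlib
from collections import Counter, defaultdict

SIG_BITS = 64  # hypothetical width; chosen small for readability


def term_vector(term, bits=SIG_BITS):
    """Deterministic pseudo-random +1/-1 vector for a term (assumed hashing scheme)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    value = int.from_bytes(digest[: bits // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def signature(text, bits=SIG_BITS):
    """Sum the term vectors weighted by term count, then keep one sign bit per position."""
    totals = [0] * bits
    for term, count in Counter(text.lower().split()).items():
        for i, v in enumerate(term_vector(term, bits)):
            totals[i] += count * v
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def classify(target_text, labelled, k=3, bits=SIG_BITS):
    """Return the label with the highest similarity-weighted vote among the k nearest signatures.

    `labelled` is a list of (signature, label) pairs.
    """
    target = signature(target_text, bits)
    neighbours = sorted(
        ((hamming(target, sig), label) for sig, label in labelled),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for dist, label in neighbours:
        scores[label] += 1.0 - dist / bits  # assumed weighting: closer neighbours count more
    return max(scores, key=scores.get)


# Toy training data; the texts and classification codes are made up for illustration.
training = [
    ("rotor blade turbine wind generator", "F03D"),
    ("antibody protein binding therapeutic compound", "A61K"),
]
labelled = [(signature(text), label) for text, label in training]
predicted = classify("rotor blade turbine wind generator", labelled, k=1)
```

The exhaustive sort over all signatures keeps the sketch short; at collection scale, an approximate search structure such as the inverted signature slice lists cited in the references would replace it.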
Notes
1. Previous work has also used the first 300 words extracted from each patent: this setting has in fact shown strong promise [7].
2. No implementation of Fall et al.’s methods was publicly available, and our re-implementation did not reach effectiveness comparable to that reported. We were therefore unable to obtain efficiency figures for the benchmark methods. Similarly, we were unable to test for significant differences.
References
1. Chappell, T., Geva, S., Zuccon, G.: Approximate nearest-neighbour search with inverted signature slice lists. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 147–158. Springer, Heidelberg (2015)
2. Cai, L., Hofmann, T.: Hierarchical document categorization with support vector machines. In: Proceedings of CIKM 2004, pp. 78–87 (2004)
3. Chakrabarti, S., Dom, B., Agrawal, R., Raghavan, P.: Using taxonomy, discriminants, and signatures for navigating in text databases. In: Proceedings of VLDB 1997, pp. 446–455 (1997)
4. Chakrabarti, S., Dom, B., Agrawal, R., Raghavan, P.: Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. VLDB J. 7(3), 163–178 (1998)
5. Chen, Y.-L., Chang, Y.-C.: A three-phase method for patent classification. Inf. Process. Manage. 48(6), 1017–1030 (2012)
6. De Vries, C.M., Geva, S.: Pairwise similarity of TopSig document signatures. In: Proceedings of ADCS 2012, pp. 128–134 (2012)
7. Fall, C.J., Törcsvári, A., Benzineb, K., Karetka, G.: Automated categorization in the international patent classification. SIGIR Forum 37(1), 10–25 (2003)
8. Faloutsos, C.: Signature-based text retrieval methods: a survey. Data Eng. 13(1), 25–32 (1990)
9. Geva, S., De Vries, C.M.: TopSig: topology preserving document signatures. In: Proceedings of CIKM 2011, pp. 333–338 (2011)
10. Kim, J.-H., Choi, K.-S.: Patent document categorization based on semantic structural information. Inf. Process. Manage. 43(5), 1200–1215 (2007)
11. Larkey, L.S.: A patent search and classification system. In: Proceedings of DL 1999, pp. 179–187 (1999)
12. Tikk, D.: A hierarchical online classifier for patent categorization, pp. 244–267 (2007)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Seneviratne, D., Geva, S., Zuccon, G., Ferraro, G., Chappell, T., Meireles, M. (2015). A Signature Approach to Patent Classification. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_35
DOI: https://doi.org/10.1007/978-3-319-28940-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)