Patent Document Clustering Using Dimensionality Reduction

Girthana, K.; Swamynathan, S.

doi:10.1007/978-981-10-6875-1_17

K. Girthana & S. Swamynathan

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 564)


Abstract

Patents are a form of intellectual property that grants exclusive rights to an invention. Whenever a novelty or an invention is claimed, a prior art search over existing patents is carried out to assess the degree of innovation. Clustering is used to group the relevant documents returned by the prior art search and so gain insight into the patent document. Each patent document is represented by hundreds of features (words extracted from its title and abstract fields), while the sets of features shared between documents are small. As a result, the number of features used for clustering grows drastically, which leads to the curse of dimensionality. Hence, in this work, the dimensionality reduction techniques PCA (principal component analysis) and SVD (singular value decomposition) are applied, and the quality of the clusters formed from Google patent documents is compared and analyzed for each. This comparative analysis considers the title, abstract, and classification code fields of the patent documents. The classification code information was used to decide the number of clusters.
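The pipeline outlined in the abstract (word features from the title and abstract, dimensionality reduction with PCA or SVD, clustering with the number of clusters set from classification codes) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the term weighting, the clustering algorithm, or the cluster-quality measure, so TF-IDF weighting, K-means, and purity against classification codes are assumed here. The stop-word list, the toy patent texts, and their code assignments are hypothetical, and stemming is omitted.

```python
import re
import numpy as np

# Hypothetical stop-word list; a real pipeline would use a fuller list plus stemming.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with", "by", "that", "using"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_matrix(docs):
    """Build a documents x terms matrix: raw term frequency times log(N / df) (assumed weighting)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for t in toks:
            tf[i, index[t]] += 1.0
    df = np.count_nonzero(tf, axis=0)  # every vocabulary term occurs in at least one document
    return tf * np.log(len(docs) / df), vocab

def reduce_svd(X, k):
    """Truncated SVD: scores U_k * S_k, equal to projecting X onto the top-k right singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def reduce_pca(X, k):
    """PCA: centre each term column, then take the top-k principal component scores via SVD."""
    return reduce_svd(X - X.mean(axis=0), k)

def kmeans(Z, k, iters=100, seed=0):
    """Lloyd's K-means with k-means++ seeding (the clustering algorithm is an assumption)."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum() if d2.sum() > 0 else np.full(len(Z), 1.0 / len(Z))
        centers.append(Z[rng.choice(len(Z), p=probs)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def purity(labels, codes):
    """Fraction of documents whose cluster's majority classification code equals their own."""
    total = 0
    for j in set(labels.tolist()):
        members = [c for c, l in zip(codes, labels) if l == j]
        total += max(members.count(c) for c in set(members))
    return total / len(codes)

# Hypothetical toy corpus (title + abstract text) with illustrative classification codes.
docs = [
    "Method for encrypting network packets using a session key",
    "Network router that encrypts packets with rotating keys",
    "Secure key exchange protocol for network packets",
    "Pharmaceutical tablet containing a slow release drug compound",
    "Drug compound formulation for oral tablet delivery",
    "Slow release coating for pharmaceutical drug tablets",
]
codes = ["H04L", "H04L", "H04L", "A61K", "A61K", "A61K"]

X, vocab = tfidf_matrix(docs)
k = len(set(codes))  # number of clusters decided by the distinct classification codes
for name, reducer in (("PCA", reduce_pca), ("SVD", reduce_svd)):
    Z = reducer(X, 2)
    labels = kmeans(Z, k)
    print(name, labels.tolist(), "purity =", round(purity(labels, codes), 2))
```

The only difference between the two reductions in this sketch is centring: PCA subtracts the per-term mean before decomposing, while truncated SVD factorizes the TF-IDF matrix directly. Because centring turns a sparse term matrix dense, truncated SVD is often preferred for large document collections.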



Author information

Authors and Affiliations

Department of Information Science and Technology, Anna University, Chennai, 600025, India
K. Girthana & S. Swamynathan


Corresponding author

Correspondence to K. Girthana.

Editor information

Editors and Affiliations

Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
Khalid Saeed
Department of Computer Science & Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki
C. V. Raman College of Engineering, Bhubaneswar, Odisha, India
Bibudhendu Pati
Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India
Durga Prasad Mohapatra


About this paper

Cite this paper

Girthana, K., Swamynathan, S. (2018). Patent Document Clustering Using Dimensionality Reduction. In: Saeed, K., Chaki, N., Pati, B., Bakshi, S., Mohapatra, D. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 564. Springer, Singapore. https://doi.org/10.1007/978-981-10-6875-1_17


DOI: https://doi.org/10.1007/978-981-10-6875-1_17
Published: 22 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6874-4
Online ISBN: 978-981-10-6875-1
eBook Packages: Engineering, Engineering (R0)
