Abstract
Patents are a form of intellectual property right that grants exclusive rights to an invention. Whenever a new invention is proposed, a prior art search over existing patents is carried out to assess its degree of innovation. Clustering is used to group the relevant documents returned by a prior art search so that insights can be drawn from them. Each patent document is represented by hundreds of features (words extracted from the title and abstract fields), and the overlap in features between documents is subtle. As a result, the number of features needed for clustering grows drastically, which leads to the curse of dimensionality. In this work, the dimensionality reduction techniques PCA and SVD are therefore applied to Google patent documents, and the quality of the resulting clusters is compared and analyzed. The comparison considers the title, abstract, and classification code fields of each patent document; the classification code information is used to decide the number of clusters.
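The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy records, the stop-word list, the TF-IDF weighting, and the use of k-means as the clustering step are all assumptions made for illustration, and NumPy is assumed to be available. The sketch builds word features from the text fields, reduces them with truncated SVD and with PCA, and takes the number of clusters from the distinct classification codes.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical toy records: (title + abstract text, classification code).
DOCS = [
    ("Lithium battery electrode coating for a rechargeable battery cell", "H01M"),
    ("Battery cell separator and electrode assembly for lithium storage", "H01M"),
    ("Neural network image classifier trained on labeled image data", "G06N"),
    ("Training a neural network model for image recognition", "G06N"),
]
STOP = {"a", "an", "and", "for", "of", "on", "the"}  # assumed stop-word list


def tfidf_matrix(texts):
    """Dense documents-by-terms TF-IDF matrix with L2-normalised rows."""
    tokenised = [
        [w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOP]
        for t in texts
    ]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(texts), len(vocab)))
    for i, toks in enumerate(tokenised):
        for w, count in Counter(toks).items():
            tf[i, index[w]] = count
    df = (tf > 0).sum(axis=0)              # document frequency per term
    idf = np.log(len(texts) / df) + 1.0    # +1 keeps shared terms non-zero
    x = tf * idf
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def reduce_svd(x, k):
    """Truncated SVD of the uncentred matrix; returns k-dimensional scores."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]


def reduce_pca(x, k):
    """PCA: centre each feature, then keep the top-k principal components."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]


def kmeans(x, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(x[int(np.argmax(d))])
    centres = np.array(centres)
    for _ in range(iters):
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


texts = [text for text, _ in DOCS]
codes = [code for _, code in DOCS]
n_clusters = len(set(codes))  # number of clusters from classification codes

X = tfidf_matrix(texts)
labels_svd = kmeans(reduce_svd(X, 2), n_clusters)
labels_pca = kmeans(reduce_pca(X, 2), n_clusters)
print("SVD clusters:", labels_svd)
print("PCA clusters:", labels_pca)
```

In this sketch the only difference between the two reductions is the centring step before the decomposition. On a real patent collection the feature matrix would be sparse and far larger, so a sparse truncated SVD routine and a cluster-quality measure would be needed to carry out the comparison the abstract describes.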
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Girthana, K., Swamynathan, S. (2018). Patent Document Clustering Using Dimensionality Reduction. In: Saeed, K., Chaki, N., Pati, B., Bakshi, S., Mohapatra, D. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 564. Springer, Singapore. https://doi.org/10.1007/978-981-10-6875-1_17
DOI: https://doi.org/10.1007/978-981-10-6875-1_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6874-4
Online ISBN: 978-981-10-6875-1
eBook Packages: Engineering, Engineering (R0)