Patent Document Clustering Using Dimensionality Reduction

  • Conference paper
Progress in Advanced Computing and Intelligent Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 564)

Abstract

Patents are a form of intellectual property right that grants exclusive rights to an invention. Whenever a novelty or an invention is claimed, a prior art search on patents is carried out to assess its degree of innovation. Clustering is used to group the relevant documents returned by a prior art search and so gain insight into the patent documents. Each patent document is represented by hundreds of features (words extracted from the title and abstract fields), and the set of features that any two documents have in common is small. The number of features needed for clustering therefore grows drastically, which leads to the curse of dimensionality. Hence, in this work, dimensionality reduction techniques such as PCA and SVD are employed, and the quality of the clusters they produce from Google patent documents is compared and analyzed. The comparative analysis considers the title, abstract, and classification code fields of each patent document. The classification code information was used to decide the number of clusters.
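The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six toy documents and their classification codes are hypothetical, and the TF-IDF weighting, the k-means step, the choice of two retained dimensions, and the helper names (`tfidf`, `reduce_svd`, `reduce_pca`, `kmeans`) are all assumptions made for illustration. Only the general idea comes from the abstract: word features from title and abstract text, reduction with SVD or PCA, and a cluster count taken from the classification codes.

```python
import numpy as np

# Hypothetical toy data standing in for title+abstract text and
# classification codes; none of it comes from the paper.
docs = [
    "lithium battery electrode cell charge",
    "lithium battery electrode cell anode",
    "lithium battery electrode charge electrolyte",
    "image sensor pixel camera lens",
    "image sensor pixel camera optical",
    "image sensor pixel lens signal",
]
codes = ["H01M", "H01M", "H01M", "H04N", "H04N", "H04N"]

def tfidf(texts):
    """Dense, L2-normalised TF-IDF matrix (documents x terms). Assumed weighting."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)              # document frequency per term
    x = tf * (np.log(len(texts) / df) + 1.0)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def reduce_svd(x, k):
    """Truncated SVD: project onto the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:k].T

def reduce_pca(x, k):
    """PCA: centre each feature, then project onto the top-k components."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def kmeans(x, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# The number of clusters is taken from the distinct classification codes.
n_clusters = len(set(codes))
X = tfidf(docs)
X_svd = reduce_svd(X, 2)
X_pca = reduce_pca(X, 2)
labels_svd = kmeans(X_svd, n_clusters)
labels_pca = kmeans(X_pca, n_clusters)
print("SVD clusters:", labels_svd)
print("PCA clusters:", labels_pca)
```

On a real collection the term matrix would be sparse and far wider, so a library implementation of truncated SVD and k-means would be the more practical choice; the NumPy-only version here just keeps each step visible.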

Corresponding author

Correspondence to K. Girthana.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Girthana, K., Swamynathan, S. (2018). Patent Document Clustering Using Dimensionality Reduction. In: Saeed, K., Chaki, N., Pati, B., Bakshi, S., Mohapatra, D. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 564. Springer, Singapore. https://doi.org/10.1007/978-981-10-6875-1_17

  • DOI: https://doi.org/10.1007/978-981-10-6875-1_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6874-4

  • Online ISBN: 978-981-10-6875-1

  • eBook Packages: Engineering, Engineering (R0)
