Clustering View-Segmented Documents via Tensor Modeling

  • Salvatore Romeo
  • Andrea Tagarelli
  • Dino Ienco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8502)

Abstract

We propose a clustering framework for view-segmented documents, i.e., relatively long documents made up of smaller fragments that can be provided according to a target set of views or aspects. The framework is designed to exploit a view-based document segmentation by encoding it into a third-order tensor model, whose decomposition enables any standard document clustering algorithm to better reflect the multi-faceted nature of the documents. Experimental results on document collections featuring paragraph-based, metadata-based, or user-driven views demonstrate the significance of the proposed approach, showing improved performance in the document clustering task.
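The abstract does not specify the decomposition or the clustering algorithm. The following is a minimal sketch of one plausible pipeline, not the authors' implementation. It rests on several assumptions: each view is a text segment weighted by raw term counts, and the tensor is documents × terms × views. The decomposition is a truncated multilinear SVD (HOSVD, De Lathauwer et al.) in which only the document-mode factor is kept, computed here as the leading eigenvectors of the Gram matrix of the mode-1 unfolding. Plain k-means stands in for "any standard document clustering algorithm". All function names are hypothetical, and the pure-Python power iteration is suitable only for toy inputs.

```python
import math
from collections import Counter


def build_tensor(docs, vocab):
    """Third-order tensor X[d][t][v]: count of vocab[t] in view v of document d.

    Assumes every document is already segmented into the same number of views.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n_views = len(docs[0])
    X = [[[0.0] * n_views for _ in vocab] for _ in docs]
    for d, views in enumerate(docs):
        for v, segment in enumerate(views):
            for word, count in Counter(segment.lower().split()).items():
                if word in index:
                    X[d][index[word]][v] = float(count)
    return X


def mode1_unfold(X):
    """One row per document (length T*V). Column order does not change the
    left singular vectors, so a plain flattening is sufficient here."""
    return [[x for fibre in doc for x in fibre] for doc in X]


def leading_left_singular_vectors(A, k, iters=500):
    """Top-k eigenvectors of A A^T (= top-k left singular vectors of A) via
    power iteration with deflation: the truncated HOSVD factor for mode 1."""
    n = len(A)
    G = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)] for i in range(n)]
    factors = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        # Deflate so the next pass converges to the next eigenvector.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        factors.append(v)
    # Row d is the k-dimensional representation of document d.
    return [[f[d] for f in factors] for d in range(n)]


def kmeans(rows, k, iters=50):
    """Plain k-means with deterministic farthest-first seeding."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist2(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_view_segmented(docs, vocab, k):
    """Tensor -> document-mode factor -> k-means labels (hypothetical pipeline)."""
    X = build_tensor(docs, vocab)
    U = leading_left_singular_vectors(mode1_unfold(X), k)
    return kmeans(U, k)


if __name__ == "__main__":
    # Two views per document (e.g. two paragraphs); two underlying topics.
    docs = [
        ["tensor decomposition", "tensor model clustering"],
        ["tensor model", "decomposition of a tensor"],
        ["football match", "goal scored in the match"],
        ["match report", "football goal"],
    ]
    vocab = ["tensor", "decomposition", "model", "clustering", "football", "match", "goal"]
    print(cluster_view_segmented(docs, vocab, k=2))  # → [0, 0, 1, 1]
```

Keeping only the document-mode factor lets any vector-space clustering algorithm consume the result unchanged, which matches the abstract's claim that the decomposition serves a standard clustering algorithm. A real system would instead use a sparse SVD or a tensor library, together with a clustering method suited to text.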

Keywords

Document Collection, Normalized Mutual Information, Document Cluster, Tensor Model, Tensor Decomposition

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Salvatore Romeo (1)
  • Andrea Tagarelli (1)
  • Dino Ienco (2)
  1. DIMES, University of Calabria, Italy
  2. IRSTEA, Montpellier, France
