Topic-Level Clustering on Web Resources

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10108)

Abstract

The rapid development of the Internet, social media, and news portals has made a large amount of information available on a wide range of topics. Given such abundant resources, effective clustering approaches are valuable. However, traditional clustering models perform poorly on web resources because of the high dimensionality of the data. In this paper, we propose a clustering model based on topic modeling and density peaks. Our model combines the biterm topic model with clustering by fast search of density peaks: it first extracts a set of features from the co-occurrence of two words in the original documents, and then performs clustering analysis on the resulting topical features. Web resources are thus transformed from raw data into clusters, and an evaluation of the clustering results on the center part verifies the effectiveness of the proposed method.
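
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `extract_biterms` and `density_peaks`, the cutoff distance `d_c`, the Euclidean metric, and the rule of picking centers by the largest product of density and distance are all assumptions made for illustration. Topic inference for the biterm topic model (for example by Gibbs sampling) is omitted, so the points passed to `density_peaks` merely stand in for per-document topical feature vectors.

```python
from itertools import combinations
from math import dist


def extract_biterms(tokens):
    """Return every unordered pair of words co-occurring in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def density_peaks(points, d_c, n_clusters):
    """Cluster points in the spirit of density-peaks clustering.

    points     : list of equal-length numeric tuples (stand-ins for topical features)
    d_c        : cutoff distance used to count neighbours (assumed parameter)
    n_clusters : number of centers to pick (assumed to be given by the user)
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that precedes i in this order.
    # The first (densest) point conventionally gets its largest distance.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j

    # Centers: points that combine high density with a large delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every remaining point inherits the label of its nearest denser neighbour,
    # which has already been labelled because we walk in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    print(extract_biterms(["visit", "apple", "store"]))
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(density_peaks(toy, d_c=0.5, n_clusters=2))
```

In a full pipeline, each document's biterms would feed topic inference, and the inferred low-dimensional topic proportions, rather than raw high-dimensional word counts, would be the points that are clustered.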

Acknowledgements

The research work described in this article was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14).

Author information

Corresponding author

Correspondence to Fu Lee Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, S., Wang, F.L., Wong, L.P. (2017). Topic-Level Clustering on Web Resources. In: Wu, TT., Gennari, R., Huang, YM., Xie, H., Cao, Y. (eds) Emerging Technologies for Education. SETE 2016. Lecture Notes in Computer Science, vol 10108. Springer, Cham. https://doi.org/10.1007/978-3-319-52836-6_60

  • DOI: https://doi.org/10.1007/978-3-319-52836-6_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52835-9

  • Online ISBN: 978-3-319-52836-6

  • eBook Packages: Computer Science (R0)
