Improving Quality of Search Results Clustering with Approximate Matrix Factorisations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Abstract

In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four different factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering algorithms: Suffix Tree Clustering (STC) and Tolerance Rough Set Clustering (TRC). For our experiments we use the standard merge-then-cluster approach based on the Open Directory Project web catalogue as a source of human-clustered document summaries.
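As a rough illustration of the idea, the sketch below factorises a small term-document matrix with NMF, one of the four factorisations named in the abstract, and assigns each document to the base vector on which it has the largest coefficient. It is a minimal sketch under stated assumptions, not the paper's implementation: the toy matrix, the term list, the function names (`nmf`, `assign_documents`, `frobenius_error`) and the choice of the Lee and Seung multiplicative update rules for the Frobenius-norm objective are all illustrative.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*a)]

def frobenius_error(a, u, v):
    """Squared Frobenius norm of A - U V (the approximation error)."""
    approx = matmul(u, v)
    return sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb))

def nmf(a, k, iterations=200, seed=0, eps=1e-9):
    """Approximate A (terms x documents) as U V with non-negative factors.

    U is terms x k (base vectors), V is k x documents (coefficients).
    Assumption: multiplicative updates for the Frobenius-norm objective.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(a), len(a[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    v = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iterations):
        # V <- V * (U^T A) / (U^T U V), element-wise
        ut = transpose(u)
        num = matmul(ut, a)
        den = matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(k)]
        # U <- U * (A V^T) / (U V V^T), element-wise
        vt = transpose(v)
        num = matmul(a, vt)
        den = matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_terms)]
    return u, v

def assign_documents(v):
    """Assign each document to the base vector with the largest coefficient."""
    k, n_docs = len(v), len(v[0])
    return [max(range(k), key=lambda i: v[i][j]) for j in range(n_docs)]

# Hypothetical toy data: rows are terms, columns are four document summaries.
terms = ["jaguar", "car", "engine", "cat", "jungle"]
A = [
    [1, 1, 1, 1],  # jaguar
    [2, 1, 0, 0],  # car
    [1, 2, 0, 0],  # engine
    [0, 0, 2, 1],  # cat
    [0, 0, 1, 2],  # jungle
]

u, v = nmf(A, k=2)
labels = assign_documents(v)
print("cluster index per document:", labels)
print("approximation error:", frobenius_error(A, u, v))
```

In this sketch the columns of `u` play the role of thematic base vectors, so their highest-weighted terms could serve as candidate cluster labels. Swapping `nmf` for another factorisation would leave the assignment step unchanged, which is one way to read the comparison the abstract describes.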

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Osinski, S. (2006). Improving Quality of Search Results Clustering with Approximate Matrix Factorisations. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_16

  • DOI: https://doi.org/10.1007/11735106_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33347-0

  • Online ISBN: 978-3-540-33348-7

  • eBook Packages: Computer Science, Computer Science (R0)
