Abstract
Information retrieval methods fall into two kinds: content-based and hyperlink-based. Content-based systems require a very large amount of computation, while systems based on hyperlinks alone do not achieve ideal precision. A technique that combines the advantages of both is therefore needed. In this paper, we propose a framework that uses the two methods together. We construct a transition probability matrix composed of link information and the relevance of each page to the given query, where relevance is measured by TF-IDF. We obtain the CLBCRA ranking by solving the equation whose coefficients are given by the transition probability matrix. Experimental results show that the approach selects more pages that are important in both content and hyperlink structure.
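The idea of a transition probability matrix built from both link structure and query relevance can be illustrated with a minimal Python sketch. It is not the paper's CLBCRA definition, which the abstract does not spell out. The sketch assumes that the probability of moving from page i to a linked page j is proportional to the relevance of j, that relevance values are precomputed TF-IDF scores, and that the ranking is found by PageRank-style power iteration with a damping factor. The names `build_transition_matrix`, `rank_pages`, and `damping` are hypothetical.

```python
# Illustrative sketch only: the weighting scheme, the damping factor, and
# all function names are assumptions, not the paper's CLBCRA definition.

def build_transition_matrix(links, relevance):
    """Return a row-stochastic matrix P.

    links[i][j] is truthy when page i links to page j.
    relevance[j] stands in for the TF-IDF relevance of page j to the query.
    Assumed weighting: P[i][j] is proportional to relevance[j] over the
    pages that i links to.
    """
    n = len(links)
    matrix = []
    for i in range(n):
        weights = [relevance[j] if links[i][j] else 0.0 for j in range(n)]
        total = sum(weights)
        if total > 0.0:
            row = [w / total for w in weights]
        else:
            # No usable out-link: jump to every page with equal probability.
            row = [1.0 / n] * n
        matrix.append(row)
    return matrix


def rank_pages(links, relevance, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate the stationary score vector by power iteration."""
    n = len(links)
    p = build_transition_matrix(links, relevance)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * p[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy web of three pages: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
example_links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
# Made-up relevance values standing in for TF-IDF scores.
example_relevance = [0.2, 0.1, 0.7]
example_scores = rank_pages(example_links, example_relevance)
```

In this toy graph, page 2 is both more relevant and better linked than page 1, so it receives the higher score. Under this assumed weighting, the transition matrix depends on the query and would have to be rebuilt for each query.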
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wang, Hm., Guo, Y. (2007). CLBCRA-Approach for Combination of Content-Based and Link-Based Ranking in Web Search. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_4
DOI: https://doi.org/10.1007/978-3-540-73871-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)