Abstract
Knowing the geographical serving area of web resources is important for many web applications. Here, serving area refers to the geographical distribution of online users who are interested in a given web site. In this paper, we propose a set of novel methods to detect the serving area of web resources by analyzing search engine logs. We use the search logs to detect the serving area in two ways. First, we extract user IP locations to generate the geographical distribution of users who share an interest in a web site. Second, we treat the query terms entered by users as user knowledge about a web site. To increase confidence and to cover new sites for use in real-time applications, we also propose a categorization system for local web sites, together with a novel method that detects the serving area by categorizing web content. Each category is assigned a radius derived from previous logs. In our experiments, we evaluated all three algorithms. The results show that the approach based on query terms was superior to the one based on IP locations, since search queries for local sites tended to include location words while IP locations were sometimes erroneous. The approach based on categorization was efficient for sites of known categories and was useful for small sites without a sufficient number of query logs.
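As an illustration of the two log-based approaches described above, the following is a minimal sketch and not the authors' implementation. The record fields (`query`, `ip_location`), the tiny `GAZETTEER` of location words, and the `coverage` threshold are all assumptions made for this example; the abstract does not specify how locations are resolved or how the final area is chosen.

```python
from collections import Counter

# Hypothetical gazetteer of location words (assumption for this sketch).
GAZETTEER = {"seattle", "boston", "chicago"}


def ip_distribution(records):
    """Share of log records per IP-derived location for one site."""
    counts = Counter(r["ip_location"] for r in records if r.get("ip_location"))
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()} if total else {}


def query_distribution(records, gazetteer=GAZETTEER):
    """Share of location-word mentions in the queries that led to the site."""
    counts = Counter()
    for r in records:
        for token in r["query"].lower().split():
            if token in gazetteer:
                counts[token] += 1
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()} if total else {}


def serving_area(dist, coverage=0.8):
    """Top locations, by share, until their cumulative share reaches `coverage`.

    The coverage threshold is an assumption of this sketch.
    """
    area, covered = [], 0.0
    ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    for loc, share in ranked:
        if covered >= coverage:
            break
        area.append(loc)
        covered += share
    return area
```

With a handful of records for a single site, a mislocated IP can widen the IP-based area while the query-based area stays on the location word users actually typed, which mirrors the comparison reported in the abstract.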
This work was done when the first author was visiting Microsoft Research Asia.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Q., Xie, X., Wang, L., Yue, L., Ma, WY. (2007). Computing Geographical Serving Area Based on Search Logs and Website Categorization. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_79
DOI: https://doi.org/10.1007/978-3-540-74469-6_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science (R0)