Database Support for Automatic Web Queries Categorization

  • Ernestina Menasalvas Ruiz
  • Santiago Eibe Garcia
Part of the Studies in Computational Intelligence book series (SCI, volume 225)


The growing use of web search engines, together with the potential value of knowing what interests a user when a query is submitted, lies at the root of research on web query categorization. Categorizing queries is challenging both because of the problems involved in gathering and analyzing user context information and because of those related to deploying the knowledge obtained. Regarding the first, an interesting open problem is to analyze the mapping, if any, between user queries and the content that the portal shows on its main page. Automating this analysis would be very beneficial, and among the challenges we highlight the implementation of a database to support the process. In this chapter, we first review the main approaches to web query categorization and then concentrate on analyzing the process and the database support required to automate it.


Query categorization; data warehouse; search engines; data mining





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ernestina Menasalvas Ruiz (1)
  • Santiago Eibe Garcia (1)
  1. Facultad de Informatica UPM
