Classifying and Ranking: The First Step Towards Mining Inside Vertical Search Engines

  • Hang Guo
  • Jun Zhang
  • Lizhu Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)


Vertical Search Engines (VSEs), which usually work on specific domains, are designed to answer complex queries from professional users. VSEs usually have large repositories of structured instances. Traditional instance ranking methods do not consider the categories that instances belong to. However, users with different interests usually care only about the ranking list within their own communities. In this paper we design a ranking algorithm, ZRank, to rank the classified instances according to their importance in specific categories. To test our idea, we develop a scientific paper search engine, CPaper. By employing instance classifying and ranking algorithms, we discover some facts that are helpful to users with different interests.
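The idea of a separate ranking list per category can be illustrated with a minimal sketch. This is not ZRank itself, whose scoring method is not described in the abstract. It only assumes that each instance already carries a category label and an importance score. The function name, paper names, categories, and scores below are all hypothetical.

```python
from collections import defaultdict


def rank_within_categories(instances):
    """Group (name, category, score) triples by category and
    return, per category, the names sorted by descending score."""
    by_category = defaultdict(list)
    for name, category, score in instances:
        by_category[category].append((name, score))
    return {
        category: [name for name, _ in
                   sorted(members, key=lambda m: m[1], reverse=True)]
        for category, members in by_category.items()
    }


# Hypothetical instances: (name, category, importance score).
papers = [
    ("paper_a", "databases", 0.9),
    ("paper_b", "information retrieval", 0.4),
    ("paper_c", "databases", 0.2),
    ("paper_d", "information retrieval", 0.7),
]

rankings = rank_within_categories(papers)
for category, names in rankings.items():
    print(category, names)
```

A single global list would place paper_d below paper_a, while the per-category view shows it at the top of its own community, which is the kind of difference the abstract points to.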


Search Engine, Ranking List, Ranking Algorithm, Structure Instance, Important Instance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Hang Guo (1)
  • Jun Zhang (2)
  • Lizhu Zhou (1)
  1. Computer Science & Technology Department, Tsinghua University, Beijing 100084, China
  2. IBM China Software Develop Lab, Beijing 100084, China
