Automatic Generation and Use of Negative Terms to Evaluate Topic-Related Web Pages

  • Young-Tae Byun
  • Yong-Ho Choi
  • Kee-Cheol Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3597)


Deciding whether a Web page is relevant to a query or topic is central to serving Web users, and similar decisions must be made when clustering and classifying Web pages. Most existing work relies on positively related terms in one form or another. We propose that, once a topic is given, terms that are negatively related to the topic also be used in the relevance decision. This paper discusses a method for generating negative terms automatically using DMOZ, Google, and WordNet, and gives formulas that use these negative terms to decide relevance. Our experiments indicate that negative terms are useful for this decision. The approach also helps with the polysemy problem. Because negative terms can be generated automatically for any topic, this work may benefit other studies aimed at improving Web services.
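
The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple score in which occurrences of positive terms raise a page's relevance and occurrences of negative terms lower it, normalized by page length. The function names, the `penalty` parameter, the term lists, and the two sample pages are all hypothetical and chosen to illustrate the polysemy case (an ambiguous word such as "jaguar").

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, positive_terms, negative_terms, penalty=1.0):
    """Illustrative score: (positive hits - penalty * negative hits) / length.

    This is an assumed scoring rule for demonstration only; it is not
    the formula given in the paper.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    pos = sum(counts[t] for t in positive_terms)
    neg = sum(counts[t] for t in negative_terms)
    return (pos - penalty * neg) / total


# Hypothetical topic: "jaguar" in the sense of the animal.
positive_terms = {"cat", "habitat", "rainforest"}
negative_terms = {"car", "dealer", "engine"}

animal_page = "The jaguar is a large cat whose habitat is the rainforest"
car_page = "The Jaguar dealer sells a car with a new engine"

animal_score = relevance(animal_page, positive_terms, negative_terms)
car_score = relevance(car_page, positive_terms, negative_terms)
```

Both sample pages contain the word "jaguar", so a score built only on the topic word could not separate them. In this sketch the negative terms push the car page's score below zero while the animal page stays positive, which is the kind of disambiguation the abstract attributes to negative terms.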





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Young-Tae Byun (1)
  • Yong-Ho Choi (2)
  • Kee-Cheol Lee (1)
  1. Department of Computer Engineering, Hong-Ik University, Seoul, Korea
  2. Cyber Terror Response Center, Korean National Police Agency, Seoul, Korea
