Automatic Generation and Use of Negative Terms to Evaluate Topic-Related Web Pages

Byun, Young-Tae; Choi, Yong-Ho; Lee, Kee-Cheol

doi:10.1007/11527725_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3597)

Included in the following conference series:

International Conference Human Society@Internet

Abstract

Deciding whether a Web page is relevant to a query or a topic is central to serving Web users, and clustering and classifying Web pages require similar decisions. Most existing work relies on positively related terms in one form or another. Once a topic is given, we suggest using terms that are negative with respect to the topic when making the relevance decision. This paper discusses a method for generating negative terms automatically using DMOZ, Google, and WordNet, and gives formulas for deciding relevance with those terms. Our experiments indicate that negative terms are useful for judging relevance to the topic. The approach also helps with the polysemy problem. Because negative terms can be generated automatically for any topic, this work may benefit many studies aimed at improving Web services.
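
The idea of penalizing a page for topic-negative terms can be illustrated with a minimal sketch. This is not the paper's formula, which is not reproduced on this page; the function name `relevance`, the `penalty` weight, the term sets, and the sample pages are all hypothetical and chosen only to show how negative terms can separate two senses of a polysemous word such as "jaguar".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, positive_terms, negative_terms, penalty=1.0):
    """Hypothetical score (an assumption, not the authors' formula):
    positive-term hits minus weighted negative-term hits,
    normalized by the number of tokens on the page."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[t] for t in positive_terms)
    neg = sum(counts[t] for t in negative_terms)
    return (pos - penalty * neg) / len(tokens)


# Illustrative term sets for a "mammals" topic (hand-picked, not generated).
positive = {"jaguar", "mammal", "habitat", "prey"}
negative = {"car", "engine", "dealer"}

animal_page = ("The jaguar is a large mammal whose habitat is shrinking "
               "and whose prey is scarce")
car_page = "Visit a Jaguar dealer to test the new car and its engine"

animal_score = relevance(animal_page, positive, negative)
car_score = relevance(car_page, positive, negative)
# Score of the car page when only positive terms are considered.
car_score_positive_only = relevance(car_page, positive, set())

print(animal_score, car_score, car_score_positive_only)
```

With positive terms alone, the car page still receives a positive score because it mentions "jaguar"; once the negative terms are counted, its score drops below zero while the animal page is unaffected.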

References

Attardi, G., Gulli, A., Sebastiani, F.: Automatic Web Page Categorization by Link and Context Analysis. In: Proc. of THAI 1999, European Symposium on Telematics, Hypermedia and Artificial Intelligence (1999)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Kontostathis, A., Pottenger, W.M.: Improving Retrieval Performance with Positive and Negative Equivalence Classes of Terms, TR in Lehigh Univ. (2002)
Hoashi, K., Matsumoto, K., Inoue, N., Hashimoto, K.: Experiments on the TREC-8 Filtering Track. In: Proc. of SIGIR 2000 (2000)
Yu, C.T., Salton, G., Siu, M.K.: Effective Automatic Indexing Using Term Addition and Deletion. JACM 25 (1978)
Chakrabarti, S., van den Berg, M., Dom, B.: Focused crawling: A new approach to topic-specific Web resource discovery. In: Proc. of the 8th International WWW Conference (1999)
Chakrabarti, S., et al.: Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. In: Proc. of the 7th International WWW Conference (1998)
Eguchi, K.: Incremental query expansion using local information of clusters. In: Proc. of the 4th World Multiconference on Systems, Cybernetics and Informatics (2000)
Kleinberg, J.: Authoritative sources in a hyperlinked environment. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms (1998)
Kim, S.: Improving the Performance of an Information Agent for a Specific Domain on the WWW. Master's thesis, Hongik Graduate School (2002)
Menczer, F., Pant, G., Ruiz, M.: Evaluating Topic-Driven Web Crawlers. In: Proc. of SIGIR 2001 (2001)
Menczer, F., Pant, G., Srinivasan, P.: Topical Web Crawlers: Evaluating Adaptive Algorithms. ACM Transactions on Internet Technology V (February 2003)
Miller, G.: Wordnet: An online lexical database. International Journal of Lexicography 3 (1997)
Pant, G., Menczer, F.: MySpiders: Evolve Your Own Intelligent Web Crawlers. Autonomous Agents and Multi-Agent Systems 5 (2002)
[DMOZ], http://www.dmoz.org/Kids_and_Teens/School_Time/Science/Living_Thing/Animals/Mammals/
[Google], http://www.google.com/
[Wordnet], http://www.cogsci.princeton.edu/~wn/

Author information

Authors and Affiliations

Department of Computer Engineering, Hong-Ik University, Seoul, Korea
Young-Tae Byun & Kee-Cheol Lee
Cyber Terror Response Center, Korean National Police Agency, Seoul, Korea
Yong-Ho Choi

Editor information

Editors and Affiliations

Osaka University, Cybermedia Center, 5-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Shinji Shimojo
Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, 153-8914, Tokyo, Japan
Shingo Ichii
School of Computing, National University of Singapore,
Tok-Wang Ling
National Internet Development Agency of Korea, 3F, 1321-11 Secho-2 Dong, Secho-Gu, 135-875, Seoul, Korea
Kwan-Ho Song

About this paper

Cite this paper

Byun, YT., Choi, YH., Lee, KC. (2005). Automatic Generation and Use of Negative Terms to Evaluate Topic-Related Web Pages. In: Shimojo, S., Ichii, S., Ling, TW., Song, KH. (eds) Web and Communication Technologies and Internet-Related Social Issues - HSI 2005. HSI 2005. Lecture Notes in Computer Science, vol 3597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527725_23

DOI: https://doi.org/10.1007/11527725_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27830-6
Online ISBN: 978-3-540-31808-8
eBook Packages: Computer Science, Computer Science (R0)
