Intelligent Internet Information Systems in Knowledge Acquisition: Techniques and Applications

Intelligent Knowledge-Based Systems

Abstract

The explosive growth of the World Wide Web continues to revolutionize how information is edited, published, and accessed. Within the Web infrastructure, individuals can easily edit and publish documents that contain hyperlinks to other documents published by the same or other Web sites. As a result, the Web contains information on almost any subject, available anywhere to anyone at any time. However, this explosive growth has made finding information like looking for a needle in a haystack. Although directory services (like Yahoo!¹) and search engines (like Google²) facilitate information searches, many users still have difficulty locating useful information. Browsing directories is time-consuming because there is a seemingly infinite number of possible topics. For example, Open Directory (currently the largest directory database) contains over 460,000 categories³. Users must click and click and click to find a target directory and browse its documents. Furthermore, constructing directories is labor-intensive, and directory services cannot keep up with Web growth. Finding documents using search engines is frustrating because search results usually contain thousands of links. Although some search engines, like Google, apply hyperlink analysis to provide better ranking, the ranking is still often ineffective.

¹ http://www.yahoo.com/.

² http://www.google.com/.

³ http://dmoz.org/. The directory listed over 3.8 million sites, 57,238 editors, and over 460,000 categories when I visited the site on June 26, 2003.

Copyright information

© 2005 Kluwer Academic Publishers

About this chapter

Cite this chapter

Lin, SH. (2005). Intelligent Internet Information Systems in Knowledge Acquisition: Techniques and Applications. In: Leondes, C.T. (eds) Intelligent Knowledge-Based Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4020-7829-3_5

  • DOI: https://doi.org/10.1007/978-1-4020-7829-3_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-7746-3

  • Online ISBN: 978-1-4020-7829-3

  • eBook Packages: Computer Science (R0)
