Web Mining: Extracting Knowledge from the World Wide Web

Shi, Zhongzhi; Ma, Huifang; He, Qing

doi:10.1007/978-0-387-79420-4_14


This chapter surveys existing Web mining techniques, which aim to make the World Wide Web a more useful environment in which users can quickly and easily find the information they need. In particular, it introduces data mining methods for the Web developed by our laboratory, covering pattern discovery in Web content (semantic processing, classification, clustering), Web structure (retrieval, classical link analysis methods), and Web events (preprocessing for Web event mining, dynamic tracing of news, multi-document summarization). The chapter is intended for students and researchers who are familiar with the basic principles of data mining and want to learn how to apply them to problems in Web mining.



Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan Nanlu, Beijing, 100080, People's Republic of China
Zhongzhi Shi, Huifang Ma & Qing He


Corresponding author

Correspondence to Zhongzhi Shi.

Editor information

Editors and Affiliations

School of Software, Faculty of Engineering and Information Technology, University of Technology, PO Box 123, Sydney, Broadway, NSW 2007, Australia
Longbing Cao & Huaifeng Zhang
Department of Computer Science, University of Illinois at Chicago, 851 S. Morgan St., Chicago, IL, 60607
Philip S. Yu
Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology, PO Box 123, Sydney, Broadway, NSW 2007, Australia
Chengqi Zhang


About this chapter

Cite this chapter

Shi, Z., Ma, H., He, Q. (2009). Web Mining: Extracting Knowledge from the World Wide Web. In: Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds) Data Mining for Business Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79420-4_14


DOI: https://doi.org/10.1007/978-0-387-79420-4_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79419-8
Online ISBN: 978-0-387-79420-4
eBook Packages: Computer Science, Computer Science (R0)
