A Roadmap for Web Mining: From Web to Semantic Web

  • Bettina Berendt
  • Andreas Hotho
  • Dunja Mladenic
  • Maarten van Someren
  • Myra Spiliopoulou
  • Gerd Stumme
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3209)


The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and the World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. Email, news, and markup languages like HTML allow users to publish and read documents on a worldwide scale and to communicate via chat connections, including information in the form of images and voice recordings. The HTTP protocol, which enables access to documents over the network via Web browsers, brought an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world, but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The use of the Internet for tasks other than finding information and direct communication is increasing, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government, and e-science.


  • Data Mining
  • Association Rule
  • Resource Description Framework
  • Information Extraction
  • Inductive Logic Programming
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Bettina Berendt (1)
  • Andreas Hotho (2)
  • Dunja Mladenic (3)
  • Maarten van Someren (4)
  • Myra Spiliopoulou (5)
  • Gerd Stumme (2)

  1. Institute of Information Systems, Humboldt University Berlin, Germany
  2. Chair of Knowledge & Data Engineering, University of Kassel, Germany
  3. Jozef Stefan Institute, Ljubljana, Slovenia
  4. Social Science Informatics, University of Amsterdam, The Netherlands
  5. Institute of Technical and Business Information Systems, Otto-von-Guericke-University Magdeburg, Germany
