In this chapter we work within the field of Web content mining. In relation to the user’s description of a Web page, we define a new term: Named object. Named objects serve as the basis for a new classification of selected methods for mining information from Web pages. This classification is based on a survey of published methods. Our approach rests on perceiving a Web page through its intention, which is important both to the users and to the authors of the page. The Named object concept is close to Web design patterns, which became the basis for our own mining method, Pattrio. The Pattrio method is introduced in this work together with a few experiments.


Keywords: Opinion Mining; Nonnegative Matrix Factorization; Price Information; Name Object; User Center Design





Copyright information

© Indian Institute of Information Technology, India 2009

Authors and Affiliations

  • Václav Snášel (1)
  • Milos Kudelka (1)
  1. Faculty of Electrical Engineering and Computer Science, VSB Technical University of Ostrava, Ostrava-Poruba, Czech Republic
