Extraction of Hidden Semantics from Web Pages

  • Vincenza Carchiolo
  • Alessandro Longheu
  • Michele Malgeri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2412)


One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Here, an approach to extract a logical schema from web pages is presented, defining a page model in which the page content is divided into “logical” sections, i.e. parts of a page that each collect related information. This model aims to take into account both traditional, static HTML pages and the content of dynamic pages.
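The idea of dividing a page into “logical” sections can be illustrated with a minimal sketch. This is not the page model of the paper, whose details are not given in the abstract; it only assumes, as a hypothetical heuristic, that headings and horizontal rules mark the start of a new section. The names `SECTION_BREAKS`, `SectionSplitter` and `split_into_sections` are invented for this illustration, and the parsing relies on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Assumption for illustration only: headings and <hr> open a new logical section.
SECTION_BREAKS = {"h1", "h2", "h3", "h4", "h5", "h6", "hr"}


class SectionSplitter(HTMLParser):
    """Collects page text into sections, starting a new one at each break tag."""

    def __init__(self):
        super().__init__()
        self.sections = [[]]  # each section is a list of text fragments

    def handle_starttag(self, tag, attrs):
        # Open a new section only if the current one already holds some text,
        # so consecutive break tags do not create empty sections.
        if tag in SECTION_BREAKS and self.sections[-1]:
            self.sections.append([])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.sections[-1].append(text)


def split_into_sections(html):
    """Return the text of each non-empty section as one string."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.sections if parts]


page = "<h1>Staff</h1><p>Alice</p><p>Bob</p><hr><h2>Contacts</h2><p>Phone: 555</p>"
sections = split_into_sections(page)
print(sections)
```

A real page model would likely need richer cues than tag names (layout, link structure, repeated patterns), so the break set above should be read as a placeholder for whatever criterion groups related information.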


Logical Schema, Primary Node, Logical Section, Semistructured Data, Page Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Vincenza Carchiolo (1)
  • Alessandro Longheu (1)
  • Michele Malgeri (1)
  1. Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, Università di Catania, Catania, Italy
