Schema Discovery of the Semi-structured and Hierarchical Data

  • Jianwen He
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2412)


Web data are typically semi-structured and lack explicit external schema information, which makes querying and browsing them inefficient. In this paper, we present an approach for efficiently discovering the inherent schema(s) of semi-structured, hierarchical data sources, based on the OEM (Object Exchange Model) and an efficient pruning strategy. The schema discovered by our algorithm takes the form of data path expressions and can easily be transformed into a schema tree.
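The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that OEM-style objects can be modelled as nested Python dicts and lists, that each top-level object is treated as one transaction, and that a label path is kept when it occurs in at least `min_support` objects. The function names (`label_paths`, `frequent_paths`, `schema_tree`) and the sample records are hypothetical. The pruning used here is the standard level-wise (Apriori-style) one: a path is only counted if its prefix was already found frequent.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Yield every label path (a tuple of labels) reachable in a nested object."""
    if isinstance(obj, dict):
        for label, child in obj.items():
            path = prefix + (label,)
            yield path
            yield from label_paths(child, path)
    elif isinstance(obj, list):
        # A list models several sub-objects that share the parent's label.
        for child in obj:
            yield from label_paths(child, prefix)
    # Atomic values (strings, numbers) contribute no further labels.


def frequent_paths(objects, min_support):
    """Level-wise search for label paths present in >= min_support objects."""
    per_object = [set(label_paths(o)) for o in objects]
    frequent = set()
    level = 1
    while True:
        # Pruning step: a path of length > 1 is only counted when its
        # prefix is already frequent, since it cannot be more frequent.
        counts = Counter(
            p
            for paths in per_object
            for p in paths
            if len(p) == level and (level == 1 or p[:-1] in frequent)
        )
        kept = {p for p, c in counts.items() if c >= min_support}
        if not kept:
            return frequent
        frequent |= kept
        level += 1


def schema_tree(paths):
    """Merge a set of label paths into a nested-dict schema tree."""
    tree = {}
    for path in sorted(paths):
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


# Hypothetical sample data: three loosely structured records.
records = [
    {"person": {"name": "A", "email": "a@example.org"}},
    {"person": {"name": "B", "phone": "555-0100"}},
    {"person": {"name": "C", "email": "c@example.org"}, "note": "n"},
]
paths = frequent_paths(records, min_support=2)
tree = schema_tree(paths)
```

With these sample records and `min_support=2`, the paths `person`, `person.name` and `person.email` survive, while `note` and `person.phone` are pruned; the surviving path expressions then fold directly into a small schema tree rooted at `person`.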


Data Path; Pruning Strategy; Hierarchical Data; Transaction Database; Data Schema





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Jianwen He
    Department of Mathematics, Inner Mongolia University, Huhehot, China
