Journal of Computer Science and Technology, Volume 16, Issue 6, pp 560–566

Extracting local schema from semistructured data based on graph-oriented semantic model

  • Wang Tengjiao
  • Tang Shiwei
  • Yang Dongqing
  • Liu Yunfeng
  • Lin Bin

Abstract

Many modern applications (e-commerce, digital libraries, etc.) require integrated access to various information sources, from traditional RDBMSs to semistructured Web repositories. Extracting a schema from semistructured data is a prerequisite for integrating heterogeneous information sources. The traditional method, which extracts a global schema, may require time and space that grow exponentially with the number of objects and edges in the source. This paper presents a new method that extracts a local schema instead. The algorithm keeps the extracted schema within a “schema diameter” by examining the semantic distance of the target set and by using the Hash class and its path distance operation. This effectively restrains the schema from expanding. A prototype validates the new approach.
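The idea of bounding schema extraction by a distance from a target set can be illustrated with a small sketch. This is not the algorithm from the paper: it assumes the data is an edge-labeled graph, uses the plain number of edges as a stand-in for the semantic distance, and does not model the Hash class or its path distance operation. The function name `local_label_paths` and the sample object identifiers are hypothetical.

```python
from collections import deque


def local_label_paths(edges, targets, diameter):
    """Collect the label paths that start at any target object and
    are at most `diameter` edges long (edge count is an assumed
    stand-in for the paper's semantic distance)."""
    # Adjacency map: object id -> list of (edge label, child object id).
    adjacency = {}
    for source, label, dest in edges:
        adjacency.setdefault(source, []).append((label, dest))

    paths = set()
    queue = deque((target, ()) for target in targets)
    seen = set(queue)  # visited (object, label path) states
    while queue:
        node, path = queue.popleft()
        if len(path) == diameter:
            continue  # do not expand beyond the diameter
        for label, child in adjacency.get(node, []):
            new_path = path + (label,)
            paths.add(new_path)
            state = (child, new_path)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return paths


# Hypothetical sample data: (source object, edge label, destination object).
edges = [
    ("db", "book", "b1"),
    ("b1", "title", "t1"),
    ("b1", "author", "a1"),
    ("a1", "name", "n1"),
    ("n1", "first", "f1"),
]
local_paths = local_label_paths(edges, ["b1"], diameter=2)
```

Because every path is cut off at the chosen diameter, the traversal terminates even on cyclic data, and the number of collected paths depends on the neighbourhood of the target set rather than on the whole source.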

Keywords

information integration, data model, semistructured data, extracting schema

Copyright information

© Science Press, Beijing China and Allerton Press Inc. 2001

Authors and Affiliations

  • Wang Tengjiao (1, 2)
  • Tang Shiwei (1, 2)
  • Yang Dongqing (1, 2)
  • Liu Yunfeng (1, 2)
  • Lin Bin (1, 2)

  1. Department of Computer Science and Technology, Peking University, Beijing, P.R. China
  2. National Laboratory on Machine Perception, Peking University, Beijing, P.R. China
