Journal of Computer Science and Technology, Volume 15, Issue 3, pp 241–248

Incremental mining of the schema of semistructured data

  • Zhou Aoying 
  • Jin Wen 
  • Zhou Shuigeng 
  • Qian Weining 
  • Tian Zenping 
Article

Abstract

Semistructured data lack a fixed, rigid schema, although some implicit structure typically appears in the data. The large number of on-line applications makes it important to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the hidden structure in the semistructured data. Current methods for extracting the structure of Web data are either general and independent of the application background, or bound to a concrete environment such as HTML or XML. Both kinds are costly and have difficulty keeping up with the frequent and complicated changes in Web data. This paper discusses the problem of incrementally mining the schema of semistructured data after the raw data have been updated. An algorithm for incrementally mining the schema of semistructured data is provided, along with experimental results showing that incremental mining of semistructured data is more efficient than non-incremental mining.
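The general idea of incremental mining can be illustrated with a minimal sketch. This is not the algorithm of the paper, which the abstract does not describe. It assumes, purely for illustration, that each semistructured object is a nested Python dict, that a "schema" is the set of label paths occurring in at least a given fraction of the objects, and that the names `label_paths`, `IncrementalSchemaMiner` and `mine_from_scratch` are hypothetical. The point shown is only that an update touches the counts of the changed object instead of rescanning the whole collection.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Return the set of distinct label paths (tuples of keys) in a nested object."""
    paths = set()
    if isinstance(obj, dict):
        for label, child in obj.items():
            path = prefix + (label,)
            paths.add(path)
            paths |= label_paths(child, path)
    elif isinstance(obj, list):
        # List elements share the label path of their parent.
        for child in obj:
            paths |= label_paths(child, prefix)
    return paths


class IncrementalSchemaMiner:
    """Keeps per-path support counts so updates avoid a full rescan (illustrative only)."""

    def __init__(self, min_support):
        self.min_support = min_support  # assumed: fraction of objects, in (0, 1]
        self.counts = Counter()
        self.n_docs = 0

    def add(self, doc):
        # Each distinct path in the new object gains one supporting object.
        self.counts.update(label_paths(doc))
        self.n_docs += 1

    def remove(self, doc):
        # The caller must pass an object that was previously added.
        self.counts.subtract(label_paths(doc))
        self.counts = +self.counts  # drop paths whose count fell to zero
        self.n_docs -= 1

    def schema(self):
        if self.n_docs == 0:
            return set()
        return {p for p, c in self.counts.items()
                if c / self.n_docs >= self.min_support}


def mine_from_scratch(docs, min_support):
    """Non-incremental baseline: recount every object in the collection."""
    miner = IncrementalSchemaMiner(min_support)
    for doc in docs:
        miner.add(doc)
    return miner.schema()
```

In this sketch, adding or removing one object costs time proportional to the size of that object, whereas `mine_from_scratch` revisits every object; both produce the same set of frequent label paths.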

Keywords

data mining, incremental mining, semistructured data, schema, algorithm

Copyright information

© Science Press, Beijing China and Allerton Press Inc. 2000

Authors and Affiliations

  • Zhou Aoying 
    • 1
  • Jin Wen 
    • 1
  • Zhou Shuigeng 
    • 1
  • Qian Weining 
    • 1
  • Tian Zenping 
    • 1
  1. Department of Computer Science, Fudan University, Shanghai, P.R. China