AUSMS: An Environment for Frequent Sub-structures Extraction in a Semi-structured Object Collection

  • Pierre-Alain Laur
  • Maguelonne Teisseire
  • Pascal Poncelet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2736)

Abstract

Mining knowledge from structured data has been extensively addressed in the past few years. However, most proposed approaches focus on flat structures. With the growing popularity of the Web, the number of available semi-structured documents is rapidly increasing. The structure of these objects is irregular, and it is reasonable to assume that a query on document structure is almost as important as a query on data. Moreover, the data being manipulated is not static, since it is constantly being updated. Maintaining the extracted sub-structures therefore becomes as much of a priority as discovering them because, every time the data is updated, previously found sub-structures may become invalid. In this paper we propose a system called A.U.S.M.S. (Automatic Update Schema Mining System), which retrieves data, identifies frequent sub-structures and keeps the extracted knowledge up to date as the sources evolve.
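
To make the maintenance problem concrete, here is a minimal sketch in Python. It is not the A.U.S.M.S. algorithm, which this page does not describe. It assumes that objects are nested dictionaries, that a "sub-structure" is simply a root-to-node path of labels, and that a path is frequent when it occurs in at least a given fraction of the objects. The names `label_paths` and `PathSupport` are hypothetical, as is the sample data.

```python
from collections import Counter


def label_paths(obj, prefix=()):
    """Return the set of label paths (tuples of keys) in a nested dict.

    Simplifying assumption: only dicts carry structure; lists and
    scalar values are treated as leaves.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = prefix + (key,)
            paths.add(path)
            paths |= label_paths(value, path)
    return paths


class PathSupport:
    """Hypothetical helper that keeps per-path support counts.

    Keeping the counts of all paths (not only the frequent ones) lets
    the frequent set be refreshed after an update without rescanning
    the objects already seen.
    """

    def __init__(self):
        self.counts = Counter()
        self.n_objects = 0

    def add(self, obj):
        # Each object contributes at most 1 to the support of a path.
        self.counts.update(label_paths(obj))
        self.n_objects += 1

    def frequent(self, min_support):
        threshold = min_support * self.n_objects
        return {p for p, c in self.counts.items() if c >= threshold}


# Illustrative collection of semi-structured objects (made-up data).
collection = [
    {"person": {"name": "A", "address": {"city": "X"}}},
    {"person": {"name": "B"}},
    {"person": {"name": "C", "address": {"city": "Y", "zip": "1"}}},
]

store = PathSupport()
for obj in collection:
    store.add(obj)
before = store.frequent(0.6)

# The source evolves: one new object arrives.
store.add({"person": {"address": {"zip": "2"}}})
after = store.frequent(0.6)

# Paths that were frequent before the update but are not any more.
print(sorted(before - after))
```

In this toy run the path `person / address / city` is frequent before the update and falls below the threshold afterwards, which is the kind of invalidation the abstract refers to. Real sub-structure mining works on trees or graphs rather than on independent paths, so this only illustrates the bookkeeping.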

Keywords

Sequential Pattern Mining, Sequential Pattern, Semistructured Data, Database Increment, Graph Interchange Format
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Pierre-Alain Laur (1)
  • Maguelonne Teisseire (1)
  • Pascal Poncelet (2)

  1. LIRMM, Montpellier, France
  2. EMA/LGI2P, Ecole des Mines d’Alès, Site EERIE, Nîmes, France
