Adaptive XML Tree Classification on Evolving Data Streams

  • Albert Bifet
  • Ricard Gavaldà
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

We propose a new method to classify patterns, using closed and maximal frequent patterns as features. Classification generally requires first mapping the patterns to be classified to feature vectors, and frequent patterns have been used as features in the past. Closed patterns retain the same information as frequent patterns in less space, while maximal patterns retain only approximate information. We use both to reduce the number of classification features. We present a new framework for XML tree stream classification with two components. For the first component, we use closed tree mining algorithms for evolving data streams. For the second, we use state-of-the-art classification methods for data streams. To the best of our knowledge, this is the first work on classifying trees in data streams that vary over time. We give a first experimental evaluation of the proposed classification method.
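The difference between frequent, closed, and maximal patterns, and their use as classification features, can be illustrated with a minimal sketch. It rests on loud assumptions: it uses itemsets instead of XML trees, a brute-force batch miner instead of a streaming closed tree miner, and hypothetical function names (`frequent_itemsets`, `closed_patterns`, `maximal_patterns`, `to_features`) that do not come from the paper. It is not the authors' algorithm, only an illustration of why closed and maximal patterns yield fewer features.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: {itemset: support} for all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
    return result


def closed_patterns(frequent):
    """Closed: no proper superset has the same support (no information is lost)."""
    return {p: s for p, s in frequent.items()
            if not any(p < q and s == frequent[q] for q in frequent)}


def maximal_patterns(frequent):
    """Maximal: no proper superset is frequent (supports of subpatterns are lost)."""
    return {p: s for p, s in frequent.items()
            if not any(p < q for q in frequent)}


def to_features(transaction, patterns):
    """Map one transaction to a 0/1 vector, one entry per pattern it contains."""
    ordered = sorted(patterns, key=lambda p: (len(p), sorted(p)))
    return [1 if p <= transaction else 0 for p in ordered]


# Toy data set (hypothetical): four transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("abc"), frozenset("a")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_patterns(frequent)
maximal = maximal_patterns(frequent)

print(len(frequent), len(closed), len(maximal))
print(to_features(frozenset("ab"), closed))
```

On this toy input there are seven frequent itemsets but only three closed and one maximal, so the feature vector handed to a classifier shrinks accordingly. In the setting of the abstract, the patterns would instead be subtrees maintained incrementally over a stream.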

Keywords

Frequent Pattern; Concept Drift; Frequent Tree; Closed Pattern; Maximal Pattern


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Albert Bifet (1)
  • Ricard Gavaldà (1)
  1. Universitat Politècnica de Catalunya, Barcelona, Spain
