Adaptive XML Stream Classification Using Partial Tree-Edit Distance

  • Dariusz Brzezinski
  • Maciej Piernik
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8502)


XML classification finds many applications, ranging from data integration to e-commerce. However, existing classification algorithms are designed for static XML collections, while modern information systems frequently deal with streaming data that needs to be processed on-line using limited resources. Furthermore, data stream classifiers have to be able to react to concept drifts, i.e., changes in the stream's underlying data distribution. In this paper, we propose XStreamClass, an XML classifier capable of processing streams of documents and reacting to concept drifts. The algorithm combines incremental frequent tree mining with partial tree-edit distance and associative classification. XStreamClass was experimentally compared with four state-of-the-art data stream ensembles and achieved the best average classification accuracy on real and synthetic datasets simulating different drift scenarios.


Keywords: XML, data stream, classification, concept drift





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dariusz Brzezinski (1)
  • Maciej Piernik (1)
  1. Institute of Computing Science, Poznan University of Technology, Poznan, Poland
