Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition

  • Ravdeep Pasricha
  • Ekta Gujral
  • Evangelos E. Papalexakis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

Tensor decompositions are used in a variety of data mining applications, from social networks to medicine, and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature, and so are their data. To deal with this dynamic nature of data, there exists a variety of online tensor decomposition algorithms. A central assumption in all of those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define “concept” and “concept drift” in the context of streaming tensor decomposition, as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy (the method is named after, and as a tribute to, Metallica’s song from their first album; Metallica owns the copyright for the name), an algorithm that detects concept drift in streaming tensor decomposition and produces results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets that exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in detecting concept drift and in alleviating its effects, producing results of similar quality to decomposing the entire tensor in one shot. Additionally, on real datasets, SeekAndDestroy outperforms other streaming baselines while discovering novel, useful components. Code related to this paper is available at: https://github.com/ravdeep003/conceptDrift.
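
To make the setting concrete, the sketch below illustrates the general idea described in the abstract, not the authors' SeekAndDestroy algorithm: a stream of tensor batches is processed one at a time, each batch is given a low-rank (CP-style) decomposition, the number of latent concepts is estimated per batch, and concept drift is flagged whenever that number changes between consecutive batches. The helper names (estimate_rank, detect_drift), the reconstruction-error heuristic for rank estimation, the threshold values, and the toy data are all assumptions made purely for illustration; tensorly's parafac is used as a generic CP solver.

    # Illustrative sketch only -- NOT the paper's SeekAndDestroy method.
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    def estimate_rank(batch, max_rank=10, tol=1e-2):
        # Smallest CP rank whose relative reconstruction error falls below tol.
        # (A stand-in heuristic; the paper's rank estimation differs.)
        norm = tl.norm(batch)
        for r in range(1, max_rank + 1):
            cp = parafac(batch, rank=r, n_iter_max=200)
            err = tl.norm(batch - tl.cp_to_tensor(cp)) / norm
            if err < tol:
                return r
        return max_rank

    def detect_drift(batches):
        # Flag concept drift whenever consecutive batches differ in estimated rank.
        prev = None
        for t, batch in enumerate(batches):
            rank = estimate_rank(tl.tensor(batch))
            if prev is not None and rank != prev:
                print(f"batch {t}: concept drift detected (rank {prev} -> {rank})")
            prev = rank

    # Toy stream: two batches with 2 latent concepts, then one with 3.
    rng = np.random.default_rng(0)
    def random_low_rank(rank, dim=10):
        A, B, C = (rng.random((dim, rank)) for _ in range(3))
        return np.einsum('ir,jr,kr->ijk', A, B, C)  # sum of `rank` rank-one terms

    detect_drift([random_low_rank(2), random_low_rank(2), random_low_rank(3)])

Running this on the toy stream reports a drift between the second and third batches, mirroring the paper's notion that the number of latent concepts need not stay fixed across the stream.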

Keywords

Tensor analysis · Streaming · Concept drift · Unsupervised learning

Acknowledgements

Research was supported by the Department of the Navy, Naval Engineering Education Consortium under award no. N00174-17-1-0005, the National Science Foundation EAGER Grant no. 1746031, and by an Adobe Data Science Research Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ravdeep Pasricha (1)
  • Ekta Gujral (1)
  • Evangelos E. Papalexakis (1)

  1. Department of Computer Science and Engineering, University of California Riverside, Riverside, USA
