An improvement of the parameterized frequent directions algorithm

Francis, Deena P.; Raimond, Kumudha

doi:10.1007/s10618-017-0542-x

Published: 16 September 2017

Data Mining and Knowledge Discovery, volume 32, pages 453–482 (2018)

Abstract

Matrix sketching is a technique for creating summaries of large matrices. Frequent directions (FD) and its parameterized variant, \(\alpha\)-FD, are deterministic sketching techniques that have theoretical guarantees and also work well in practice. An algorithm called iterative singular value decomposition (iSVD) has been shown to outperform FD and \(\alpha\)-FD on several datasets, despite lacking theoretical guarantees. However, on datasets with major and sudden drift, iSVD performs poorly compared to the other algorithms. The \(\alpha\)-FD algorithm has better error guarantees and empirical performance than FD, but it has two limitations: the restriction on the effective values of its parameter \(\alpha\), which arises from the parameter's dependence on the sketch size, and its constant-factor reduction of the selected squared singular values. Both limitations reduce its empirical performance. In this paper, we present a modified parameterized FD algorithm, \(\beta\)-FD, that overcomes the limitations of \(\alpha\)-FD while maintaining error guarantees similar to those of \(\alpha\)-FD. Empirical results on datasets with sudden and major drift, and on datasets with gradual and minor or no drift, indicate a trade-off between the errors on the two kinds of data for different parameter values. For \(\beta \approx 28\), our algorithm has better overall error performance than \(\alpha\)-FD.
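For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of a row-at-a-time FD update with a parameterized shrinkage step in the spirit of \(\alpha\)-FD. It is an illustration under stated assumptions, not the authors' implementation, and it does not implement \(\beta\)-FD, whose update rule is not given in the abstract. The function name `parameterized_fd_sketch`, the rounding of \(\alpha \ell\) to an integer, and the requirement that the sketch size `ell` not exceed the number of columns are all choices made here for the example.

```python
import numpy as np

def parameterized_fd_sketch(A, ell, alpha=1.0):
    """Stream the rows of A into an ell-row sketch B.

    Illustrative only: alpha=1.0 shrinks all ell squared singular values
    (plain FD); smaller alpha shrinks only the trailing ones.
    """
    _, d = A.shape
    if not (1 <= ell <= d):
        raise ValueError("this sketch assumes 1 <= ell <= number of columns")
    # Number of trailing singular values to shrink (hypothetical rounding rule).
    # At least one is shrunk, so the last sketch row is always zeroed and is
    # free to receive the next input row.
    m = min(ell, max(1, int(round(alpha * ell))))
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row                       # last row is zero at this point
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        s2 = s ** 2
        delta = s2[-1]                    # smallest squared singular value
        s2[ell - m:] -= delta             # shrink only the last m directions
        s2 = np.maximum(s2, 0.0)          # guard against round-off
        B = np.sqrt(s2)[:, None] * Vt     # rebuild the sketch
    return B
```

With `alpha=1.0`, each update subtracts the same amount from all `ell` squared singular values, so for this row-at-a-time variant the covariance error \(\Vert A^{T}A - B^{T}B\Vert_2\) is at most \(\Vert A\Vert_F^2 / \ell\). With a smaller `alpha`, fewer directions are shrunk per step, and the same argument gives the looser bound \(\Vert A\Vert_F^2 / m\), where \(m\) is the number of shrunk directions.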

Acknowledgements

The authors gratefully acknowledge the financial support of the Visvesvaraya Ph.D. Scheme for Electronics and Information Technology, Ministry of Electronics and Information Technology (MeitY), Govt. of India.

Author information

Authors and Affiliations

Department of Computer Sciences Technology, Karunya University, Coimbatore, India
Deena P. Francis & Kumudha Raimond

Corresponding author

Correspondence to Deena P. Francis.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Francis, D.P., Raimond, K. An improvement of the parameterized frequent directions algorithm. Data Min Knowl Disc 32, 453–482 (2018). https://doi.org/10.1007/s10618-017-0542-x

Received: 17 August 2016
Accepted: 31 August 2017
Published: 16 September 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10618-017-0542-x
