Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and tucker decompositions

Kim, Mijung; Candan, K. Selçuk

doi:10.1007/s10618-015-0401-6

Published: 28 January 2015

Volume 30, pages 1–46 (2016)

Data Mining and Knowledge Discovery

Abstract

Many multi-dimensional data applications need to support both tensor operations and relational operations throughout the data lifecycle. Tensor-based representations, including the two widely used CP and Tucker decompositions, have proven effective in multi-aspect data analysis, and tensor decomposition is an important tool for capturing high-order structures in multi-dimensional data. Although tensor decomposition is effective for multi-dimensional data analysis, its cost is often very high. Since the number of modes of the tensor data is one of the main factors contributing to the cost of tensor operations, this paper focuses on reducing the modality of the input tensors to tackle the computational cost of the decomposition process. We propose a novel decomposition-by-normalization (DBN) scheme that first normalizes the given relation into smaller tensors based on the functional dependencies of the relation, then decomposes these smaller tensors, and finally recombines the sub-results to obtain the overall decomposition. The decomposition and recombination steps fit naturally in settings with multiple cores. This leads to a highly efficient, effective, and parallelized decomposition-by-normalization algorithm for both dense and sparse tensors, for CP and Tucker decompositions. Experimental results confirm the efficiency and effectiveness of the proposed scheme compared to the conventional nonnegative CP decomposition and Tucker decomposition approaches.
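
The normalization step can be illustrated with a minimal sketch. It assumes a toy relation with hypothetical attributes `user`, `city`, and `movie`, measures how well the functional dependency `user → city` holds, and splits the relation into two lower-mode sub-relations that share the join attribute. The support measure used here (the largest fraction of tuples that can be kept so that the dependency holds exactly) is an assumption based on the commonly used g3 error measure; the abstract does not give the paper's exact definition. The helper names `fd_support` and `normalize` are likewise hypothetical, and the decomposition and recombination steps are not shown.

```python
from collections import Counter, defaultdict


def fd_support(rows, lhs, rhs):
    """Fraction of rows that can be kept so that lhs -> rhs holds exactly.

    Assumed definition (1 - g3 error); not taken from the paper itself.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # For each lhs value, keep only the rows carrying its most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return kept / len(rows)


def normalize(rows, join_attr, left_attr, right_attr):
    """Vertically split a 3-attribute relation into two 2-attribute relations."""
    left = sorted({(r[join_attr], r[left_attr]) for r in rows})
    right = sorted({(r[join_attr], r[right_attr]) for r in rows})
    return left, right


# Hypothetical toy relation; each attribute would be one tensor mode.
rows = [
    {"user": "u1", "city": "Tempe", "movie": "m1"},
    {"user": "u1", "city": "Tempe", "movie": "m2"},
    {"user": "u2", "city": "Mesa", "movie": "m1"},
    {"user": "u2", "city": "Tempe", "movie": "m3"},  # violates user -> city
]

support = fd_support(rows, "user", "city")
left, right = normalize(rows, "user", "city", "movie")
print(support)  # the dependency holds only approximately
print(left)     # entries of the (user, city) sub-tensor
print(right)    # entries of the (user, movie) sub-tensor
```

In this sketch the single 3-mode tensor over (user, city, movie) is replaced by two 2-mode tensors, which is the reduction in modality the abstract refers to.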

Notes

Note that the cost increases linearly in the size of the input relation (Huhtala et al. 1999).
Table 10. Different attribute sets, join attributes (\(X\)), supports of \(X\) (the lowest of all the supports of \(X \rightarrow *\)), and execution times for FD discovery for D1–D18, where \(A_n\) is the \(n\)th attribute of each data set
Note that the fit is low in this experiment because an extremely tight target rank (10) was used to decompose a very high-dimensional tensor. The fit obtained by the conventional tensor decomposition technique, NNCP-CP-ALS, on the same tensor with the same rank is similarly low (0.0028).
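
As a rough illustration of how a fit value like the one in the last note might be computed, the sketch below scores a rank-\(R\) CP model against a tensor. It assumes the common definition \(1 - \lVert X - \hat{X} \rVert_F / \lVert X \rVert_F\); this excerpt does not state the paper's exact formula, and the function names `cp_reconstruct` and `fit` are hypothetical.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-mode tensor from CP factor matrices.

    X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def fit(X, X_hat):
    """Assumed fit measure: 1 - ||X - X_hat||_F / ||X||_F."""
    # With no ord/axis, np.linalg.norm returns the 2-norm of the flattened
    # array, which is the Frobenius norm of the tensor.
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)


rng = np.random.default_rng(0)
# Random nonnegative rank-2 factors for a 4 x 5 x 6 tensor (toy sizes).
A, B, C = (rng.random((n, 2)) for n in (4, 5, 6))
X = cp_reconstruct(A, B, C)

exact_fit = fit(X, cp_reconstruct(A, B, C))  # model reproduces X
zero_fit = fit(X, np.zeros_like(X))          # model explains nothing
print(exact_fit, zero_fit)
```

Under this definition a fit near 1 means the model reproduces the tensor almost exactly and a fit near 0 means it captures almost none of it, which is why a very small target rank on a high-dimensional tensor can yield values close to 0.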

References

Allen GI (2012) Sparse higher-order principal components analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS)
Andersson CA, Bro R (2000) The n-way toolbox for matlab. Chemom Intell Lab Syst 52(1):1–4. http://www.models.life.ku.dk/source/nwaytoolbox/
Antikainen J, Havel J, Josth JR, Herout A, Zemcik P, Hauta-Kasari M (2011) Nonnegative tensor factorization accelerated using gpgpu. IEEE Trans Parallel Distrib Syst 22(7):1135–1141
Bader BW, Kolda TG (2006) Efficient matlab computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories
Bader BW, Kolda TG (2007) Matlab tensor toolbox version 2.2. http://csmr.ca.sandia.gov/tgkolda/TensorToolbox/
Carroll J, Chang JJ (1970) Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika 35:283–319
Chu W, Ghahramani Z (2009) Probabilistic models for incomplete multi-dimensional arrays. In: Proceedings of the 12th international conference on artificial intelligence and statistics
Elmasri R, Navathe SB (1994) Fundamentals of database systems, 2nd edn. Benjamin-Cummings, Redwood City
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, New York
Harshman RA (1970) Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics 16(1):84
Hoff PD (2011) Hierarchical multilinear models for multiway data. Comput Stat Data Anal 55(1):530–543. doi:10.1016/j.csda.2010.05.020
Huhtala Y, Kärkkäinen J, Porkka P, Toivonen H (1999) Tane: an efficient algorithm for discovering functional and approximate dependencies. Comput J 42(2):100–111
Ilyas IF, Markl V, Haas PJ, Brown P, Aboulnaga A (2004) Cords: automatic discovery of correlations and soft functional dependencies. In: SIGMOD conference, pp. 647–658
Karmarker N, Karp RM (1983) The differencing method of set partitioning. Technical report, Berkeley
Kim M, Candan KS (2011) Approximate tensor decomposition within a tensor-relational algebraic framework. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp. 1737–1742. doi:10.1145/2063576.2063827
Kolda T, Sun J (2008) Scalable tensor decompositions for multi-aspect data mining. In: Proceedings of the 8th IEEE international conference on data mining, pp. 363–372. doi:10.1109/ICDM.2008.89
Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500. doi:10.1137/07070111X
Kolda TG, Bader BW, Kenny JP (2005) Higher-order web link analysis using multilinear algebra. In: Proceedings of the 5th IEEE international conference on data mining, pp. 242–249. doi:10.1109/ICDM.2005.77
Kruskal JB (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebr Appl 18(2):95–138
Lopes S, Petit JM, Lakhal L (2000) Efficient discovery of functional dependencies and armstrong relations. In: Proceedings of the 7th international conference on extending database technology: advances in database technology, EDBT ’00. Springer, London, pp. 350–364
Mahoney MW, Maggioni M, Drineas P (2006) Tensor-cur decompositions for tensor-based data. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 327–336. doi:10.1145/1150402.1150440
Mangasarian OL, Wolberg WH (1990) Cancer diagnosis via linear programming. SIAM News 23(5):1–18
Mannila H, Räihä KJ (1992) On the complexity of inferring functional dependencies. Discret Appl Math 40(2):237–243. doi:10.1016/0166-218X(92)90031-5
Movielens dataset from grouplens research group (2013). http://www.grouplens.org
Phan AH, Cichocki A (2011) Parafac algorithms for large-scale problems. Neurocomputing 74(11):1970–1984. doi:10.1016/j.neucom.2010.06.030
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Ruggles S, Sobek M (1997) Integrated public use microdata series: version 2.0. Minneapolis: Historical Census Projects. http://www.ipums.umn.edu/
Sanchez E, Kowalski BR (1986) Generalized rank annihilation factor analysis. Anal Chem 58(2):496–499. doi:10.1021/ac00293a054
Sanchez E, Kowalski BR (1990) Tensorial resolution: a direct trilinear decomposition. J Chemom 4(1):29–45. doi:10.1002/cem.1180040105
Stoer M, Wagner F (1997) A simple min-cut algorithm. J ACM 44(4):585–591. doi:10.1145/263867.263872
Sun J, Papadimitriou S, Lin CY, Cao N, Liu S, Qian W (2009) Multivis: content-based social network exploration through multi-way visual analysis. In: Proceedings SDM, vol 9. SIAM, pp. 1063–1074
Sun J, Tao D, Papadimitriou S, Yu PS, Faloutsos C (2008) Incremental tensor analysis: theory and applications. ACM Trans Knowl Discov Data 2(3):11:1–11:37. doi:10.1145/1409620.1409621
Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp. 990–998
Tsourakakis CE (2010) Mach: fast randomized tensor decompositions. In: Proceedings of the 10th SIAM International Conference on Data Mining, pp. 689–700
Tucker L (1966) Some mathematical notes on three-mode factor analysis. Psychometrika 31(3):279–311. doi:10.1007/BF02289464
Wyss C, Giannella C, Robertson EL (2001) Fastfds: a heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances: extended abstract. In: Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK ’01. Springer, London, pp 101–110
Xu Z, Yan F, Qi A (2012) Infinite tucker decomposition: nonparametric bayesian models for multiway data analysis. In: ICML. icml.cc/Omnipress
Zhang Q, Berry M, Lamb B, Samuel T, Allen G, Nabrzyski J, Seidel E, van Albada G, Dongarra J, Sloot P (2009) A parallel nonnegative tensor factorization algorithm for mining global climate data, vol 5545. Springer, Berlin/Heidelberg, pp. 405–415
Zhou G, He Z, Zhang Y, Zhao Q, Cichocki A (2009) Canonical polyadic decomposition: from 3-way to n-way. In: Eighth international conference on computational intelligence and security (CIS), pp 391–395. doi:10.1109/CIS.2012.94

Acknowledgments

This work is partially funded by NSF Grant #116394 “RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis”.

Author information

Authors and Affiliations

Arizona State University, Tempe, AZ, USA
Mijung Kim & K. Selçuk Candan

Corresponding author

Correspondence to Mijung Kim.

Additional information

Responsible editor: Chih-Jen Lin.

About this article

Cite this article

Kim, M., Candan, K.S. Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and tucker decompositions. Data Min Knowl Disc 30, 1–46 (2016). https://doi.org/10.1007/s10618-015-0401-6

Received: 10 October 2013
Accepted: 04 January 2015
Published: 28 January 2015
Issue Date: January 2016
DOI: https://doi.org/10.1007/s10618-015-0401-6
