The curse of indecomposable aggregates for big data exploratory analysis with a case for frequent pattern cubes

  • Hamid Fadishei
  • Azadeh Soltani

Abstract

Exploratory big data analytics requires interaction delays to be kept to a minimum. Data cubes help achieve this goal by pre-calculating the measures of interest, but some aggregations are not decomposable and require runtime scans through the cube data, which push the response time beyond the limits of real-time interaction. One such costly aggregation is the calculation of frequent patterns over data cube partitions. The existing merge-and-count approach to this problem is inefficient and not feasible at big data scale. This paper proposes an efficient approach for mining frequent patterns from cube data, accompanied by a formal overview of decomposable and indecomposable data aggregates. A new concept of semi-decomposable aggregates is introduced that sits between these two extremes. Using frequent pattern mining as a case, we show that some aggregates regarded as indecomposable are in fact semi-decomposable, so exploratory data analysis can still be realized for them. The proposed FPCubes algorithm shows promising experimental results for aggregating frequent patterns, which can support exploratory frequent itemset analysis on real-world multidimensional big datasets.
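
To make the distinction between decomposable and indecomposable aggregates concrete, the following is a minimal sketch under stated assumptions. It is not the FPCubes algorithm, whose details are not given in this abstract. The two cube cells, the item names, the 50% support threshold, and the helper functions `count_itemsets` and `frequent` are all hypothetical. The sketch contrasts a decomposable aggregate (a transaction count, which is simply added across cells) with frequent patterns, where adding up the pre-computed per-cell results gives a different answer from rescanning the merged data (the merge-and-count baseline).

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count the support of every itemset with up to max_size items."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def frequent(transactions, min_ratio, max_size=2):
    """Return {itemset: support} for itemsets with relative support >= min_ratio."""
    counts = count_itemsets(transactions, max_size)
    threshold = min_ratio * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


# Two hypothetical cube cells (partitions) of market-basket data.
cell_a = [["milk", "bread"], ["milk", "bread"], ["milk"], ["eggs"]]
cell_b = [["bread"], ["eggs"], ["eggs", "milk"], ["eggs", "bread"]]

# Decomposable aggregate: the count over the union is the sum of per-cell counts.
total = len(cell_a) + len(cell_b)

# Frequent patterns that a cube might pre-compute for each cell.
freq_a = frequent(cell_a, 0.5)
freq_b = frequent(cell_b, 0.5)

# Merge-and-count baseline: rescan the merged raw transactions.
merged = frequent(cell_a + cell_b, 0.5)

# Naive combination of the stored per-cell results only.
naive = Counter(freq_a) + Counter(freq_b)
naive_frequent = {s: c for s, c in naive.items() if c >= 0.5 * total}

print(sorted(merged))          # itemsets frequent in the merged data
print(sorted(naive_frequent))  # itemsets recovered from stored results alone
```

In this toy data, the naive combination misses itemsets such as `("milk",)`: it is infrequent in `cell_b`, so its support there was never stored, yet it is frequent in the union. Recovering the exact answer from stored per-cell lists alone is not possible here, which is why the baseline has to go back to the underlying transactions.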

Keywords

Big data, Exploratory data analytics, Data cube, Frequent itemset mining

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Computer Engineering Department, University of Bojnord, Bojnord, Iran