Continuous decaying of telco big data with data postdiction

  • Constantinos Costa
  • Andreas Konstantinidis
  • Andreas Charalampous
  • Demetrios Zeinalipour-Yazti
  • Mohamed F. Mokbel


In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, as we formulate it, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has two conceptual phases: (i) in an offline phase, it uses an LSTM-based hierarchical ML algorithm to learn a tree of models (coined the TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. Additionally, we provide three decaying focus methods that can be plugged into the operators we propose: (i) FIFO-amnesia, which is based on the time at which the tuple was created; (ii) SPATIAL-amnesia, which is based on the location of the cellular tower associated with the tuple; and (iii) UNIFORM-amnesia, which picks the tuples to be decayed at random. Similarly, CTBD-DP enables the decaying of streaming data, using the TBD-DP tree to extend and update the stored models. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10 GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data. Our experiments also show that CTBD-DP improves the accuracy over streaming data.
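
The three decaying focus methods can be pictured as interchangeable selection policies that decide which tuples are decayed next. The following is a minimal sketch under stated assumptions, not the implementation used in the paper: the record type `TelcoTuple`, the function names, the `tower_xy` lookup table, and the rectangular `region` used for the spatial policy are all hypothetical, since the abstract only states what each policy is based on.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TelcoTuple:
    """Hypothetical telco record; the field names are assumptions."""
    created_at: int   # creation timestamp (e.g. epoch seconds)
    cell_id: str      # identifier of the cellular tower tied to the tuple
    value: float      # measured attribute


def fifo_amnesia(tuples, k):
    """Select the k oldest tuples, based on creation time."""
    return sorted(tuples, key=lambda t: t.created_at)[:k]


def spatial_amnesia(tuples, k, tower_xy, region):
    """Select up to k tuples whose tower lies inside a rectangular region.

    Assumption: `tower_xy` maps cell_id -> (x, y) and `region` is
    (x_min, y_min, x_max, y_max). The abstract does not say how the
    tower location drives the selection; a bounding box is one option.
    """
    x_min, y_min, x_max, y_max = region
    selected = []
    for t in tuples:
        x, y = tower_xy[t.cell_id]
        if x_min <= x <= x_max and y_min <= y <= y_max:
            selected.append(t)
    return selected[:k]


def uniform_amnesia(tuples, k, seed=None):
    """Select k tuples at random, without replacement."""
    pool = list(tuples)
    rng = random.Random(seed)  # seeded only to make runs repeatable
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    data = [
        TelcoTuple(created_at=3, cell_id="A", value=1.0),
        TelcoTuple(created_at=1, cell_id="B", value=2.0),
        TelcoTuple(created_at=2, cell_id="A", value=3.0),
    ]
    towers = {"A": (0.0, 0.0), "B": (5.0, 5.0)}
    print(fifo_amnesia(data, 2))
    print(spatial_amnesia(data, 2, towers, (-1.0, -1.0, 1.0, 1.0)))
    print(uniform_amnesia(data, 2, seed=7))
```

Because each policy returns only the tuples chosen for decaying, a decaying operator could take any of them as a parameter and stay unaware of how the choice is made, which is one way to read "plugged into" above.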


Telco big data · Data decaying · Data reduction · Machine learning · Spatio-temporal analytics




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Pittsburgh, Pittsburgh, USA
  2. Frederick University, Nicosia, Cyprus
  3. University of Cyprus, Nicosia, Cyprus
  4. Qatar Computing Research Institute, HBKU, Doha, Qatar
  5. University of Minnesota, Minneapolis, USA
