Massive Data Analysis: Tasks, Tools, Applications, and Challenges

  • Murali K. Pusala
  • Mohsen Amini Salehi
  • Jayasimha R. Katukuri
  • Ying Xie
  • Vijay Raghavan


In this study, we provide an overview of state-of-the-art programming, computing, and storage technologies in the massive data analytics landscape. We shed light on the different types of analytics that can be performed on massive data. To that end, we first provide a detailed taxonomy of analytic types, along with examples of each type. Next, we highlight technology trends in massive data analytics that are available to corporations, government agencies, and researchers. In addition, we enumerate several opportunities for turning massive data into knowledge. We describe and position two distinct case studies of massive data analytics that are being investigated in our research group: recommendation systems in e-commerce applications, and link discovery to predict unknown associations between medical concepts. Finally, we discuss the lessons we have learned and the open challenges faced by researchers and businesses in the field of massive data analytics.


Recommendation System, Link Prediction, Graph Database, Hadoop Distributed File System, MapReduce Framework
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer India 2016

Authors and Affiliations

  • Murali K. Pusala (1)
  • Mohsen Amini Salehi (2)
  • Jayasimha R. Katukuri (1)
  • Ying Xie (3)
  • Vijay Raghavan (1)

  1. Center of Advanced Computer Studies (CACS), University of Louisiana Lafayette, Lafayette, USA
  2. School of Computing and Informatics, University of Louisiana Lafayette, Lafayette, USA
  3. Department of Computer Science, Kennesaw State University, Kennesaw, USA
