Big Data Analytics: Views from Statistical and Computational Perspectives

  • Saumyadipta Pyne
  • B. L. S. Prakasa Rao
  • S. B. Rao


Without any doubt, the most discussed current trend in computer science and statistics is BIG DATA. Different people think of different things when they hear about big data. For the statistician, the issue is how to extract usable information from datasets that are too large and complex for many traditional or classical methods to handle. For the computer scientist, big data poses problems of data storage and management, communication, and computation. For the citizen, big data raises questions of privacy and confidentiality. This introductory chapter touches on some key aspects of big data and its analysis. Far from being an exhaustive overview of this fast-emerging field, it is a discussion of statistical and computational views that the authors owe to many researchers, organizations, and online sources.


Keywords: Concept Drift; Unstructured Data; Hadoop Distributed File System; Differential Privacy; Data Stream Management System



Copyright information

© Springer India 2016

Authors and Affiliations

  • Saumyadipta Pyne (1)
  • B. L. S. Prakasa Rao (2)
  • S. B. Rao (2)

  1. Indian Institute of Public Health, Hyderabad, India
  2. C.R. Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
