Statistical Challenges with Big Data in Management Science

  • Arnab Laha


In the past few years, there has been an increasing awareness that the enormous amount of data being captured by both public and private organisations can be profitably used for decision making. Aided by low-cost computer hardware, fast processing speeds and advancements in data storage technologies, Big Data Analytics has emerged as a fast-growing field. However, the statistical challenges that statisticians and data scientists face while doing analytics with Big Data have not been adequately discussed. In this paper, we discuss several statistical challenges that are encountered while analyzing Big Data for management decision making. These challenges give statisticians significant opportunities for developing new statistical methods. Two methods that hold promise in addressing some of the challenges with Big Data, Symbolic Data Analysis and Approximate Stream Regression, are discussed briefly with real-life examples. Two case studies of applications of analytics in management, one in marketing management and the other in human resource management, are also discussed.


Keywords: Exponentially Weighted Moving Average, Concept Drift, Statistical Challenge, Streaming Data, Advertisement Video



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Indian Institute of Management Ahmedabad, Ahmedabad, India