Abstract
Let us face it: statistics and mathematics deter almost everyone except those who choose to specialize in them. If you have read this far in the book, you are probably now considering skipping the chapters on Data Science and moving on to the next one, on Strategy, because, well, it sounds more exciting. So let us start this chapter on statistics with a simple example that illustrates why it is worth reading and why consultants may increasingly use mathematics.
Notes
- 1.
One may recommend “Naked Statistics” by Charles Wheelan [89], which introduces the overall field of statistics in a simple and humorous way… technical expertise not required.
- 2.
The software-hardware interface defines the field of Robotics as an application of Cybernetics, a field invented by the late Norbert Wiener and from which Machine Learning emerged as a subfield.
- 3.
In loose usage, “correlation” most commonly refers to the Pearson correlation coefficient.
- 4.
By general purpose, I mean the assumption of a linear relationship between variables, which is often what is meant by a “simple” model in mathematics.
- 5.
- 6.
All 1-dimensional values in mathematics are referred to as scalars; multi-dimensional objects may bear different names, the most common of which are vectors, matrices and tensors.
- 7.
Hyperspace is the name given to a space made of more than three dimensions (i.e. more than three variables). A plane that lies in a hyperspace is defined by more than two vectors and is called a hyperplane. It does not have a physical representation in our 3D world. Scientists present “hyper-” objects such as hyperplanes by showing consecutive 2D planes along different values of the 4th variable, the 5th variable, etc. This is why functions, matrices and tensors are strictly needed to handle computations in multivariable spaces.
- 8.
As mentioned in Sect. 6.1, the standard error is the standard deviation of the means of different sub-samples drawn from the original sample or population.
- 9.
The 80/20 rule, or Pareto principle, is a principle commonly used in business and economics that states that 80% of a problem stems from only 20% of its causes. It was first suggested by the late Joseph Juran, one of the most prominent management consultants of the twentieth century.
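The definition of the standard error given in note 8 can be illustrated with a short numerical sketch. The Python code below is not taken from the chapter: the function names, the synthetic Gaussian population and the sub-sample sizes are all assumptions chosen for illustration. It draws many sub-samples (with replacement) from a population, takes the standard deviation of their means, and compares the result with the textbook shortcut of dividing the population standard deviation by the square root of the sub-sample size.

```python
import random
import statistics


def standard_error_by_resampling(population, sample_size, n_samples, seed=0):
    """Standard deviation of the means of many sub-samples (note 8's definition)."""
    rng = random.Random(seed)
    means = [
        # Draw with replacement so each sub-sample is independent of the others.
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


def standard_error_analytic(population, sample_size):
    """Textbook shortcut: population standard deviation / sqrt(sample size)."""
    return statistics.pstdev(population) / sample_size ** 0.5


if __name__ == "__main__":
    # Hypothetical population: 5000 values from a Gaussian (mean 100, sd 15).
    rng = random.Random(1)
    population = [rng.gauss(100.0, 15.0) for _ in range(5000)]

    empirical = standard_error_by_resampling(population, sample_size=50, n_samples=2000)
    analytic = standard_error_analytic(population, sample_size=50)
    print(f"resampling estimate: {empirical:.3f}")
    print(f"analytic estimate:   {analytic:.3f}")
```

The two estimates should agree closely, which is why in practice the standard error is usually computed from a single sample rather than by repeated sub-sampling.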
References
Sarkar et al (2011) Translational bioinformatics: linking knowledge across biological and clinical realms. J Am Med Inform Assoc 18:354–357
Marx V (2013) The big challenges of big data. Nature 498:255–260
Siegel E (2013) Predictive analytics: the power to predict who will click, buy, lie, or die. Wiley, Hoboken
Wheelan C (2013) Naked statistics. Norton, New York
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 14(2):1137–1145
Lee Rodgers J, Nicewander WA (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42(1):59–66
Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York
Kullback S (1959) Information theory and statistics. Wiley, New York
Gower JC (1985) Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra Appl 67:81–97
Legendre A (1805) Nouvelles méthodes pour la détermination des orbites des comètes. Didot, Paris
Ozer DJ (1985) Correlation and the coefficient of determination. Psychol Bull 97(2):307
Nagelkerke NJ (1991) A note on a general definition of the coefficient of determination. Biometrika 78(3):691–692
Aiken LS, West SG, Reno RR (1991) Multiple regression: testing and interpreting interactions. Sage, London
Gibbons MR (1982) Multivariate tests of financial models: a new approach. J Financ Econ 10(1):3–27
Berger JO (2013) Statistical decision theory and Bayesian analysis. Springer, New York
Ng A (2008) Artificial intelligence and machine learning, online video lecture series. Stanford University, Stanford. www.see.stanford.edu
Ott RL, Longnecker M (2001) An introduction to statistical methods and data analysis. Cengage Learning, Belmont
Tsitsiklis (2010) Probabilistic systems analysis and applied probability, online video lecture series. MIT, Cambridge. www.ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/
Nuzzo R (2014) Statistical errors. Nature 506(7487):150–152
Goodman SN (1999) Toward evidence-based medical statistics: the p-value fallacy. Ann Intern Med 130(12):995–1004
Lyapunov A (1901) Nouvelle forme du théorème sur la limite de probabilité. Mémoires de l'Académie de St-Petersbourg 12
Baesens B (2014) Analytics in a big data world: the essential guide to data science and its applications. Wiley, New York
Curuksu J (2012) Adaptive conformational sampling based on replicas. J Math Biol 64:917–931
Pidd M (1998) Computer simulation in management science. Wiley, Chichester
Löytynoja A (2014) Machine learning with Matlab, Nordic Matlab expo 2014. MathWorks, Stockholm. www.mathworks.com/videos/machine-learning-with-matlab-92623.html
Becla J, Lim KT, Wang DL (2010) Report from the 3rd workshop on extremely large databases. Data Sci J 8:MR1–MR16
Treinen W (2014) Big data value strategic research and innovation agenda. European Commission Press, New York
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Curuksu, J.D. (2018). Principles of Data Science: Primer. In: Data Driven. Management for Professionals. Springer, Cham. https://doi.org/10.1007/978-3-319-70229-2_6
DOI: https://doi.org/10.1007/978-3-319-70229-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70228-5
Online ISBN: 978-3-319-70229-2
eBook Packages: Business and Management (R0)