Abstract
Most big data analytics research is scattered across multiple disciplines such as applied statistics, machine learning, language technology and databases. Little attention has been paid to aligning big data solutions with end users' mental models for conducting exploratory and predictive data analysis. We are particularly interested in how domain experts apply statistics to big data, with a focus on statistical learning. In this paper, we compare and contrast how the fields of statistics and computer science view data. We review popular analysis techniques and tools within a defined analytics stack. We then propose a model-driven architecture that uses semantic and event processing technologies to separate the expression of the mathematical model from its computational requirements. The paper also describes an implementation of the proposed architecture with a case study in funds management.
Notes
1. This is a multiple linear regression, a widely used form in statistical learning.
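The equation this note refers to is not reproduced on this page. For orientation only, the general textbook form of a multiple linear regression is sketched below. The symbols (response y_i, predictors x_ij, coefficients beta_j, error term epsilon_i, design matrix X) are standard notation assumed here, not necessarily the chapter's own.

```latex
% General form of a multiple linear regression with p predictors
% (assumed textbook notation, not the chapter's specific model)
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n

% Ordinary least squares estimate, assuming X^\top X is invertible
\hat{\beta} = (X^\top X)^{-1} X^\top y
```

Here X is the n-by-(p+1) matrix whose first column is all ones (for the intercept) and whose remaining columns hold the predictor values.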
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Behnaz, A., Rabhi, F., Peat, M. (2017). A Software Architecture for Enabling Statistical Learning on Big Data. In: Rojas, I., Pomares, H., Valenzuela, O. (eds) Advances in Time Series Analysis and Forecasting. ITISE 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-55789-2_24
DOI: https://doi.org/10.1007/978-3-319-55789-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55788-5
Online ISBN: 978-3-319-55789-2
eBook Packages: Mathematics and Statistics (R0)