A Software Architecture for Enabling Statistical Learning on Big Data

  • Conference paper
Advances in Time Series Analysis and Forecasting (ITISE 2016)

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

Most big data analytics research is scattered across multiple disciplines such as applied statistics, machine learning, language technology, and databases. Little attention has been paid to aligning big data solutions with end users' mental models for conducting exploratory and predictive data analysis. We are particularly interested in how domain experts analyse big data by applying statistics to it, with a focus on statistical learning. In this paper, we compare and contrast how the fields of statistics and computer science view data. We review popular analysis techniques and tools within a defined analytics stack. We then propose a model-driven architecture that uses semantic and event processing technologies to separate the expression of the mathematical model from the computational requirements. The paper also describes an implementation of the proposed architecture with a case study in funds management.

Notes

  1. This is a multiple linear regression, a widely used form in statistical learning.
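
For context, a multiple linear regression relates one response variable to several predictors. The block below sketches the general textbook form only. The symbols $y$, $x_j$, $\beta_j$, $\varepsilon$, and $p$ are assumed notation, not taken from the paper, and the specific model that this note refers to is not reproduced here.

```latex
% General form of a multiple linear regression with p predictors
% (assumed notation: y is the response, x_j the predictors,
%  \beta_j the coefficients, \varepsilon the error term).
\[
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
\]
```

In this form, each coefficient $\beta_j$ describes the expected change in $y$ for a one-unit change in $x_j$ while the other predictors are held fixed.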

Author information

Correspondence to Ali Behnaz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Behnaz, A., Rabhi, F., Peat, M. (2017). A Software Architecture for Enabling Statistical Learning on Big Data. In: Rojas, I., Pomares, H., Valenzuela, O. (eds) Advances in Time Series Analysis and Forecasting. ITISE 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-55789-2_24
