Implementing Big Data Analytics Projects in Business

  • Françoise Fogelman-Soulié
  • Wenhuan Lu
Chapter
Part of the Studies in Big Data book series (SBD, volume 16)

Abstract

Big Data analytics present both opportunities and challenges for companies. It is important that, before embarking on a Big Data project, companies understand the value offered by Big Data and the processes needed to extract it. This chapter discusses why companies should progressively increase their data volumes and the process to follow for implementing a Big Data project. We present a variety of architectures, from in-memory servers to Hadoop, to handle Big Data. We introduce the concept of Data Lake and discuss its benefits for companies and the research still required to fully deploy it. We illustrate some of the points discussed in the chapter through the presentation of various architectures available for running Big Data initiatives, and discuss the expected evolution of hardware and software tools in the near future.

Keywords

Feature Engineering; Hadoop MapReduce; Data Lake; Horizontal Scaling; Fraudulent Transaction


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Computer Software, Tianjin University, Tianjin, China