Implementing Big Data Analytics Projects in Business
Abstract
Big Data analytics present both opportunities and challenges for companies. Before embarking on a Big Data project, companies need to understand the value offered by Big Data and the processes needed to extract it. This chapter discusses why companies should progressively increase their data volumes and the process to follow for implementing a Big Data project. We introduce the concept of the Data Lake and discuss its benefits for companies and the research still required to fully deploy it. We illustrate these points by presenting a variety of architectures available for running Big Data initiatives, from in-memory servers to Hadoop, and discuss the expected evolution of hardware and software tools in the near future.
Keywords
Feature Engineering Hadoop MapReduce Data Lake Horizontal Scaling Fraudulent Transaction