Implementing Big Data Analytics Projects in Business

Part of the book series: Studies in Big Data (SBD, volume 16)

Abstract

Big Data analytics presents both opportunities and challenges for companies. Before embarking on a Big Data project, companies need to understand the value offered by Big Data and the processes needed to extract it. This chapter discusses why companies should progressively increase their data volumes and the process to follow for implementing a Big Data project. We present a variety of architectures, from in-memory servers to Hadoop, to handle Big Data. We introduce the concept of the Data Lake and discuss its benefits for companies and the research still required to fully deploy it. We illustrate some of the points discussed in the chapter through the presentation of various architectures available for running Big Data initiatives, and discuss the expected evolution of hardware and software tools in the near future.

Notes

  1. For example: https://www.data.gouv.fr/en/, http://open-data.europa.eu/en/data/, http://publicdata.eu/, http://www.data.go.jp/, http://dataportals.org/.

  2. http://www.netflixprize.com/.

  3. http://www.bull.com/download/bullion/B-bullion-2014-enWeb.pdf.

  4. https://datafloq.com/big-data-open-source-tools/os-home/.

  5. http://www.datarobot.com/, http://www.dataiku.com/, https://www.palantir.com/.

  6. https://spark.apache.org/mllib/ is Apache Spark’s machine learning library; http://scikit-learn.org/ is a machine learning library in Python.

  7. https://github.com/apache/spark; https://github.com/scikit-learn/scikit-learn.

  8. http://mahout.apache.org/.

Author information

Corresponding author

Correspondence to Françoise Fogelman-Soulié.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Fogelman-Soulié, F., Lu, W. (2016). Implementing Big Data Analytics Projects in Business. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_6

  • DOI: https://doi.org/10.1007/978-3-319-26989-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26987-0

  • Online ISBN: 978-3-319-26989-4

  • eBook Packages: Engineering, Engineering (R0)
