Implementing Big Data Analytics Projects in Business

Fogelman-Soulié, Françoise; Lu, Wenhuan

doi:10.1007/978-3-319-26989-4_6


Chapter
First Online: 17 December 2015

Part of the book series: Studies in Big Data (SBD, volume 16)

Abstract

Big Data analytics presents both opportunities and challenges for companies. Before embarking on a Big Data project, companies need to understand the value Big Data offers and the processes needed to extract it. This chapter discusses why companies should progressively increase their data volumes and the process to follow for implementing a Big Data project. We present a variety of architectures for handling Big Data, from in-memory servers to Hadoop. We introduce the concept of the Data Lake and discuss its benefits for companies and the research still required to fully deploy it. We illustrate some of these points by presenting various architectures available for running Big Data initiatives, and discuss the expected evolution of hardware and software tools in the near future.

Notes

1. For example: https://www.data.gouv.fr/en/, http://open-data.europa.eu/en/data/, http://publicdata.eu/, http://www.data.go.jp/, http://dataportals.org/.
2. http://www.netflixprize.com/.
3. http://www.bull.com/download/bullion/B-bullion-2014-enWeb.pdf.
4. https://datafloq.com/big-data-open-source-tools/os-home/.
5. http://www.datarobot.com/, http://www.dataiku.com/, https://www.palantir.com/.
6. https://spark.apache.org/mllib/ is Apache Spark’s machine learning library; http://scikit-learn.org/ is a machine learning library in Python.
7. https://github.com/apache/spark; https://github.com/scikit-learn/scikit-learn.
8. http://mahout.apache.org/.

Author information

Authors and Affiliations

School of Computer Software, Tianjin University, Beiyangyuan Campus, 135 Ya Guan Road, Jinnan District, Tianjin, 300350, China
Françoise Fogelman-Soulié & Wenhuan Lu

Corresponding author

Correspondence to Françoise Fogelman-Soulié.

Editor information

Editors and Affiliations

University of Ottawa, Ottawa, Ontario, Canada
Nathalie Japkowicz
Institute of Computing Sciences, Poznań University of Technology, Poznań, Poland
Jerzy Stefanowski

About this chapter

Cite this chapter

Fogelman-Soulié, F., Lu, W. (2016). Implementing Big Data Analytics Projects in Business. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_6

DOI: https://doi.org/10.1007/978-3-319-26989-4_6
Published: 17 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26987-0
Online ISBN: 978-3-319-26989-4
eBook Packages: Engineering, Engineering (R0)
