Abstract

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users’ activities. These large data sets have been labeled “Big Data”, and their storage, processing, and analysis present a plethora of new challenges to computer science researchers and IT professionals. Beyond efficient data management, further complexity arises from dealing with semi-structured or unstructured data and from time-critical processing requirements. Understanding these massive amounts of data requires advanced visualization and data exploration techniques.

Innovative approaches to these challenges have been developed in recent years and will remain a hot topic for research and industry. An investigation of current approaches reveals that they usually address only one or two of the aspects of data management, processing, analysis, and visualization. This paper presents the vision of an integrated platform for big data analysis that combines all of these aspects. The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, more efficient use of system resources, and improved usability throughout the end-to-end data analysis process.

Corresponding author

Correspondence to Mahdi Bohlouli.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bohlouli, M. et al. (2013). Towards an Integrated Platform for Big Data Analysis. In: Fathi, M. (eds) Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34471-8_4

  • DOI: https://doi.org/10.1007/978-3-642-34471-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34470-1

  • Online ISBN: 978-3-642-34471-8

  • eBook Packages: Engineering, Engineering (R0)
