Towards an Integrated Platform for Big Data Analysis

Bohlouli, Mahdi; Schulz, Frank; Angelis, Lefteris; Pahor, David; Brandic, Ivona; Atlan, David; Tate, Rosemary

doi:10.1007/978-3-642-34471-8_4



Abstract

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users’ activities. These large data sets have been labeled “Big Data”, and their storage, processing and analysis present a plethora of new challenges to computer science researchers and IT professionals. Beyond efficient data management, further complexity arises from dealing with semi-structured or unstructured data, and from time-critical processing requirements. To understand these massive amounts of data, advanced visualization and data exploration techniques are required.

Innovative approaches to these challenges have been developed in recent years, and will continue to be a hot topic for research and industry. An investigation of current approaches reveals that they usually address only one or two aspects among data management, processing, analysis and visualization. This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, more efficient usage of system resources, and improved usability throughout the end-to-end data analysis process.



Author information

Authors and Affiliations

Institute of Knowledge Based Systems & Knowledge Management, University of Siegen, Hölderlinstr. 3, 57068, Siegen, Germany
Mahdi Bohlouli
SAP Research, Karlsruhe, Germany
Frank Schulz
Aristotle University, Thessaloniki, Greece
Lefteris Angelis
Arctur d.o.o, Nova Gorica, Slovenia
David Pahor
Technical University of Vienna, Vienna, Austria
Ivona Brandic
Phenosystems SA, Bruxelles, Belgium
David Atlan
University of Sussex, Brighton, United Kingdom
Rosemary Tate


Corresponding author

Correspondence to Mahdi Bohlouli.

Editor information

Editors and Affiliations

Institute of Knowledge Based Systems & Knowledge Management, Research Center for Knowledge Management, University of Siegen, Hölderlinstrasse 3, Siegen, 57068, Germany
Madjid Fathi


About this chapter

Cite this chapter

Bohlouli, M. et al. (2013). Towards an Integrated Platform for Big Data Analysis. In: Fathi, M. (eds) Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34471-8_4


DOI: https://doi.org/10.1007/978-3-642-34471-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34470-1
Online ISBN: 978-3-642-34471-8
eBook Packages: Engineering, Engineering (R0)
