Big Data Analytics on Modern Hardware Architectures: A Technology Survey

Saecker, Michael; Markl, Volker

doi:10.1007/978-3-642-36318-4_6

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 138)

Included in the following conference series:

European Big Data Management and Analytics Summer School

Abstract

Big Data Analytics aims to analyze massive datasets, which increasingly arise in web-scale business intelligence problems. The common strategies for handling these workloads are to distribute the processing across massively parallel analysis systems or to use large machines capable of handling the workload on their own. We discuss massively parallel analysis systems and their programming models. Furthermore, we discuss the application of modern hardware architectures to database processing. Today, many hardware architectures other than traditional CPUs can be used to process data. GPUs and FPGAs, among other new hardware, are usually employed as co-processors to accelerate query execution. What these architectures have in common is massive inherent parallelism and a programming model that differs from that of classical von Neumann CPUs. Such hardware architectures provide the processing capability to distribute the workload among the CPU and other processors, enabling systems to process larger workloads.

Author information

Authors and Affiliations

Technische Universität Berlin, Berlin, Germany
Michael Saecker & Volker Markl

Editor information

Editors and Affiliations

MAS Laboratory, Ecole Centrale Paris, Châtenay-Malabry, France
Marie-Aude Aufaure
Department of Computer and Decision Engineering (CoDE), Université Libre de Bruxelles, Brussels, Belgium
Esteban Zimányi

About this chapter

Cite this chapter

Saecker, M., Markl, V. (2013). Big Data Analytics on Modern Hardware Architectures: A Technology Survey. In: Aufaure, MA., Zimányi, E. (eds) Business Intelligence. eBISS 2012. Lecture Notes in Business Information Processing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36318-4_6

DOI: https://doi.org/10.1007/978-3-642-36318-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36317-7
Online ISBN: 978-3-642-36318-4
eBook Packages: Computer Science, Computer Science (R0)
