Abstract
Big Data Analytics aims to analyze massive datasets, which increasingly arise in web-scale business intelligence problems. The common strategies for handling these workloads are to distribute the processing across massively parallel analysis systems or to use large machines capable of handling the workload. We discuss massively parallel analysis systems and their programming models. Furthermore, we discuss the application of modern hardware architectures to database processing. Today, many hardware architectures other than traditional CPUs can be used to process data. GPUs and FPGAs, among other new hardware, are usually employed as co-processors to accelerate query execution. What these architectures have in common is their massive inherent parallelism and a programming model that differs from that of classical von Neumann CPUs. Such hardware architectures provide the processing capability to distribute the workload among the CPU and other processors, enabling systems to process larger workloads.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Saecker, M., Markl, V. (2013). Big Data Analytics on Modern Hardware Architectures: A Technology Survey. In: Aufaure, MA., Zimányi, E. (eds) Business Intelligence. eBISS 2012. Lecture Notes in Business Information Processing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36318-4_6
DOI: https://doi.org/10.1007/978-3-642-36318-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36317-7
Online ISBN: 978-3-642-36318-4
eBook Packages: Computer Science, Computer Science (R0)