Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Hybrid Systems Based on Traditional Database Extensions

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_254-1

Definition

A hybrid system based on traditional database extensions refers to a federated system between Hadoop and an enterprise data warehouse (EDW). In such a system, a query may need to combine data stored in both Hadoop and an EDW.
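As a rough illustration of what "combining data stored in both Hadoop and an EDW" can mean, the sketch below joins a small set of EDW rows with a small set of HDFS-side rows using a plain in-memory hash join. It is a toy under stated assumptions, not the method of any system covered by this entry: the table contents, the column names, and the function name hybrid_join are all hypothetical, and a real hybrid system would also have to decide where the join runs and how much data moves between the two platforms.

```python
from collections import defaultdict

# Hypothetical stand-ins: in a real deployment these rows would live in
# an EDW table and in files on HDFS, not in Python lists.
edw_transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 60.0},
]
hdfs_click_logs = [
    {"customer_id": 1, "url": "/shoes"},
    {"customer_id": 3, "url": "/hats"},
    {"customer_id": 2, "url": "/bags"},
]


def hybrid_join(edw_rows, hdfs_rows, key):
    """Equi-join the two row sets on `key` with a simple hash join.

    The hash table is built on the EDW rows and probed with the HDFS
    rows; this choice is an assumption made only for the example.
    """
    build = defaultdict(list)
    for row in edw_rows:
        build[row[key]].append(row)

    joined = []
    for row in hdfs_rows:
        # Rows without a matching key on the EDW side are dropped (inner join).
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    return joined


result = hybrid_join(edw_transactions, hdfs_click_logs, "customer_id")
for row in result:
    print(row)
```

Here the click log for customer 3 has no matching transaction and so does not appear in the result, while customer 1 appears twice because two transactions match the same click record.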

Overview

The co-existence of Hadoop and enterprise data warehouses (EDWs), together with the new application requirement of correlating data stored in both systems, has created the need for a special federation between Hadoop-like big data platforms and EDWs. This entry presents the motivation behind such hybrid systems, highlights the unique challenges of building them, surveys existing hybrid solutions, and finally discusses potential future directions.

Introduction

More and more enterprises are embracing Hadoop-like big data technologies to process huge volumes of data and derive actionable insights. The Hadoop Distributed File System (HDFS)...

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM Research – Almaden, San Jose, USA

Section editors and affiliations

  • Yuanyuan Tian (1)
  • Fatma Özcan (2)
  1. IBM Almaden Research Center, San Jose, USA
  2. IBM Research – Almaden, San Jose, USA