Skip to main content

Part of the book series: Studies in Computational Intelligence ((SCI,volume 443))

  • 797 Accesses

Abstract

As scaling out applications with multiple servers has become a popular industry practice, we investigate collaborating distributed Query Engines (QEs) to support graph-structured SQL dataflow processes. A SQL dataflow process consists of queries (optionally with UDFs) linked with relational dataflow. We focus on using Distributed Caching Platform (DCP) for inter-QEs data communication. While DCP has gained popularity lately, exchanging query results tuple-by-tuple through DCP is often inefficient due to the tiny granularity of cache access and the overhead of data conversion and interpretation. This has motivated us to explore a new and more efficient mechanism for inter-QEs communication, taking advantage of DCP’s binary protocol. We propose the page-flow approach characterized by extending and externalizing the database buffer pool to DCP to allow the producer QE to put query results as data pages (blocks) to the DCP to be retrieved by the consumer QE. In this way, the relational dataflow logically becomes binary page-flow; the tuples contained in the transferred pages are exactly in the format required by the relational operators thus can be feed in queries directly without any conversion. Further, using pages as mini-batches of tuples, enhances the latency of DCP access. We have implemented this mechanism on a cluster of PostgreSQL engines. Our experiments results demonstrate its value.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouzeid, Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. In: VLDB 2009 (2009)

    Google Scholar 

  2. Nori, A.: Distributed Caching Platforms. In: VLDB 2010 (2010)

    Google Scholar 

  3. Baer, J., Wang, W.: On the inclusion properties for multi-level cache hierarchies. In: Proc. ISCA 1988 (1988)

    Google Scholar 

  4. Bryant, R.E.: Data-Intensive Supercomputing: The case for DISC, CMU-CS-07-128 (2007)

    Google Scholar 

  5. Chen, Q., Hsu, M., Zeller, H.: Experience in Continuous analytics as a Service (CaaaS). In: EDBT 2011 (2011)

    Google Scholar 

  6. Chen, Q., Hsu, M.: Query Engine Net for Streaming Analytics. In: Proc. 19th International Conference on Cooperative Information Systems, CoopIS (2011)

    Google Scholar 

  7. DeWitt, D.J., Paulson, E., Robinson, E., Naughton, J., Royalty, J., Shankar, S., Krioukov, A.: Clustera: An Integrated Computation And Data Management System. In: VLDB 2008 (2008)

    Google Scholar 

  8. Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: Distributed data-parallel programs from sequential building blocks. In: EuroSys 2007 (March 2007)

    Google Scholar 

  9. Franklin, M.J., et al.: Continuous Analytics: Rethinking Query Processing in a NetworkEffect World. In: CIDR 2009 (2009)

    Google Scholar 

  10. Gedik, B., Andrade, H., Wu, K.-L., Yu, P.S., Doo, M.C.: SPADE: The System S Declarative Stream Processing Engine. In: ACM SIGMOD 2008 (2008)

    Google Scholar 

  11. Memcached (2010), http://www.memcached.org/

  12. Membase, http://www.couchbase.com/

  13. EhCache (2010), http://www.terracotta.org/

  14. Vmware vFabric GemFire (2010), http://www.gemstone.com/

  15. IBM Websphere Extreme Scale Cache (2010), http://www.ibm.com/

  16. AppFabric Cache (2010), http://msdn.microsoft.com/

  17. Liarou, E., et al.: Exploiting the Power of Relational Databases for Efficient Stream Processing. In: EDBT 2009 (2009)

    Google Scholar 

  18. The Wafflegrid Project, http://www.wafflegrid.com/

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qiming Chen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Q., Hsu, M., Wu, R. (2013). Page-Flow in Query Engine Grid. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2012. Studies in Computational Intelligence, vol 443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32172-6_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32172-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32171-9

  • Online ISBN: 978-3-642-32172-6

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics