Incremental Evaluation of Continuous Analytic Queries in HIFUN

  • Conference paper
Information Search, Integration, and Personalization (ISIP 2019)

Abstract

A huge amount of data is generated each day from various sources. Analyzing such massive data is difficult and requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. Besides growing in volume, datasets also change frequently, so the results of continuous queries have to be updated at short intervals. In this paper, we address the problem of evaluating continuous queries over frequently updated big data streams, adopting HIFUN, a recently introduced high-level query language. HIFUN offers a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of data, and the physical layer, where queries are evaluated by encoding them as map-reduce jobs or as SQL group-by queries. Using HIFUN, we devise an algorithm for the incremental processing of continuous queries that processes only the most recent data partition and exploits already computed information, without re-evaluating the query over the complete dataset. Subsequently, we translate the generic algorithm to both SQL and MapReduce using SPARK, exploiting the query rewriting method provided by HIFUN. The experiments performed show the advantages of our solution in terms of query answering efficiency.
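The incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a group-by query whose reduction operation is a sum (a distributive operation, so partial results can be merged), and the helper names `aggregate` and `merge`, the record layout and the sample data are all hypothetical. Each new partition is aggregated on its own and then combined with the stored result, so earlier partitions are never scanned again.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record layout: (branch, quantity).
Record = Tuple[str, int]


def aggregate(batch: Iterable[Record],
              group_key: Callable[[Record], str],
              measure: Callable[[Record], int]) -> Dict[str, int]:
    """Evaluate the group-by sum over a single partition only."""
    partial: Dict[str, int] = defaultdict(int)
    for record in batch:
        partial[group_key(record)] += measure(record)
    return dict(partial)


def merge(previous: Dict[str, int], delta: Dict[str, int]) -> Dict[str, int]:
    """Combine the stored result with the newest partition's partial sums."""
    merged = dict(previous)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Two partitions arriving one after the other (made-up data).
batches = [
    [("Athens", 10), ("Paris", 5)],
    [("Athens", 3), ("Rome", 7)],
]

result: Dict[str, int] = {}
for batch in batches:
    # Only the most recent partition is processed at each step.
    delta = aggregate(batch, lambda r: r[0], lambda r: r[1])
    result = merge(result, delta)

print(result)
```

The merge step is valid here because addition is associative and commutative. An operation such as the median would need more state than the previous result alone, so this sketch does not cover it.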



Corresponding author

Correspondence to Petros Zervoudakis.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zervoudakis, P., Kondylakis, H., Plexousakis, D., Spyratos, N. (2020). Incremental Evaluation of Continuous Analytic Queries in HIFUN. In: Flouris, G., Laurent, D., Plexousakis, D., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2019. Communications in Computer and Information Science, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-44900-1_4

  • DOI: https://doi.org/10.1007/978-3-030-44900-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44899-8

  • Online ISBN: 978-3-030-44900-1

  • eBook Packages: Computer Science, Computer Science (R0)
