Automatic Scaling of Resources in a Storm Topology

Gkolemis, Evangelos; Doka, Katerina; Koziris, Nectarios

doi:10.1007/978-3-319-74875-7_10

Automatic Scaling of Resources in a Storm Topology

Evangelos Gkolemis¹⁶,
Katerina Doka¹⁶ &
Nectarios Koziris¹⁶

Conference paper
First Online: 28 January 2018

465 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10739))

Abstract

In the Big Data era, the batch processing of large volumes of data is simply not enough - data needs to be processed fast to support continuous reactions to changing conditions in real-time. Distributed stream processing systems have emerged as platforms of choice for applications that rely on real-time analytics, with Apache Storm [2] being one of the most prevalent representatives. Whether deployed on physical or virtual infrastructures, distributed stream processing systems are expected to make the most out of the available resources, i.e., achieve the highest throughput or lowest latency with the minimum resource utilisation. However, for Storm - as for most such systems - this is a cumbersome trial-and-error procedure, tied to the specific workload that needs to be processed and requiring manual tweaking of resource-related topology parameters. To this end, we propose ARiSTO, a system that automatically decides on the appropriate amount of resources to be provisioned for each node of the Storm workflow topology based on user-defined performance and cost constraints. ARiSTO employs two mechanisms: a static, model-based one, used at bootstrap time to predict the resource-related parameters that better fit the user needs and a dynamic, rule-based one that elastically auto-scales the allocated resources in order to maintain the desired performance even under changes in load. The experimental evaluation of our prototype proves the ability of ARiSto to efficiently decide on the resource-related configuration parameters, maintaining the desired throughput at all times.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 44.99; Price excludes VAT (USA)

Softcover Book: USD 60.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
https://github.com/vgolemis/aristo.
2.
The capacity metric, which is measured for each topology bolt and takes values between (0,1), represents the percentage of time that a bolt is active (i.e., processing tuples). If this metric approaches 1 the bolt works near its maximum capacity and needs additional parallelism.

References

Apache Flink. https://flink.apache.org/
Apache Storm. http://storm.apache.org/
Apache Storm Concepts. http://storm.apache.org/releases/1.0.0/Concepts.html
Auto-Scaling Resources in a Topology. https://issues.apache.org/jira/browse/STORM-594
DataTorrent. https://www.datatorrent.com/
Healthcare Looks to Real-Time Big Data Analytics for Insights. https://healthitanalytics.com/news/healthcare-looks-to-real-time-big-data-analytics-for-insights
Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse. https://www.infoq.com/articles/stream-processing-hadoop
S4 Distributed Data Platform. http://incubator.apache.org/s4
Spark Streaming. https://spark.apache.org/streaming/
Understanding the Parallelism of a Storm Topology. storm.apache.org/releases/1.0.0/Understanding-the-parallelism-of-a-Storm-topology.html
Use Cases for Real Time Stream Processing Systems. https://dzone.com/articles/need-for-using-real-time-stream-processing-systems
Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/
Akidau, T., et al.: Millwheel: fault-tolerant stream processing at internet scale. In: VLDB 2014, vol. 6, no. 11, pp. 1033–1044 (2013)
Google Scholar
Floratou, A., et al.: Dhalion: self-regulating stream processing in heron. In: VLDB 2017 (2017)
Google Scholar
Heinze, T., et al.: Latency-aware elastic scaling for distributed data stream processing systems. In: DEBS 2014, pp. 13–22. ACM (2014)
Google Scholar
Kalyvianaki, E., Wiesemann, W., Vu, Q.H., Kuhn, D., Pietzuch, P.: SQPR: stream query planning with reuse. In: ICDE 2011, pp. 840–851. IEEE (2011)
Google Scholar
Kulkarni, S., et al.: Twitter heron: stream processing at scale. In: SIGMOD 2015, pp. 239–250. ACM (2015)
Google Scholar
Lohrmann, B., Janacik, P., Kao, O.: Elastic stream processing with latency guarantees. In: ICDCS 2015, pp. 399–410. IEEE (2015)
Google Scholar
Markl, V., et al.: Robust query processing through progressive optimization. In: SIGMOD 2004, pp. 659–670. ACM (2004)
Google Scholar
Tatbul, N., et al.: Load shedding in a data stream manager. In: VLDB 2003, pp. 309–320. VLDB Endowment (2003)
Google Scholar
Wolf, J., Bansal, N., Hildrum, K., Parekh, S., Rajan, D., Wagle, R., Wu, K.-L., Fleischer, L.: SODA: an optimizing scheduler for large-scale stream-based distributed computer systems. In: Issarny, V., Schantz, R. (eds.) Middleware 2008. LNCS, vol. 5346, pp. 306–325. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89856-6_16
Chapter Google Scholar
Xing, Y., Zdonik, S., Hwang, J.-H.: Dynamic load distribution in the borealis stream processor. In: ICDE 2005, pp. 791–802. IEEE (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Computing Systems Laboratory, National Technical University of Athens, Athens, Greece
Evangelos Gkolemis, Katerina Doka & Nectarios Koziris

Authors

Evangelos Gkolemis
View author publications
You can also search for this author in PubMed Google Scholar
Katerina Doka
View author publications
You can also search for this author in PubMed Google Scholar
Nectarios Koziris
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Katerina Doka .

Editor information

Editors and Affiliations

IST AUSTRIA, Klosterneuburg, Austria
Dan Alistarh
University of Athens, Athens, Greece
Alex Delis
University of Cyprus, Nicosia, Greece
George Pallis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gkolemis, E., Doka, K., Koziris, N. (2018). Automatic Scaling of Resources in a Storm Topology. In: Alistarh, D., Delis, A., Pallis, G. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2017. Lecture Notes in Computer Science(), vol 10739. Springer, Cham. https://doi.org/10.1007/978-3-319-74875-7_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-74875-7_10
Published: 28 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74874-0
Online ISBN: 978-3-319-74875-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics