The Art of Side Effects
Spark Streaming applications by design are stateless and side-effect free: running the same application an infinite number of times results in the same behavior and output. Similar to functional programming, this simplifies debugging and reasoning about the state of a program, because input and output paths are deterministic. Although side-effect-free applications have many advantages, in distributed systems side effects cannot be completely avoided, especially when interfacing with external systems. For this reason, Spark Streaming provides a primitive called foreachRDD, which is the Swiss Army Knife of side effects for micro-batch processing. This chapter introduces design patterns for enabling side effects in Spark Streaming applications.