The basic idea is to buffer the data of a continuously arriving stream for a fixed interval of time. Each resulting segment (a micro-batch) is then processed by Spark as an ordinary batch job, and the intermediate per-batch results are combined by applying operations such as union.
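The micro-batch idea can be illustrated without Spark itself. The sketch below is a minimal plain-Python analogy, not Spark Streaming's actual API: timestamped records are bucketed into one-second batches (the buffering step), each batch is processed independently (the per-batch Spark job), and the intermediate results are merged (the union/combine step). All names here (`micro_batches`, `process_batch`, `combine`) are illustrative, not part of any Spark API.

```python
from collections import Counter
from itertools import groupby

def micro_batches(events, interval):
    """Group (timestamp, record) pairs into batches of `interval` seconds.

    Stand-in for the buffering step: Spark Streaming chops the live
    stream into small segments, one per batch interval.
    """
    key = lambda ev: int(ev[0] // interval)
    return [[rec for _, rec in group]
            for _, group in groupby(sorted(events), key=key)]

def process_batch(batch):
    # Each segment is processed like an ordinary batch job
    # (here: a simple word count).
    return Counter(word for line in batch for word in line.split())

def combine(results):
    # Intermediate per-batch results are merged, analogous to
    # applying a union over the partial results.
    total = Counter()
    for r in results:
        total.update(r)
    return total

events = [(0.1, "a b"), (0.9, "a"), (1.2, "b c"), (2.5, "c")]
batches = micro_batches(events, interval=1.0)   # three 1-second batches
per_batch = [process_batch(b) for b in batches]
total = combine(per_batch)
# total == Counter({'a': 2, 'b': 2, 'c': 2})
```

In Spark Streaming proper, the sequence of batches is represented as a DStream, and the per-batch processing reuses the same operators as batch Spark jobs.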
- Apache Spark Streaming - official web site
- Spark Streaming Programming Guide
- M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica, "Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters," Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing (HotCloud'12), USENIX Association, Berkeley, CA, USA, 2012.