Multi-tenant Pub/Sub Processing for Real-Time Data Streams
Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that stores and processes the streams. Processing can happen in real time, with transformations and enrichment applied on the fly, or after the data has been stored and organized in repositories. The former case requires stream processing technologies; in the latter, batch analytics and queries are commonly used.
This paper introduces a runtime that dynamically constructs data stream processing topologies from user-supplied code. These topologies are built on the fly from a data subscription model defined by the applications that consume the data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that other Service Objects can consume. Data streams can originate in real-world devices or be the outputs of Service Objects. This subscription-based programming model enables multiple users to deploy their own data-processing services, while the runtime handles the dynamic forwarding of data and the execution of Service Objects belonging to different users.
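The subscription model described above can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the class names `ServiceObject` and `Runtime`, the stream identifiers, and the `deploy`/`publish` methods are all hypothetical, and the sketch assumes the subscription graph is acyclic. It only shows how a topology can emerge from declared subscriptions rather than from an explicit wiring step.

```python
from collections import defaultdict, deque
from typing import Callable, List, Optional

# All names here are hypothetical illustrations, not the paper's API.
Transform = Callable[[dict], Optional[dict]]


class ServiceObject:
    """A user-defined processing unit: subscribes to streams, may emit one."""

    def __init__(self, owner: str, name: str, inputs: List[str],
                 transform: Transform, output: Optional[str] = None):
        self.owner = owner          # tenant that deployed this unit
        self.name = name
        self.inputs = inputs        # stream ids this unit subscribes to
        self.transform = transform  # user-supplied code
        self.output = output        # stream id it publishes to, if any


class Runtime:
    """Builds the topology implicitly from the declared subscriptions."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)  # stream id -> Service Objects
        self.delivered = []  # (owner, service name, result) log for inspection

    def deploy(self, so: ServiceObject) -> None:
        for stream in so.inputs:
            self.subscribers[stream].append(so)

    def publish(self, stream: str, update: dict) -> None:
        # Breadth-first forwarding; assumes the subscription graph is acyclic.
        queue = deque([(stream, update)])
        while queue:
            current, data = queue.popleft()
            for so in self.subscribers[current]:
                result = so.transform(data)
                if result is None:
                    continue  # the Service Object filtered this update out
                self.delivered.append((so.owner, so.name, result))
                if so.output is not None:
                    queue.append((so.output, result))


# Two tenants: bob's service consumes the output stream of alice's service.
runtime = Runtime()
runtime.deploy(ServiceObject("alice", "to_fahrenheit", ["sensor/temp"],
                             lambda d: {"f": d["c"] * 9 / 5 + 32},
                             output="alice/temp_f"))
runtime.deploy(ServiceObject("bob", "heat_alert", ["alice/temp_f"],
                             lambda d: {"alert": True} if d["f"] > 86 else None))
runtime.publish("sensor/temp", {"c": 35.0})
```

In this sketch neither tenant references the other's code; the chain from the device stream to bob's alert exists only because bob subscribed to the stream that alice's Service Object publishes.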
The runtime leverages Apache Storm for parallel data processing, which, combined with dynamic user-code injection, provides multi-tenant stream processing topologies. In this work we describe the runtime, its features, and its implementation details, and we include a performance evaluation of some of its core components.
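As a rough illustration of what dynamic user-code injection can look like, the sketch below loads tenant-supplied source text at run time and keeps each tenant's code under its own key. This abstract does not describe the actual injection mechanism, so everything here is an assumption: `load_user_code`, `TenantRegistry`, and the `process` entry-point convention are hypothetical, and Python's `exec` stands in for whatever the runtime really uses. Note that `exec` provides no sandboxing, so this is not a safe way to run untrusted code.

```python
from typing import Callable, Dict, Tuple


def load_user_code(source: str, entry_point: str = "process") -> Callable[[dict], dict]:
    """Compile user-supplied source and return its entry-point function.

    Hypothetical convention: the source must define a callable named `process`.
    """
    namespace: dict = {}  # fresh namespace so tenants do not share globals
    exec(compile(source, "<user-code>", "exec"), namespace)
    fn = namespace.get(entry_point)
    if not callable(fn):
        raise ValueError(f"user code must define a callable {entry_point!r}")
    return fn


class TenantRegistry:
    """Keeps injected code keyed by (tenant, service) so names never collide."""

    def __init__(self) -> None:
        self._code: Dict[Tuple[str, str], Callable[[dict], dict]] = {}

    def inject(self, tenant: str, service: str, source: str) -> None:
        self._code[(tenant, service)] = load_user_code(source)

    def run(self, tenant: str, service: str, update: dict) -> dict:
        return self._code[(tenant, service)](update)


# Two tenants register a service under the same name with different code.
registry = TenantRegistry()
registry.inject("alice", "scale", "def process(u):\n    return {'v': u['v'] * 2}\n")
registry.inject("bob", "scale", "def process(u):\n    return {'v': u['v'] + 100}\n")
```

Keying by tenant and service name is one simple way to let several users deploy services without coordinating names; a production runtime would also need isolation and resource limits, which this sketch does not attempt.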
Keywords: Big Data Analytics, Stream processing, Real-time data processing, Programming models, Internet of Things, IoT
This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitivity (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).