A Distributed Rule Engine for Streaming Big Data

  • Debo Cai
  • Di Hou
  • Yong Qi
  • Jinpei Yan
  • Yu Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11242)

Abstract

Rule engines are widely used in industry and academia because they separate rules from execution logic and can encode expert knowledge. With the advent of the big data era, data volumes have grown at an unprecedented rate. However, traditional rule engines that run on a single PC or server struggle to handle streaming big data because of hardware performance limits. Structured stream processing frameworks offer a new solution to this challenge. In this paper, we design a distributed rule engine based on Kafka and Structured Streaming (KSSRE) and propose a rule-fact matching strategy that uses the Spark SQL engine to support inference over a large number of event streams. KSSRE stores data in DataFrames and inherits the load balancing, scalability, and fault-tolerance mechanisms of Spark2.x. In addition, to remove possibly redundant rules and optimize the matching process, we represent rules with the ternary grid model [1] and design a scheduling model that improves memory sharing during matching. An evaluation on DBLP data sets shows that KSSRE achieves better performance, scalability, and fault tolerance.
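
As a rough illustration of the two ideas named in the abstract (dropping repeated rules before matching, then matching rules against each batch of incoming events), the following plain-Python sketch may help. It is not the KSSRE implementation: it uses neither Kafka, Structured Streaming, nor Spark SQL, and the deduplication is a simple canonical-form comparison rather than the ternary grid model [1]. The names `Rule`, `dedupe_rules`, and `match_batch`, and the sample rules and events, are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: fires when every condition holds for an event."""
    name: str
    conditions: frozenset  # (attribute, value) pairs that must all be present
    conclusion: tuple      # (attribute, value) derived when the rule fires


def dedupe_rules(rules):
    """Keep the first rule for each distinct (conditions, conclusion) pair."""
    seen = set()
    unique = []
    for rule in rules:
        key = (rule.conditions, rule.conclusion)
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


def match_batch(rules, batch):
    """Return (event id, rule name, conclusion) for each rule an event satisfies."""
    fired = []
    for event in batch:
        facts = set(event["facts"].items())
        for rule in rules:
            # A rule matches when its conditions are a subset of the event's facts.
            if rule.conditions <= facts:
                fired.append((event["id"], rule.name, rule.conclusion))
    return fired


# Illustrative rules; r2 lists the same conditions as r1 in a different order.
rules = dedupe_rules([
    Rule("r1", frozenset({("type", "article"), ("venue", "LNCS")}), ("series", "springer")),
    Rule("r2", frozenset({("venue", "LNCS"), ("type", "article")}), ("series", "springer")),
    Rule("r3", frozenset({("type", "thesis")}), ("kind", "dissertation")),
])

# One illustrative micro-batch of events, each carrying a set of facts.
batch = [
    {"id": 1, "facts": {"type": "article", "venue": "LNCS", "year": 2018}},
    {"id": 2, "facts": {"type": "thesis", "year": 2017}},
    {"id": 3, "facts": {"type": "article", "venue": "other"}},
]

results = match_batch(rules, batch)
for event_id, rule_name, conclusion in results:
    print(event_id, rule_name, conclusion)
```

In a distributed setting, the nested loop in `match_batch` would presumably be expressed as a join between a rule table and a stream of fact rows, so that the query engine can partition the work across nodes; the sketch only shows the matching logic on a single machine.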

Keywords

Rule engine · Spark2.x · Event stream

Notes

Acknowledgment

This work is partially supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000600.

References

  1. Erdani, Y.: Developing algorithms of ternary grid technique for optimizing expert system’s knowledge base. In: 2006 Seminar Nasional Aplikasi Teknologi Informasi (2006)
  2. Apache Kafka. http://kafka.apache.org/. Accessed May 2018
  3. Structured Streaming. http://spark.apache.org. Accessed May 2018
  4. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)
  5. Cao, B., Yin, J., Zhang, Q., Ye, Y.: A MapReduce-based architecture for rule matching in production system, pp. 790–795. IEEE (2010)
  6. Zhou, R., Wang, G., Wang, J., Li, J.: RUNES II: a distributed rule engine based on rete network in cloud computing. Int. J. Grid Distrib. Comput. 7, 91–110 (2014)
  7. Chen, Y., Bordbar, B.: DRESS: a rule engine on spark for event stream processing, pp. 46–51. ACM (2016)
  8. Zhang, J., Yang, J., Li, J.: When rule engine meets big data: design and implementation of a distributed rule engine using spark, pp. 41–49. IEEE (2017)
  9. Liang, S., Fodor, P., Wan, H., Kifer, M.: OpenRuleBench: an analysis of the performance of rule engines. In: Proceedings of the 18th International Conference on World Wide Web, pp. 601–610. ACM (2009)
  10. DBLP: computer science bibliography. http://dblp.uni-trier.de/db/. Accessed May 2018
  11. Forgy, C.L.: Rete: a fast algorithm for the many pattern/many object pattern match problem. Artif. Intell. 19, 17–37 (1982)
  12. Drools. https://www.drools.org/. Accessed May 2018

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  2. Troops 69064 of PLA, Xinjiang, China