Dr. Hadoop: In Search of a Needle in a Haystack
Dr. Hadoop is a framework designed to provide infinite scalability of metadata, hot standby, high availability, automatic fault tolerance, and minimal hiccup time upon failure. Its design is based on a doubly circular linked list. However, Dr. Hadoop relies on a conventional hashing scheme for metadata. In this paper, a Bloom Filter is integrated with Dr. Hadoop to boost metadata service performance. The conventional hashing is replaced by a Bloom Filter to reduce extra space consumption. Rather than designing a new filter, we implement an existing one on the Dr. Hadoop platform: rFilter, a variant of Bloom Filter. We deploy rFilter on the Dr. Hadoop platform, where it exhibits very good performance compared to other variants of Bloom Filter. We conduct a series of rigorous experiments using Microsoft Traces to investigate the behavior of rFilter on the Dr. Hadoop framework.
Keywords: Dr. Hadoop, In-memory replication, Bloom Filter, Cuckoo filter, rFilter, Big data, Metadata server, Microsoft Traces, Distributed system
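To make the idea of replacing a hash table with a membership filter concrete, the following is a minimal sketch of a plain Bloom Filter in Python. It is not rFilter and not the Dr. Hadoop implementation; the class name, the parameters `num_bits` and `num_hashes`, the SHA-256 based double hashing, and the example file paths are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A plain Bloom Filter (illustrative only, not rFilter)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive all probe positions from one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force the step to be odd so it is never zero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        # Set every probed bit for this key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True only if every probed bit is set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical metadata keys (file paths) inserted into the filter.
metadata_filter = BloomFilter(num_bits=8192, num_hashes=4)
for path in ("/user/alice/data.csv", "/user/bob/logs/app.log"):
    metadata_filter.add(path)
```

A lookup such as `metadata_filter.might_contain("/user/alice/data.csv")` returns `True` for every inserted key. For a key that was never inserted it usually returns `False`, but it can return `True` with a small false-positive probability. That is the trade-off accepted in exchange for storing only bits instead of full keys.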