FixMe: A Self-organizing Isolated Anomaly Detection Architecture for Large Scale Distributed Systems
Monitoring a system is the ability to collect and analyze relevant information provided by the monitored devices so as to be continuously aware of the system state. However, the ever-growing complexity and scale of systems make both real-time monitoring and fault detection tedious tasks. The usual option is therefore to focus solely on a subset of the state information, so as to provide coarse-grained indicators. As a consequence, detecting isolated failures or anomalies is quite challenging. In this work, we propose to address this issue by pushing the monitoring task to the edge of the network. We present a peer-to-peer based architecture that enables nodes to adaptively and efficiently self-organize according to their “health” indicators. By exploiting both the temporal and spatial correlations that exist between a device and its vicinity, our approach guarantees that only isolated anomalies (an anomaly is isolated if it impacts solely a monitored device) are reported on the fly to the network operator. We show that the end-to-end detection process, i.e., from local detection to reporting to the management operator, requires a number of messages that is logarithmic in the size of the network.
Keywords: Anomaly Detection, Distributed Hash Table, Management Node, Split Operation, Quality Position