Log integration on large scale for global networking monitoring
Building an overall picture of the network from distributed monitoring nodes requires integrating heterogeneous data from different monitoring sensors, which faces two critical obstacles: heterogeneous schemas and heterogeneous instances. To tackle heterogeneous schemas, an instance-based approach to schema mapping, named the instance-based machine-learning (IML) approach, is described. To solve the problem of heterogeneous instances, a novel statistic-based clustering (SBC) approach is proposed, which uses clustering and statistical techniques to match large-scale sources holistically. Both algorithms use machine-learning and clustering techniques to improve accuracy. Experimental analysis shows that the IML approach is more precise than the SBC approach, reaching at least 81% precision and 82% recall. Simulation studies further show that SBC can handle large-scale sources holistically, with an 85% recall rate when there are 38 data sources.
Key words: machine-learning, clustering, data integration, schema matching, instance matching
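The precision and recall figures quoted in the abstract can be read along the following lines. This is a minimal sketch, assuming a schema mapping is represented as a set of (source attribute, target attribute) pairs scored against a hand-made reference mapping; the function `precision_recall` and all attribute names are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, reference):
    """Score predicted attribute pairs against a reference mapping.

    precision = correct pairs / predicted pairs
    recall    = correct pairs / reference pairs
    """
    predicted, reference = set(predicted), set(reference)
    correct = predicted & reference
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical attribute correspondences between two sensor log schemas.
reference = {
    ("src_ip", "source_address"),
    ("dst_ip", "dest_address"),
    ("ts", "timestamp"),
    ("proto", "protocol"),
}
# A hypothetical matcher output with one wrong pair ("proto" -> "port").
predicted = {
    ("src_ip", "source_address"),
    ("dst_ip", "dest_address"),
    ("ts", "timestamp"),
    ("proto", "port"),
}

p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this reading, a matcher that proposes many pairs can raise recall at the cost of precision, which is why the abstract reports both numbers.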