EVIDIST: A Similarity Measure for Uncertain Data Streams
The large volumes of data generated by sensors and the increasing use of privacy-preserving techniques have led to growing interest in mining uncertain data streams. Traditional distance measures, such as the Euclidean distance, do not always work well for uncertain data streams. In this paper, we present EVIDIST (Evidential Distance), a new distance measure for uncertain data streams in which uncertainty is modeled as a set of sample observations at each time slot. We conduct an extensive experimental evaluation of EVIDIST on the 1-NN classification task with 15 real datasets. The results show that, compared with the Euclidean distance, EVIDIST increases classification accuracy by about 13% and is also far more resilient to error.
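The abstract does not define EVIDIST itself, so the sketch below illustrates only the surrounding setup it describes: an uncertain stream stored as a list of sample observations per time slot, and 1-NN classification driven by a pluggable distance function. The names `UncertainSeries`, `euclidean_on_means`, `one_nn_classify`, and `accuracy` are hypothetical. The Euclidean baseline here collapses each time slot to the mean of its samples, which is an assumption and not necessarily the baseline used in the paper. EVIDIST, or any other measure, would be supplied as the `distance` argument.

```python
import math
from typing import Callable, Sequence

# Assumed representation: one sequence of sample observations per time slot.
UncertainSeries = Sequence[Sequence[float]]
Distance = Callable[[UncertainSeries, UncertainSeries], float]


def euclidean_on_means(a: UncertainSeries, b: UncertainSeries) -> float:
    """Assumed baseline: reduce each time slot to its sample mean, then take Euclidean distance."""
    if len(a) != len(b):
        raise ValueError("series must have the same number of time slots")
    total = 0.0
    for samples_a, samples_b in zip(a, b):
        mean_a = sum(samples_a) / len(samples_a)
        mean_b = sum(samples_b) / len(samples_b)
        total += (mean_a - mean_b) ** 2
    return math.sqrt(total)


def one_nn_classify(query: UncertainSeries, train: Sequence[UncertainSeries],
                    labels: Sequence[str], distance: Distance) -> str:
    """Return the label of the training series closest to the query under `distance`."""
    best_index = min(range(len(train)), key=lambda i: distance(query, train[i]))
    return labels[best_index]


def accuracy(test: Sequence[UncertainSeries], test_labels: Sequence[str],
             train: Sequence[UncertainSeries], train_labels: Sequence[str],
             distance: Distance) -> float:
    """Fraction of test series whose 1-NN prediction matches the true label."""
    correct = sum(
        one_nn_classify(q, train, train_labels, distance) == y
        for q, y in zip(test, test_labels)
    )
    return correct / len(test)


if __name__ == "__main__":
    # Toy data: two time slots, two samples per slot.
    train = [[[1.0, 1.2], [2.0, 1.8]], [[5.0, 5.1], [6.0, 6.2]]]
    labels = ["low", "high"]
    query = [[0.9, 1.1], [2.1, 2.0]]
    print(one_nn_classify(query, train, labels, euclidean_on_means))  # prints "low"
```

Because `distance` is a parameter, comparing two measures on the same train/test split only requires calling `accuracy` twice with different functions, which mirrors the kind of head-to-head 1-NN comparison the abstract reports.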
Keywords: Data mining, Distance measure, Similarity, Uncertain data streams