Cross-Outlier Detection

  • Spiros Papadimitriou
  • Christos Faloutsos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2750)

Abstract

The problem of outlier detection has been studied in the context of several domains and has received attention from the database research community. To the best of our knowledge, work to date focuses exclusively on the problem as follows [10]: “given a single set of observations in some space, find those that deviate so as to arouse suspicion that they were generated by a different mechanism.” However, in several domains, we have more than one set of observations (or, equivalently, a single set with class labels assigned to each observation). For example, in astronomical data, labels may involve types of galaxies (e.g., spiral galaxies with an abnormal concentration of elliptical galaxies in their neighborhood); in biodiversity data, labels may involve different population types (e.g., patches of different species populations, food types, diseases, etc.). A single observation may look normal both within its own class and within the entire set of observations. However, when examined with respect to other classes, it may still arouse suspicions. In this paper we consider the problem “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels.” This variant has significant practical importance. Many of the existing outlier detection approaches cannot be extended to this case. We present one practical approach for dealing with this problem and demonstrate its performance on real and synthetic datasets.
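
To make the problem statement concrete, the following is a minimal illustrative sketch and not the approach presented in the paper. It assumes two labeled point sets (a "primary" class and a "reference" class), a fixed neighborhood radius, and a simple k-sigma rule; the function names `cross_neighbor_counts` and `flag_cross_outliers`, the radius, and the threshold `k` are all hypothetical choices made for this example.

```python
import math
from statistics import mean, pstdev

def cross_neighbor_counts(primary, reference, radius):
    """For each primary point, count the reference-class points within `radius`."""
    return [
        sum(1 for q in reference if math.dist(p, q) <= radius)
        for p in primary
    ]

def flag_cross_outliers(primary, reference, radius, k=2.0):
    """Return indices of primary points whose reference-class neighbor count
    deviates from the mean count by more than k standard deviations.
    This k-sigma rule is an assumption of the sketch, not the paper's test."""
    counts = cross_neighbor_counts(primary, reference, radius)
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # every primary point has the same count; nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) > k * sigma]

# Toy data: ten "spiral" points on a line, with five "elliptical" points
# concentrated around the spiral at x = 50.
spirals = [(10.0 * i, 0.0) for i in range(10)]
ellipticals = [(50.2, 0.1), (49.8, -0.3), (50.0, 0.5), (50.4, 0.0), (49.9, 0.2)]

flagged = flag_cross_outliers(spirals, ellipticals, radius=1.0)
print(flagged)
```

In this toy setup the spiral at x = 50 is evenly spaced among the other spirals, so it looks unremarkable within its own class; it is flagged only because of the concentration of the other class around it, which is the situation the abstract describes.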

Keywords

Association Rule, Outlier Detection, Spiral Galaxy, Elliptical Galaxy, Local Outlier Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. SIGMOD (2001)
  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proc. VLDB, pp. 487–499 (1994)
  3. Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large database. In: Proc. KDD, pp. 164–169 (1996)
  4. Barbará, D., Chen, P.: Using the fractal dimension to cluster datasets. In: Proc. KDD, pp. 260–264 (2000)
  5. Barnett, V., Lewis, T.: Outliers in Statistical Data. John Wiley, Chichester (1994)
  6. Belussi, A., Faloutsos, C.: Estimating the selectivity of spatial queries using the correlation fractal dimension. In: Proc. VLDB, pp. 299–310 (1995)
  7. Berchtold, S., Böhm, C., Keim, D.A., Kriegel, H.-P.: A cost model for nearest neighbor search in high-dimensional data space. In: Proc. PODS, pp. 78–86 (1997)
  8. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density based local outliers. In: Proc. SIGMOD Conf., pp. 93–104 (2000)
  9. Faloutsos, C., Seeger, B., Traina Jr., C., Traina, A.: Spatial join selectivity using power laws. In: Proc. SIGMOD, pp. 177–188 (2000)
  10. Hawkins, D.M.: Identification of Outliers. Chapman and Hall, Boca Raton (1980)
  11. Jagadish, H.V., Koudas, N., Muthukrishnan, S.: Mining deviants in a time series database. In: VLDB, pp. 102–113 (1999)
  12. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comp. Surveys 31(3), 264–323 (1999)
  13. Johnson, T., Kwok, I., Ng, R.T.: Fast computation of 2-dimensional depth contours. In: Proc. KDD, pp. 224–228 (1998)
  14. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proc. VLDB 1998, pp. 392–403 (1998)
  15. Knorr, E.M., Ng, R.T.: Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE TKDE 8(6), 884–897 (1996)
  16. Knorr, E.M., Ng, R.T.: A unified notion of outliers: Properties and computation. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 219–222. Springer, Heidelberg (1997)
  17. Knorr, E.M., Ng, R.T.: Finding intentional knowledge of distance-based outliers. In: VLDB, pp. 211–222 (1999)
  18. Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: Algorithms and applications. VLDB Journal 8, 237–253 (2000)
  19. Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proc. VLDB, pp. 144–155 (1994)
  20. Papadimitriou, S., Kitagawa, H., Gibbons, P.B., Faloutsos, C.: LOCI: Fast outlier detection using the local correlation integral. In: Proc. ICDE (2003)
  21. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. John Wiley and Sons, Chichester (1987)
  22. Traina, A., Traina, C., Papadimitriou, S., Faloutsos, C.: Tri-Plots: Scalable tools for multidimensional data mining. In: Proc. KDD, pp. 184–193 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Spiros Papadimitriou (1)
  • Christos Faloutsos (1)
  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA