A Methodology for Handling a New Kind of Outliers Present in Gene Expression Patterns

  • Anindya Bhattacharya
  • Rajat K. De
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6744)


The performance of clustering algorithms depends largely on the selected similarity measure, and efficiency in handling outliers is a major contributor to the effectiveness of such a measure. In the present work, we discuss the problem of handling outliers with different existing similarity measures and introduce the concept of a new kind of outlier present in gene expression patterns. We formulate a new similarity, incorporate it into the Euclidean distance and the Pearson correlation coefficient, and then use the resulting measures in various clustering algorithms to group gene expression profiles. The results are assessed using functional annotation. Existing similarity measures in their traditional form are also used with the same clustering algorithms for performance comparison. The results suggest that the new similarity improves the performance of all the considered clustering algorithms in terms of finding biologically relevant groups of genes.
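As a rough illustration of why outlier handling matters for the two traditional measures named above, the following sketch computes the Euclidean distance and the Pearson correlation coefficient between two expression profiles, with and without a single outlying value. The profile values and function names are hypothetical, chosen only for illustration; the sketch shows the standard measures and is not the new similarity proposed in this work, which the abstract does not specify.

```python
import math


def euclidean_distance(x, y):
    """Standard Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Standard Pearson correlation coefficient between two profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression profiles over five conditions (made-up values).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 5.0]

# Same profile as gene_b, but with one outlying measurement in the last condition.
gene_b_outlier = [1.1, 2.1, 2.9, 4.2, -5.0]

print("Euclidean, clean:  ", euclidean_distance(gene_a, gene_b))
print("Euclidean, outlier:", euclidean_distance(gene_a, gene_b_outlier))
print("Pearson, clean:    ", pearson_correlation(gene_a, gene_b))
print("Pearson, outlier:  ", pearson_correlation(gene_a, gene_b_outlier))
```

In this made-up example, a single outlying value is enough to enlarge the Euclidean distance considerably and to turn a strongly positive Pearson correlation negative, even though the two profiles agree in the remaining four conditions.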



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Anindya Bhattacharya (1)
  • Rajat K. De (2)
  1. Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
  2. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
