Proximity Measures and Results Validation in Biclustering – A Survey

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7895)

Abstract

The concept of biclustering evolved from traditional clustering techniques, which have proved inadequate for discovering local patterns in gene microarrays, in particular shifting and scaling patterns. In this work we compare similarity measures applied in different biclustering algorithms and review validation methodologies described in the literature. To the best of our knowledge, this is the first in-depth comparative analysis of proximity measures and validation techniques for biclustering. Current trends in the design of similarity measures, as well as a rich collection of state-of-the-art benchmark datasets, are presented to support algorithm designers in classifying the comparison and quality assessment criteria for emerging biclustering algorithms.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Orzechowski, P. (2013). Proximity Measures and Results Validation in Biclustering – A Survey. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_20

  • DOI: https://doi.org/10.1007/978-3-642-38610-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38609-1

  • Online ISBN: 978-3-642-38610-7

  • eBook Packages: Computer Science, Computer Science (R0)
