Bi-clustering by Multi-objective Evolutionary Algorithm for Multimodal Analytics and Big Data

  • Chapter
Multimodal Analytics for Next-Generation Big Data Technologies and Applications

Abstract

Knowledge discovery is the process of finding hidden knowledge in a large volume of data, and it involves data mining. Data mining unveils interesting relationships among data, and the results can help in making valuable predictions or recommendations in various applications. Bi-clustering is an unsupervised machine learning technique that can uncover useful information from Big data. It has useful applications in fields such as pattern classification, information retrieval, gene expression data analysis and functional annotation. The goal of bi-clustering is to detect coherent groups of data by clustering along the row and column dimensions of a dataset simultaneously. Because it uses both the row and column information in the data, bi-clustering usually requires the optimization of two or more conflicting objectives. In this chapter, we review some recent state-of-the-art multi-objective, evolutionary-based bi-clustering algorithms and discuss their application in data mining for multimodal and Big data.
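
To make the notion of conflicting objectives concrete, the following minimal sketch scores a candidate bi-cluster on two criteria and checks Pareto dominance. The abstract does not name the objectives used in the chapter, so the choice here is an assumption: coherence is measured with the mean squared residue of Cheng and Church, and size is the number of cells covered. The function names and the toy matrix are hypothetical and serve illustration only.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows, cols] (lower is more coherent)."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float(np.mean(residue ** 2))


def objectives(data, rows, cols):
    """Return two objectives to minimise: incoherence (MSR) and negated size."""
    return (mean_squared_residue(data, rows, cols), -len(rows) * len(cols))


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Toy matrix (assumed data): rows 0-2 follow an additive pattern on columns 0-2,
# while row 3 and column 3 break that pattern.
data = np.array([
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [9.0, 1.0, 8.0, 2.0],
])

small = objectives(data, [0, 1, 2], [0, 1, 2])        # coherent but small
whole = objectives(data, [0, 1, 2, 3], [0, 1, 2, 3])  # large but incoherent

print("small bi-cluster:", small)
print("whole matrix:    ", whole)
print("small dominates whole:", dominates(small, whole))
print("whole dominates small:", dominates(whole, small))
```

In this toy case neither candidate dominates the other: the small bi-cluster is more coherent, while the whole matrix covers more cells. A multi-objective evolutionary algorithm would keep both on its Pareto front rather than collapsing the two criteria into a single score.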

Acknowledgement

Maryam Golchin is supported by the Australian Government Research Training Program Scholarship.

Author information

Corresponding author

Correspondence to Alan Wee-Chung Liew.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Golchin, M., Liew, A.WC. (2019). Bi-clustering by Multi-objective Evolutionary Algorithm for Multimodal Analytics and Big Data. In: Seng, K., Ang, Lm., Liew, AC., Gao, J. (eds) Multimodal Analytics for Next-Generation Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-97598-6_6

  • DOI: https://doi.org/10.1007/978-3-319-97598-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97597-9

  • Online ISBN: 978-3-319-97598-6

  • eBook Packages: Computer Science, Computer Science (R0)
