Abstract
Knowledge discovery is the process of finding hidden knowledge in large volumes of data, and data mining is a central step in that process. Data mining unveils interesting relationships among data, and the results can support valuable predictions or recommendations in various applications. Bi-clustering is an unsupervised machine learning technique that can uncover useful information from Big data. It has useful applications in fields such as pattern classification, information retrieval, gene expression data analysis and functional annotation. The goal of bi-clustering is to detect coherent groups of data by clustering along the row and column dimensions of a dataset simultaneously. Because it uses both the row and column information in the data, bi-clustering usually requires the optimization of two or more conflicting objectives. In this chapter, we review some recent state-of-the-art multi-objective, evolutionary-based bi-clustering algorithms and discuss their application in data mining for multimodal and Big data.
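To make the idea of conflicting objectives concrete, the following minimal sketch scores candidate bi-clusters on two criteria and keeps the non-dominated ones. The abstract does not name specific objectives, so the choice here is an assumption: coherence is measured with the mean squared residue of Cheng and Church (lower is better) and size with the bi-cluster volume (larger is better). The toy matrix and all function names are hypothetical and purely illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Score = Tuple[float, float]


def mean_squared_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of the submatrix picked out by rows and cols."""
    sub = [[data[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return total / (n_r * n_c)


def objectives(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Score:
    """Return (residue, -volume); both components are to be minimised."""
    return mean_squared_residue(data, rows, cols), -float(len(rows) * len(cols))


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Score]) -> List[Score]:
    """Keep the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Toy data (assumed): rows 0-2 and columns 0-2 follow an additive pattern,
# while the last row and column act as noise.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

small = objectives(data, [0, 1, 2], [0, 1, 2])        # coherent but smaller
full = objectives(data, [0, 1, 2, 3], [0, 1, 2, 3])   # larger but less coherent
corner = objectives(data, [0, 3], [0, 3])             # small and incoherent

front = pareto_front([small, full, corner])
print(front)
```

In this toy case the coherent 3x3 block and the full matrix are both kept, because neither is better on both criteria, while the incoherent 2x2 candidate is discarded. A multi-objective evolutionary algorithm searches for such a set of trade-off solutions instead of a single optimum.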
Acknowledgement
Maryam Golchin is supported by the Australian Government Research Training Program Scholarship.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Golchin, M., Liew, A.W.C. (2019). Bi-clustering by Multi-objective Evolutionary Algorithm for Multimodal Analytics and Big Data. In: Seng, K., Ang, Lm., Liew, AC., Gao, J. (eds) Multimodal Analytics for Next-Generation Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-97598-6_6
DOI: https://doi.org/10.1007/978-3-319-97598-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97597-9
Online ISBN: 978-3-319-97598-6
eBook Packages: Computer Science, Computer Science (R0)