Abstract
In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Many bioinformatics analysis tools used for cancer prediction operate on the gene expression data matrix (Bailey et al., Cell 173(2):371–385, 2018); thus, the problem of missing-values imputation needs to be resolved. This chapter reviews the research on missing-values imputation approaches for gene expression data. We focus mainly on the differences between the algorithms in how they use local and global correlation in the data, and we classify them as global, hybrid, local, or knowledge-based techniques. The chapter also presents assessments of the different approaches. The purpose of this review is to direct scientists toward developments in current techniques rather than toward applying different or newly developed algorithms with identical functional goals. The aim is to adapt the algorithms to the characteristics of the data.
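To make the "local" category concrete, the following is a minimal sketch of a nearest-neighbour style imputation, in which a missing entry for one gene is estimated from the genes whose observed expression profiles are most similar. It is an illustration only and is not taken from the chapter: the function name `knn_impute`, the use of `None` to mark missing entries, the unweighted average over neighbours, and the toy matrix are all assumptions.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a genes-by-samples matrix from the k most similar rows.

    Illustrative sketch only: distance is the root mean squared difference over
    the columns observed in both rows, and neighbours are averaged unweighted.
    """
    imputed = [row[:] for row in matrix]  # copy so the input is not modified
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
            else:
                # Assumed fallback: the row's own mean, or None if nothing is observed.
                observed = [v for v in row if v is not None]
                imputed[i][j] = sum(observed) / len(observed) if observed else None
    return imputed


# Toy matrix (hypothetical values): rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, None],
    [1.1, 2.1, 2.9, 4.0],
    [0.9, 1.9, 3.1, 4.2],
    [5.0, 5.0, 5.0, 9.0],
]
filled = knn_impute(expr, k=2)
```

In this toy case the missing entry in the first gene is estimated from the second and third genes, whose profiles are closest to it, and the dissimilar fourth gene is ignored. A global method would instead draw on structure across the whole matrix, which is the distinction the classification above relies on.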
References
Fehrmann RS, Karjalainen JM, Krajewska M, Westra H-J, Maloney D, Simeonov A, Pers TH, Hirschhorn JN, Jansen RC, Schultes EA (2015) Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat Genet 47(2):115
Lima-Tenório MK, Pineda EAG, Ahmad NM, Fessi H, Elaissari A (2015) Magnetic nanoparticles: in vivo cancer diagnosis and therapy. Int J Pharm 493(1–2):313–327
Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B (2018) Comprehensive characterization of cancer driver genes and mutations. Cell 173(2):371–385; e318
Criscuolo E, Spadini S, Lamanna J, Ferro M, Burioni R (2017) Bacteriophages and their immunological applications against infectious threats. J Immunol Res 2017:3780697
Salem H, Attiya G, El-Fishawy N (2017) Classification of human cancer diseases by gene expression profiles. Appl Soft Comput 50:124–134
Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
Satija R, Farrell JA, Gennert D, Schier AF, Regev A (2015) Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33(5):495
Lai H-H, Chuang T-H, Wong L-K, Lee M-J, Hsieh C-L, Wang H-L, Chen S-U (2017) Identification of mosaic and segmental aneuploidies by next-generation sequencing in preimplantation genetic screening can improve clinical outcomes compared to array-comparative genomic hybridization. Mol Cytogenet 10(1):14
Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing 2017. World Scientific, pp 219–229
Larose DT, Larose CD (2014) Discovering knowledge in data: an introduction to data mining. Wiley, Hoboken, NJ
Quinn JJ, Chang HY (2016) Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet 17(1):47
Gogoshin G, Boerwinkle E, Rodin AS (2017) New algorithm and software (BNOmics) for inferring and visualizing Bayesian networks from heterogeneous big biological and genetic data. J Comput Biol 24(4):340–356
Zomorrodi AR, Segrè D (2016) Synthetic ecology of microbes: mathematical models and applications. J Mol Biol 428(5):837–861
Hu W, Lin X, Chen K (2015) Integrated analysis of differential gene expression profiles in hippocampi to identify candidate genes involved in Alzheimer’s disease. Mol Med Rep 12(5):6679–6687
Cressie N (2015) Statistics for spatial data. Wiley, Hoboken, NJ
Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinformatics 2015:198363
Lang KM, Little TD (2018) Principled missing data treatments. Prev Sci 19(3):284–294
Josse J, Husson F (2016) missMDA: a package for handling missing values in multivariate data analysis. J Stat Softw 70(1):1–31
Tsai C-F, Li M-L, Lin W-C (2018) A class center based approach for missing value imputation. Knowl-Based Syst 151:124–135
Garvey C, Meng C, Nagy JG (2018) Singular value decomposition approximation via Kronecker summations for imaging applications. arXiv preprint arXiv:1803.11525
Chatfield C (2018) Introduction to multivariate analysis. Routledge, New York
Tran CT, Zhang M, Andreae P (2016) A genetic programming-based imputation method for classification with missing data. In: European conference on genetic programming. Springer, pp 149–163
Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H (2014) Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am J Epidemiol 179(6):764–774
Bhattacharya S, Rajan V, Anand A (2017) Clustering high dimensional data using gaussian mixture copula model with lasso based regularization. Google Patents
Fox J (2015) Applied regression analysis and generalized linear models. Sage Publications, Thousand Oaks, CA
van der Loo M (2017) Simputation: simple imputation. R package version 0.2.2
Armina R, Zain AM, Ali NA, Sallehuddin R (2017) A review on missing value estimation using imputation algorithm. J Phys Conf Ser 892:012004
Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol 10. Wiley, New York
Colantonio A, Di Pietro R, Ocello A, Verde NV (2010) ABBA: adaptive bicluster-based approach to impute missing values in binary matrices. In: Proceedings of the 2010 ACM symposium on applied computing. ACM, pp 1026–1033
Smart Richman L, Blodorn A, Major B (2016) An identity-based motivational model of the effects of perceived discrimination on health-related behaviors. Group Process Intergroup Relat 19(4):415–425
Naik B, Mahapatra S, Nayak J, Behera H (2017) Fuzzy clustering with improved swarm optimization and genetic algorithm: hybrid approach. In: Computational intelligence in data mining. Springer, pp 237–247
Qi S, Schmid F (2017) Hybrid particle-continuum simulations coupling Brownian dynamics and local dynamic density functional theory. Soft Matter 13(43):7938–7947
Shukur OB, Lee MH (2015) Imputation of missing values in daily wind speed data using hybrid AR-ANN method. Mod Appl Sci 9(11):1
Kayri M (2016) Predictive abilities of bayesian regularization and Levenberg–Marquardt algorithms in artificial neural networks: a comparative empirical study on social data. Math Comput Appl 21(2):20
Gan S, Wang S, Chen Y, Chen X, Huang W, Chen H (2016) Compressive sensing for seismic data reconstruction via fast projection onto convex sets based on seislet transform. J Appl Geophys 130:194–208
van der Loo M, de Jonge E (2018) Statistical data cleaning with applications in R. Wiley, New York
Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD (2016) PANTHER version 11: expanded annotation data from gene ontology and reactome pathways, and data analysis tool enhancements. Nucleic Acids Res 45(D1):D183–D189
Aziz MF, Caetano-Anollés K, Caetano-Anollés G (2016) The early history and emergence of molecular functions and modular scale-free network behavior. Sci Rep 6:25058
Acknowledgements
We would like to thank Universiti Malaysia Pahang for supporting this work under RDU Grant numbers RDU1703200 and RDU180344.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Moorthy, K., Jaber, A.N., Ismail, M.A., Ernawan, F., Mohamad, M.S., Deris, S. (2019). Missing-Values Imputation Algorithms for Microarray Gene Expression Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_12
DOI: https://doi.org/10.1007/978-1-4939-9442-7_12
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols