Missing-Values Imputation Algorithms for Microarray Gene Expression Data

Moorthy, Kohbalan; Jaber, Aws Naser; Ismail, Mohd Arfian; Ernawan, Ferda; Mohamad, Mohd Saberi; Deris, Safaai

doi:10.1007/978-1-4939-9442-7_12

Protocol
First Online: 22 May 2019

Part of the book series: Methods in Molecular Biology (MIMB, volume 1986)

Abstract

In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Many bioinformatics analysis tools used for cancer prediction operate on the data set matrix (Bailey et al., Cell 173(2):371–385, 2018), so the missing values in that matrix must be imputed. This chapter reviews research on missing-values imputation approaches for gene expression data. The review concentrates on the differences between the algorithms, in particular on whether they use the local or the global correlation of the data. We classify the algorithms as global, hybrid, local, or knowledge-based techniques, and we present suitable assessments of the different approaches. The purpose of this review is to direct scientists toward developments in the current techniques, rather than toward applying different or newly developed algorithms with identical functional goals, so that the algorithms can be adapted to the characteristics of the data.
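
As a rough illustration of the local versus global distinction drawn in the abstract, the sketch below contrasts two simple imputers. It is not taken from the chapter: the function names `knn_impute` and `svd_impute`, their parameters, and the layout (rows are genes, columns are samples, NaN marks a missing entry) are assumptions made for this example. The local version borrows values from the most similar genes only, while the global version fits a low-rank model to the whole matrix.

```python
import numpy as np

def knn_impute(X, k=3):
    """Local sketch: fill each missing entry from the k most similar genes (rows)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        observed = ~missing[i]
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = observed & ~missing[j]  # columns observed in both genes
            if not shared.any():
                continue
            d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            dists.append((d, j))
        dists.sort()  # nearest genes first
        for c in np.where(missing[i])[0]:
            # Use the k nearest genes that actually have a value in column c.
            donors = [X[j, c] for _, j in dists if not missing[j, c]][:k]
            if donors:
                filled[i, c] = np.mean(donors)
    return filled

def svd_impute(X, rank=1, n_iter=50):
    """Global sketch: iteratively refit a low-rank SVD model of the whole matrix."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means, then alternate between fitting and refilling.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]  # observed entries stay untouched
    return filled
```

Which of the two behaves better depends on the data: a local method tends to suit matrices where small groups of genes are strongly co-expressed, whereas a global method relies on structure shared across the whole matrix.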

References

Fehrmann RS, Karjalainen JM, Krajewska M, Westra H-J, Maloney D, Simeonov A, Pers TH, Hirschhorn JN, Jansen RC, Schultes EA (2015) Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat Genet 47(2):115
Lima-Tenório MK, Pineda EAG, Ahmad NM, Fessi H, Elaissari A (2015) Magnetic nanoparticles: in vivo cancer diagnosis and therapy. Int J Pharm 493(1-2):313–327
Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B (2018) Comprehensive characterization of cancer driver genes and mutations. Cell 173(2):371–385; e318
Criscuolo E, Spadini S, Lamanna J, Ferro M, Burioni R (2017) Bacteriophages and their immunological applications against infectious threats. J Immunol Res 2017:3780697
Salem H, Attiya G, El-Fishawy N (2017) Classification of human cancer diseases by gene expression profiles. Appl Soft Comput 50:124–134
Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
Satija R, Farrell JA, Gennert D, Schier AF, Regev A (2015) Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33(5):495
Lai H-H, Chuang T-H, Wong L-K, Lee M-J, Hsieh C-L, Wang H-L, Chen S-U (2017) Identification of mosaic and segmental aneuploidies by next-generation sequencing in preimplantation genetic screening can improve clinical outcomes compared to array-comparative genomic hybridization. Mol Cytogenet 10(1):14
Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing 2017. World Scientific, pp 219–229
Larose DT, Larose CD (2014) Discovering knowledge in data: an introduction to data mining. Wiley, Hoboken, NJ
Quinn JJ, Chang HY (2016) Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet 17(1):47
Gogoshin G, Boerwinkle E, Rodin AS (2017) New algorithm and software (BNOmics) for inferring and visualizing Bayesian networks from heterogeneous big biological and genetic data. J Comput Biol 24(4):340–356
Zomorrodi AR, Segrè D (2016) Synthetic ecology of microbes: mathematical models and applications. J Mol Biol 428(5):837–861
Hu W, Lin X, Chen K (2015) Integrated analysis of differential gene expression profiles in hippocampi to identify candidate genes involved in Alzheimer’s disease. Mol Med Rep 12(5):6679–6687
Cressie N (2015) Statistics for spatial data. Wiley, Hoboken, NJ
Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinformatics 2015:198363
Lang KM, Little TD (2018) Principled missing data treatments. Prev Sci 19(3):284–294
Josse J, Husson F (2016) missMDA: a package for handling missing values in multivariate data analysis. J Stat Softw 70(1):1–31
Tsai C-F, Li M-L, Lin W-C (2018) A class center based approach for missing value imputation. Knowl-Based Syst 151:124–135
Garvey C, Meng C, Nagy JG (2018) Singular value decomposition approximation via Kronecker summations for imaging applications. arXiv preprint arXiv:1803.11525
Chatfield C (2018) Introduction to multivariate analysis. Routledge, New York
Tran CT, Zhang M, Andreae P (2016) A genetic programming-based imputation method for classification with missing data. In: European conference on genetic programming. Springer, pp 149–163
Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H (2014) Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am J Epidemiol 179(6):764–774
Bhattacharya S, Rajan V, Anand A (2017) Clustering high dimensional data using gaussian mixture copula model with lasso based regularization. Google Patents
Fox J (2015) Applied regression analysis and generalized linear models. Sage Publications, Thousand Oaks, CA
van der Loo M (2017) Simputation: simple imputation. R package version 0.2.2
Armina R, Zain AM, Ali NA, Sallehuddin R (2017) A review on missing value estimation using imputation algorithm. J Phys Conf Ser 892:012004
Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol 10. Wiley, New York
Colantonio A, Di Pietro R, Ocello A, Verde NV (2010) ABBA: adaptive bicluster-based approach to impute missing values in binary matrices. In: Proceedings of the 2010 ACM symposium on applied computing. ACM, pp 1026–1033
Smart Richman L, Blodorn A, Major B (2016) An identity-based motivational model of the effects of perceived discrimination on health-related behaviors. Group Process Intergroup Relat 19(4):415–425
Naik B, Mahapatra S, Nayak J, Behera H (2017) Fuzzy clustering with improved swarm optimization and genetic algorithm: hybrid approach. In: Computational intelligence in data mining. Springer, pp 237–247
Qi S, Schmid F (2017) Hybrid particle-continuum simulations coupling Brownian dynamics and local dynamic density functional theory. Soft Matter 13(43):7938–7947
Shukur OB, Lee MH (2015) Imputation of missing values in daily wind speed data using hybrid AR-ANN method. Mod Appl Sci 9(11):1
Kayri M (2016) Predictive abilities of bayesian regularization and Levenberg–Marquardt algorithms in artificial neural networks: a comparative empirical study on social data. Math Comput Appl 21(2):20
Gan S, Wang S, Chen Y, Chen X, Huang W, Chen H (2016) Compressive sensing for seismic data reconstruction via fast projection onto convex sets based on seislet transform. J Appl Geophys 130:194–208
van der Loo M, de Jonge E (2018) Statistical data cleaning with applications in R. Wiley, New York
Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD (2016) PANTHER version 11: expanded annotation data from gene ontology and reactome pathways, and data analysis tool enhancements. Nucleic Acids Res 45(D1):D183–D189
Aziz MF, Caetano-Anollés K, Caetano-Anollés G (2016) The early history and emergence of molecular functions and modular scale-free network behavior. Sci Rep 6:25058

Acknowledgements

We would like to thank Universiti Malaysia Pahang for supporting this work under RDU grant numbers RDU1703200 and RDU180344.

Author information

Authors and Affiliations

Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
Kohbalan Moorthy, Aws Naser Jaber, Mohd Arfian Ismail & Ferda Ernawan
Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
Mohd Saberi Mohamad & Safaai Deris

Corresponding author

Correspondence to Kohbalan Moorthy.

Editor information

Editors and Affiliations

CITIC, Universidade da Coruña, A Coruña, Spain
Verónica Bolón-Canedo
CITIC, Universidade da Coruña, A Coruña, Spain
Amparo Alonso-Betanzos

About this protocol

Cite this protocol

Moorthy, K., Jaber, A.N., Ismail, M.A., Ernawan, F., Mohamad, M.S., Deris, S. (2019). Missing-Values Imputation Algorithms for Microarray Gene Expression Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_12

DOI: https://doi.org/10.1007/978-1-4939-9442-7_12
Published: 22 May 2019
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols
