Abstract
Gene expression microarray measurements commonly contain multiple missing expression values, and handling them properly is a critical task. To address this issue, this paper proposes a novel methodology, based on a compressive sensing mechanism, for analyzing gene expression data on the basis of the topological characteristics of gene expression time series. Once the data are recovered, the approach processes them through a non-linear PCA for dimensionality reduction and a Hierarchical Clustering Algorithm for agglomeration and visualization. Experiments have been performed on the yeast Saccharomyces cerevisiae dataset by considering different percentages of information loss. The approach shows robust performance when a high percentage of information is lost and when few samples are available.
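To make the recovery step concrete, the following is a minimal sketch of how a compressive sensing mechanism can fill in missing values of a single expression time series. It is not the authors' implementation: the abstract does not name a sparsifying basis or a solver, so the DCT basis, the orthogonal matching pursuit (OMP) solver, the synthetic profile, and all function names (dct_basis, omp, recover_profile) are assumptions chosen for illustration.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal cosine (inverse DCT-II) basis; columns are atoms."""
    k = np.arange(n)
    t = np.arange(n)[:, None]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[:, 0] *= np.sqrt(1.0 / n)
    basis[:, 1:] *= np.sqrt(2.0 / n)
    return basis  # a signal x is written as basis @ coeffs

def omp(A, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse solution of A @ s = y."""
    residual = y.copy()
    support = []
    norms = np.linalg.norm(A, axis=0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(A.shape[1])
    coef[support] = sol
    return coef

def recover_profile(values, observed_mask, n_nonzero=3):
    """Estimate missing entries of one time series (hypothetical helper)."""
    basis = dct_basis(len(values))
    # Only the observed time points act as measurements.
    coef = omp(basis[observed_mask], values[observed_mask], n_nonzero)
    recovered = basis @ coef
    recovered[observed_mask] = values[observed_mask]  # keep measured values
    return recovered

# Synthetic profile (an assumption): sparse in the cosine basis, 7 of 32 points lost.
n = 32
true_coef = np.zeros(n)
true_coef[[1, 4]] = [3.0, -2.0]
truth = dct_basis(n) @ true_coef

mask = np.ones(n, dtype=bool)
mask[[3, 7, 8, 15, 20, 21, 27]] = False
corrupted = np.where(mask, truth, np.nan)

recovered = recover_profile(corrupted, mask)
print("max abs error on missing points:", np.max(np.abs(recovered - truth)[~mask]))
```

The sketch treats the observed time points as the measurements and relies on the profile being sparse in the chosen basis; real expression profiles are only approximately sparse, so the sparsity level n_nonzero would need tuning. The recovered matrix could then be passed to a dimensionality reduction and clustering stage, which is outside the scope of this example.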
Acknowledgments
The research was developed when Davide Nardone was an M.Sc. student in Applied Computer Science at the University of Naples Parthenope.
This work was partially funded by the University of Naples Parthenope (Sostegno alla ricerca individuale per il triennio 2016–2018 project) and supported by Gruppo Nazionale per il Calcolo Scientifico (GNCS-INdAM).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ciaramella, A., Nardone, D., Staiano, A. (2020). Compressive Sensing and Hierarchical Clustering for Microarray Data with Missing Values. In: Raposo, M., Ribeiro, P., Sério, S., Staiano, A., Ciaramella, A. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2018. Lecture Notes in Computer Science, vol 11925. Springer, Cham. https://doi.org/10.1007/978-3-030-34585-3_1
DOI: https://doi.org/10.1007/978-3-030-34585-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34584-6
Online ISBN: 978-3-030-34585-3
eBook Packages: Computer Science, Computer Science (R0)