Abstract
Both univariate and multivariate feature selection methods can be used for biomarker discovery in the analysis of toxicant exposure. Among the univariate methods, differential expression analysis (DEA) is often applied for its simplicity and interpretability. A characteristic of DEA methods is that they treat genes individually, disregarding the correlations that exist between them. Multivariate feature selection methods, which consider genes jointly, have also been proposed for biomarker discovery. With various biomarker discovery methods available, choosing the most suitable one for a specific dataset becomes a problem. In this paper, we present a framework for comparing candidate biomarker discovery methods: three methods that stem from different theories are compared by how stable their selections are and by how much they improve classification accuracy. The three methods considered are Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes; minimum Redundancy Maximum Relevance (mRMR), which is based on information theory; and Characteristic Direction (GeoDE), which is inspired by a geometrical perspective. When tested on gene expression data from two experiments in which Atlantic cod were exposed to two different toxicants (MeHg and PCB 153), different methods stand out in different cases, so the most suitable method should be chosen based on the dataset under study and the research interest.
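To make the stability criterion concrete, the following is a minimal sketch of one common way to quantify it: run a selector on repeated stratified subsamples and average the pairwise Jaccard similarity of the selected gene sets. The abstract does not say which stability measure or resampling scheme the paper uses, so the Jaccard index, the 80% subsampling, the synthetic data, and all function names here are assumptions made for illustration. The selector is a plain Welch t-statistic ranking, standing in for a univariate method; it is not SAM, mRMR, or GeoDE.

```python
import itertools

import numpy as np


def top_k_by_t_statistic(X, y, k):
    """Return the indices of the k features with the largest absolute Welch t-statistic.

    Illustrative univariate selector only; it is not an implementation of SAM.
    """
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(diff) / (se + 1e-12)  # small constant guards against zero variance
    return set(np.argsort(t)[::-1][:k].tolist())


def jaccard(s1, s2):
    """Jaccard similarity between two sets of selected feature indices."""
    return len(s1 & s2) / len(s1 | s2)


def selection_stability(X, y, selector, k, n_resamples=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard similarity of the feature sets chosen on subsamples."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_resamples):
        # Stratified subsample without replacement, so both classes stay represented.
        idx = np.concatenate([
            rng.choice(
                np.flatnonzero(y == c),
                size=max(2, int(frac * np.sum(y == c))),
                replace=False,
            )
            for c in np.unique(y)
        ])
        subsets.append(selector(X[idx], y[idx], k))
    pairs = itertools.combinations(subsets, 2)
    return float(np.mean([jaccard(s1, s2) for s1, s2 in pairs]))


# Synthetic stand-in for an expression matrix: 40 samples, 200 "genes",
# of which the first 5 are strongly shifted in the exposed group.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X_signal = rng.normal(size=(40, 200))
X_signal[y == 1, :5] += 5.0
X_noise = rng.normal(size=(40, 200))  # no real difference between groups

stability_signal = selection_stability(X_signal, y, top_k_by_t_statistic, k=5)
stability_noise = selection_stability(X_noise, y, top_k_by_t_statistic, k=5)
print(f"stability with signal: {stability_signal:.2f}")
print(f"stability without signal: {stability_noise:.2f}")
```

A score near 1 means the selector returns almost the same genes on every subsample, and a score near 0 means the selections barely overlap. In a comparison like the one described above, the same subsamples would be passed to each candidate method, and the stability score would be read alongside the classification accuracy obtained with the selected genes.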
Acknowledgements
We would like to thank our colleagues in the Jonassen Group for helpful discussions, and the Computational Biology Unit at the University of Bergen, where the work was carried out. We would also like to thank the Centre for Digital Life Norway (DLN) and the dCod 1.0 project, to which this work is related.
Funding
The dCod 1.0 project is funded under the Digital Life Norway initiative of the BIOTEK 2021 program of the Research Council of Norway (project no. 248840).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, X., Jonassen, I. (2019). A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-Treated Atlantic Cod (Gadus Morhua) Liver. In: Bach, K., Ruocco, M. (eds) Nordic Artificial Intelligence Research and Development. NAIS 2019. Communications in Computer and Information Science, vol 1056. Springer, Cham. https://doi.org/10.1007/978-3-030-35664-4_11
DOI: https://doi.org/10.1007/978-3-030-35664-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35663-7
Online ISBN: 978-3-030-35664-4
eBook Packages: Computer Science, Computer Science (R0)