Human age prediction using DNA methylation and regression methods
Determining a person's age can be an important factor in forensic investigation. DNA methylation (DNAm) is well known to change during the aging process, and it is also necessary for the development of mammals. Several studies have reported that DNAm can serve as an important marker for predicting human age. This study develops age prediction models using three regression methods (multiple linear regression, support vector regression, and random forest regression), each applied to a set of four highly age-correlated CpG sites. For 180 blood samples from individuals aged 2 to 87 years, the mean absolute deviation (MAD) is 8.43 years for multiple linear regression, 7.86 years for support vector regression, and 8.25 years for random forest regression. The models are further tested on five different age groups, where the average MAD for multiple linear regression, support vector regression, and random forest regression is 3.46, 3.44, and 3.56 years, respectively. Support vector regression gave the highest accuracy (lowest MAD) both for the combined samples and for the five age groups. The results indicate that support vector regression is a reliable method for human age prediction.
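The abstract compares models by the mean absolute deviation (MAD) between predicted and chronological age. The following is a minimal, dependency-free Python sketch of that evaluation for a multiple linear regression on four CpG features. It assumes MAD is the mean of |predicted age - chronological age| (the abstract does not spell out the formula), it uses synthetic toy "beta values" rather than the study's 180 samples, and the five age-band edges (0, 20, 40, 60, 80, 100) are hypothetical because the abstract does not give the group boundaries. The helper names `mad`, `fit_mlr`, `predict_mlr` and `mad_by_group` are illustrative only.

```python
import random
from statistics import mean


def mad(predicted, actual):
    # Assumed definition: mean of |predicted age - chronological age|, in years.
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, solved from the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]  # [intercept, b1, ..., b4]


def predict_mlr(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def mad_by_group(predicted, actual, edges):
    """MAD within each age band [edges[i], edges[i+1]); empty bands are skipped."""
    out = {}
    for lo, hi in zip(edges, edges[1:]):
        pairs = [(p, a) for p, a in zip(predicted, actual) if lo <= a < hi]
        if pairs:
            out[(lo, hi)] = mad([p for p, _ in pairs], [a for _, a in pairs])
    return out


# Synthetic toy data (NOT study data): four "beta values" in [0, 1] per sample,
# with age an exact linear function of them (range roughly 5 to 85 years).
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(30)]
ages = [20 + 50 * x[0] - 15 * x[1] + 10 * x[2] + 5 * x[3] for x in X]

coef = fit_mlr(X, ages)
pred = predict_mlr(coef, X)
print("coefficients:", [round(c, 3) for c in coef])
print("overall MAD:", mad(pred, ages))
# Hypothetical band edges: five 20-year age groups.
print("MAD by group:", mad_by_group(pred, ages, [0, 20, 40, 60, 80, 100]))
```

Because the toy ages are an exact linear function of the features, the fit recovers the generating coefficients and the MAD is essentially zero; on real methylation data the remaining error is what the reported MAD values measure. This sketch scores one pooled model within each band; the abstract does not say whether the study instead fitted a separate model per age group.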
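Support vector regression fits a function that ignores residuals smaller than a margin epsilon and penalizes larger ones linearly. As a rough illustration only, the sketch below trains a *linear* epsilon-insensitive SVR by full-batch subgradient descent in pure Python. The abstract does not state the study's kernel, hyperparameters or solver, so the values used here (`eps`, `lam`, `lr`, `epochs`) and the function names `fit_linear_svr` and `predict_svr` are assumptions, and the data are synthetic toy values rather than study samples. Features are standardized first so that a single learning rate suits every coefficient.

```python
import random
from statistics import mean, median, pstdev


def mad(predicted, actual):
    # Assumed metric: mean |predicted - chronological| age, in years.
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def standardize(X, mu, sd):
    # Center and scale each feature with the given means and standard deviations.
    return [[(v - m) / s for v, m, s in zip(x, mu, sd)] for x in X]


def fit_linear_svr(X, y, eps=1.0, lam=1e-4, lr=0.1, epochs=4000):
    """Linear epsilon-insensitive SVR minimizing, by full-batch subgradient descent,
    lam/2 * ||w||^2 + mean(max(0, |y - (w.z + b)| - eps)) on standardized features z."""
    d = len(X[0])
    mu = [mean(x[j] for x in X) for j in range(d)]
    sd = [pstdev([x[j] for x in X]) or 1.0 for j in range(d)]  # guard constant columns
    Z, n = standardize(X, mu, sd), len(X)
    w, b = [0.0] * d, median(y)  # start the intercept at the median age
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for z, t in zip(Z, y):
            r = t - (sum(wj * zj for wj, zj in zip(w, z)) + b)
            if abs(r) > eps:  # only samples outside the epsilon tube contribute
                s = 1.0 if r > 0 else -1.0
                gw = [g - s * zj / n for g, zj in zip(gw, z)]
                gb -= s / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b, mu, sd


def predict_svr(model, X):
    w, b, mu, sd = model
    return [sum(wj * zj for wj, zj in zip(w, z)) + b for z in standardize(X, mu, sd)]


# Synthetic toy data (NOT study data): four "beta values" per sample.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(30)]
ages = [20 + 50 * x[0] - 15 * x[1] + 10 * x[2] + 5 * x[3] for x in X]

model = fit_linear_svr(X, ages)
pred = predict_svr(model, X)
baseline = mad([median(ages)] * len(ages), ages)
print("SVR MAD:", round(mad(pred, ages), 2), "vs. median-only baseline:", round(baseline, 2))
```

In practice a library solver, often with a nonlinear kernel such as scikit-learn's `SVR` (RBF kernel by default), would be the usual choice; the hand-written loop is only meant to make the epsilon-insensitive objective concrete.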
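Random forest regression averages many regression trees, each grown on a bootstrap resample of the training samples and considering only a random subset of features at each split. The pure-Python sketch below illustrates that mechanism on synthetic toy data; the number of trees, features per split, maximum depth and minimum leaf size are assumed values rather than the study's settings, and the names `build_tree`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from statistics import mean


def mad(predicted, actual):
    # Assumed metric: mean |predicted - chronological| age, in years.
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def sse(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def build_tree(X, y, rng, max_features, min_leaf=2, depth=0, max_depth=8):
    """Regression tree: each split minimizes the summed squared error of the two
    children, searching only a random subset of `max_features` features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf: mean age of the samples reaching it
    best = None
    for j in rng.sample(range(len(X[0])), max_features):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        for k in range(min_leaf, len(y) - min_leaf + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # bootstrap duplicates: no threshold separates tied values
            cost = sse([y[i] for i in order[:k]]) + sse([y[i] for i in order[k:]])
            if best is None or cost < best[0]:
                best = (cost, j, (lo + hi) / 2)
    if best is None:
        return sum(y) / len(y)
    _, j, thr = best
    left = [i for i in range(len(y)) if X[i][j] <= thr]
    right = [i for i in range(len(y)) if X[i][j] > thr]
    kids = []
    for idx in (left, right):
        kids.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                               max_features, min_leaf, depth + 1, max_depth))
    return (j, thr, kids[0], kids[1])


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, max_features=2, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap resample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return trees


def predict_forest(trees, X):
    # Forest prediction = average of the individual tree predictions.
    return [mean(predict_tree(t, x) for t in trees) for x in X]


# Synthetic toy data (NOT study data): four "beta values" per sample.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(30)]
ages = [20 + 50 * x[0] - 15 * x[1] + 10 * x[2] + 5 * x[3] for x in X]

forest = fit_forest(X, ages, n_trees=50, max_features=2, seed=1)
pred = predict_forest(forest, X)
baseline = mad([mean(ages)] * len(ages), ages)
print("forest MAD (training data):", round(mad(pred, ages), 2),
      "vs. mean-only baseline:", round(baseline, 2))
```

A training-set MAD is optimistic for a forest, since each tree has seen most of the samples; a held-out split or cross-validation gives an estimate closer in spirit to the reported values.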
Keywords: Age · Chronological · CpG sites · DNA methylation · Epigenetic · Regression
This study received no funding.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no actual or potential conflict of interest, including any financial, personal, or other relationships with other people or organizations.
This article does not contain any studies with human participants or animals performed by any of the authors.