MethPed: an R package for the identification of pediatric brain tumor subtypes
DNA methylation profiling of pediatric brain tumors offers a new way of diagnosing and subgrouping these tumors that improves on current clinical diagnostics based on histopathology. We have therefore developed the MethPed classifier, a multiclass random forest algorithm based on DNA methylation profiles from many subgroups of pediatric brain tumors.
We developed an R package that implements the MethPed classifier, making it easily available and accessible. The package can be used for estimating the probability that an unknown sample belongs to each of nine pediatric brain tumor diagnoses/subgroups.
The MethPed R package efficiently classifies pediatric brain tumors using the developed MethPed classifier. MethPed is available via Bioconductor: http://bioconductor.org/packages/MethPed/
Keywords: DNA methylation; 450K; Random forest; R package; Glioblastoma; Medulloblastoma; Ependymoma; Classifier (classification tool); Astrocytoma; MethPed
Carcinogenesis involves changes in gene expression that result in tumor-specific gene and protein signatures. Such signatures have been used to classify different subtypes of cancers. Gene expression is partly regulated by the methylation state of CpG islands, and cancer tissue is characterized by increased variability in DNA methylation patterns. DNA methylation profiling has been reported as a robust method to classify and subgroup tumors of different origin. For most pediatric brain tumor diagnoses, methylation profiling can divide the tumors into clinically relevant subgroups reflecting the diverse biology of the different subtypes, which further highlights the need for specific therapeutic strategies to target different subgroups.
With the increased knowledge about specific brain tumor subgroups and the development of targeted therapy for different entities, it is essential to quickly and accurately determine the correct diagnosis for pediatric brain tumor patients. The most commonly used platform for genome-wide methylation profiling is the Illumina Infinium HumanMethylation450 BeadChip array. These arrays profile ~485,000 CpG sites and have been used by the Cancer Genome Atlas Project (TCGA) and in numerous studies of pediatric brain tumors.
A correct diagnosis is vital for determining the appropriate treatment protocol for a specific patient, for selecting the right patients for clinical trials investigating novel therapies for specific diagnoses and subgroups, and for enabling basic researchers to draw correct conclusions from experiments. We therefore developed the MethPed classifier, a multiclass random forest algorithm based on DNA methylation profiles from many subgroups of pediatric brain tumors. We have now developed an R package that uses this method, making it easily available and accessible.
The MethPed classifier can be accessed through the ‘MethPed’ package, which can be downloaded from Bioconductor, a repository for bioinformatics-related applications. The ‘MethPed’ package includes the MethPed classifier and an example data set of two tumors. After the package has been installed, the example data can be loaded into the R computing environment with the data function.
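As a rough sketch of this step, the package could be installed from Bioconductor and the example data loaded as follows. The `BiocManager` route is the current Bioconductor installation mechanism rather than something described in this article, and the data set name `MethPed_sample` is an assumption taken from the package vignette, so both should be checked against the installed version.

```r
# One-time installation, left commented out so that running this sketch
# does not change the local R library:
# install.packages("BiocManager")
# BiocManager::install("MethPed")

# Proceed only if the package is already installed.
have_methped <- requireNamespace("MethPed", quietly = TRUE)

if (have_methped) {
  # 'MethPed_sample' is the ASSUMED name of the bundled two-tumor example.
  data("MethPed_sample", package = "MethPed")
  if (exists("MethPed_sample")) {
    print(head(MethPed_sample))
  }
} else {
  message("MethPed is not installed; see the commented lines above.")
}
```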
Data for running MethPed is generated by Illumina Infinium HumanMethylation450 BeadChip arrays. Beta values for two samples (Tumor A and Tumor B) are provided with the ‘MethPed’ package as an example. This data has no missing values. If a data set does contain missing values, the impute package can be used for missing value imputation, as described in the MethPed vignette on Bioconductor.
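A minimal sketch of checking for and imputing missing values, using a simulated beta-value matrix rather than real array data. `impute.knn()` is the k-nearest-neighbour routine from the Bioconductor impute package; whether its default settings match those used in the MethPed vignette is an assumption, and the matrix and all names in it are invented for illustration.

```r
set.seed(1)

# Simulated beta values: 50 probes (rows) x 6 samples (columns), all in [0, 1].
beta <- matrix(runif(300), nrow = 50,
               dimnames = list(paste0("probe_", 1:50),
                               paste0("Sample_", 1:6)))

# Knock out three values to mimic failed probes.
beta[c(3, 17), "Sample_2"] <- NA
beta[40, "Sample_5"] <- NA

# Number of missing values per sample.
na_per_sample <- colSums(is.na(beta))
print(na_per_sample)

# Impute only when needed and when the 'impute' package is available.
if (anyNA(beta) && requireNamespace("impute", quietly = TRUE)) {
  beta_complete <- impute::impute.knn(beta)$data
} else {
  beta_complete <- beta  # unchanged; still contains NA if 'impute' is absent
}
```

The per-sample count makes it easy to see whether imputation is needed at all before any classification is attempted.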
Results and discussion
The conditional probability matrix of the classification in the MethPed output can be viewed with the ‘summary’ command in R. To visualize the prediction, bar plots can be generated with the ‘plot’ command (Fig. 2). If probes present in the training data included with the package are missing from the sample, they can be listed with the ‘probeMis’ command.
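To make this output concrete, the sketch below builds a toy conditional probability matrix in base R and extracts the most likely subgroup for each sample. The numbers are invented, only three of the nine subgroups are shown, and the layout (one row per sample, one column per subgroup) is an assumption for illustration rather than the package's documented structure.

```r
# Toy probability matrix: rows = samples, columns = subgroups (values invented).
prob <- matrix(
  c(0.80, 0.15, 0.05,
    0.10, 0.20, 0.70),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Tumor_A", "Tumor_B"), c("GBM", "DIPG", "ETMR"))
)

# Each row is a probability distribution over subgroups, so it sums to 1.
stopifnot(all(abs(rowSums(prob) - 1) < 1e-8))

# Column index of the most likely subgroup for each sample.
best <- apply(prob, 1, which.max)

# Collect the predicted subgroup and its probability per sample.
predicted <- data.frame(
  sample      = rownames(prob),
  subgroup    = colnames(prob)[best],
  probability = prob[cbind(seq_len(nrow(prob)), best)]
)
print(predicted)
```

A sample whose highest probability is only slightly above the others would warrant more caution than one with a single dominant subgroup, which is why the full matrix is worth inspecting alongside the top call.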
MethPed is currently the only publicly available tool for classification of pediatric brain tumors. The use of methylation profiling for classification of these tumors adds new and important knowledge in the clinical setting for choosing the optimal care and treatment of these patients and will therefore likely complement histopathological diagnoses in the near future [1, 2].
The MethPed R package can be used to efficiently classify pediatric brain tumors using DNA methylation profiles generated by the Illumina 450 K methylation arrays.
DIPG, diffuse intrinsic pontine glioma; ETMR, embryonal tumor with multilayered rosettes; GBM, glioblastoma; TCGA, the Cancer Genome Atlas Project.
This work was supported by the Swedish Cancer Society; the Swedish Children’s Cancer Society; the Swedish Research Council; the Swedish Society for Medical Research; a Marie Curie CIG from the EU’s Seventh Framework Programme (FP7) and the Wenner-Gren foundation.
Availability of data and materials
Data used for building MethPed is available at GEO (accession numbers: GSE50022, GSE55712, GSE36278, GSE52556, GSE54880, GSE45353 and GSE44684).
Project home page: http://bioconductor.org/packages/MethPed/
Operating systems: any operating system supporting R
Programming language: R
Other requirements: working R installation, with Bioconductor version 3.3
Any restriction to use by non-academics: none.
AD and HC initiated the study. MTA developed the R package under guidance from SN, AD and HC. MTA and HC wrote the manuscript. All authors read, contributed to and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
The inclusion of the two test samples was approved by the Regional Ethics Committee in Gothenburg, EPN, Dnr 604–12. Samples were obtained after signed informed consent from the parents of children who underwent surgery at the Sahlgrenska University Hospital.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.