Alopecia or hair loss is a complex polygenetic and psychologically devastating disease affecting millions of men and women globally. Since the gene annotation and environmental knowledge is limited for alopecia, a systematic analysis for the identification of candidate biomarkers is required that could provide potential therapeutic targets for hair loss therapy.
We designed an interactive framework to perform a meta-analytical study based on differential expression analysis, systems biology, and functional proteomic investigations. We analyzed eight publicly available microarray datasets and found 12 potential candidate biomarkers including three extracellular proteins from the list of differentially expressed genes with a p-value < 0.05. After expression profiling and functional analysis, we studied protein–protein interactions and observed functional associations of source proteins including WIF1, SPON1, LYZ, GPRC5B, PTPRE, ZFP36L2, HBB, PHF15, LMCD1, KRT35 and VAV3 with target proteins including APCDD1, WNT1, WNT3A, SHH, ESRI, TGFB1, and APP. Pathway analysis of these molecules revealed their role in major physiological reactions including protein metabolism, signal transduction, WNT, BMP, EDA, NOTCH and SHH pathways. These pathways regulate hair growth, hair follicle differentiation, pigmentation, and morphogenesis. We studied the regulatory role of β-catenin, Nf-kappa B, cytokines and retinoic acid in the development of hair growth. Therefore, the differential expression of these significant proteins would affect the normal level and could cause aberrations in hair growth.
Our integrative approach helps to prioritize the biomarkers that ultimately lessen the economic burden of experimental studies. It will also be valuable to discover mutants in genomic data in order to increase the identification of new biomarkers for similar problems.
Alopecia or baldness is characterized by either patchy or complete loss of scalp hair . It affects millions of men and women of all ethnic backgrounds and can be psychologically shocking. It is a complex polygenic disease that can occur at any age affecting the general population between 1 and 2% [1, 2]. The severity of the disease varies widely in the population, but very little is known about genetic and environmental factors. A comprehensive understanding of the physiological mechanisms and signaling pathways involved in the progression of the disease is lacking. How inflammation and infectious agents, genetic and immunological factors are associated with the pathophysiology of alopecia is not clear. The currently available treatment options have variable efficacy and are not satisfactory [3, 4] due to limited understanding of the causes and prognosis. The high prevalence and ineffective cure call for extensive research to establish better therapeutic strategies . A systematic approach is required to identify candidate biomarkers regulating major biological pathways linked to alopecia and potential therapeutic targets for its therapy.
Biomarkers are quantifiable traits that can be used to analyze normal as well as pathological processes . These candidate molecules can be used for prediction of relapse, screening and to examine the response of the treatment of hair loss. Due to a lack of a suitable analytical framework, significant problems have been confronted during the translation of candidate biomarkers from cDNA microarray research into proteomic investigations for clinical applications [7,8,9]. The gene expression profiling of the epidermal and dermal keratinocytes is responsible for the activation of adipogenic factors and the levels of chemical-markers  including WIF1, GPRC5B, PTPRE, and LMCD1. These biological molecules are known to regulate hair growth cycle. Recent progress in genomics and proteomics analysis enabled the identification of many proteins and the discovery of new biomarkers.
Microarray and quantitative proteomics generally produce a massive amount of data that need to be further examined to find biomarkers and drug targets. For genome-wide pathological analysis, numerous expression data sets are publicly available. Since data unavailability and platform diversity pose major challenges to the analysis of human diseases, the meta-analysis approach has received significant consideration for analyzing various datasets. This approach offers more detailed analysis, statistical support, and significant results compared to any single study [10, 11]. Simulation-based differential gene expression has enabled the discovery of genetic variants and the generation of a quantitative gene expression scoring system that is critical for disease pathogenesis. In this study, we designed a systematic and logical computational-based meta-analytical framework involving systems biology approaches that would be helpful in identifying the causative agents of hair loss leading to the discovery of biomarkers. Differential expression analysis of eight microarray datasets was performed to determine candidate genes with significant differential regulation scoring system. Identification of key proteins as potential biomarkers based on their expression patterns has become possible through the use of databases which reveal the expression of several genes in tissues . Comprehensive in silico systems-level biological analysis was employed to determine significant alopecia biomarkers through a multistep prioritization method. The physicochemical and functional characteristics of these protein molecules were also studied. Moreover, the protein–protein interactions and pathways networks showed the functional association of our candidate proteins with hair follicle development and regulation of β-catenin, Nf-kappa B, cytokines, and retinoic acid levels. Probing of new biomarkers can deliver valuable insights in both biological and medical research after studying these molecular networks. It also increases biomarker discovery and can be used in large public datasets for better results in other diseases as well.
Identification and accession of eligible source data
The aim of the meta-analysis was to identify potential biomarkers for alopecia to effectively diagnose the causes of hair loss. Alopecia-related expression datasets were identified and retrieved from the Gene Expression Omnibus (GEO) database excluding the non-human studies. Information extracted from each dataset covers GEO accession number, sample type, platform, number of samples, and gene expression data. This study examines the genes commonly covered by hgu133plus2 chips. In these datasets, the Affymetrix Gene Chip Human Genome U133Plus2.0 Array platform and the annotation information (hgu133plus2) of probes were used to check differentially expressed genes. R-platform and BioConductor packages including Affy, AffyQCReport, AffyRNADegradation, AnnotationDbi, Annotate, Biobase, Limma, hgu133a2cdf and hgu133plus2cdf were employed to evaluate the statistical significance of the results (Additional file 1: Fig. S1).
Normalization and differential expression analysis
Normalization was used to compare microarray data sets. The pheno-data files of these data sets were organized in an identifiable format . Background correction, i.e. for perfect matches (PM) and mismatches (MM), was calculated as given in equation (i). Robust Multi-Array Analysis (RMA) was used to remove local artifacts and noise [14, 15].
where, PM is a perfect match, Background (BG) caused by optical noise and non-specific binding (S); ijk is the signal for probe j of probe set k on array i.
where, PM-data is a perfect match (hybridization) representing combination of background (BG) and signals (S) of expression (E) for a set of probes (ijk), normalized (N) at default parameters. The signals of expressions were used to compute probe affinities. BioConductor “Array Quality Metrics” package was used to analyze the dataset that is normalized to the median expression level of each gene [14, 16, 17]. The expression value of a transcript having a p-value < 0.15 was considered as marginal log transformation and quantile normalization of arrays was used to bring them to the same scale. Gene–gene covariance matrices of each data set, ignoring the missing values, were calculated across all arrays (54675affyids). The formula for transformation is:
where, F1 and F2 are distribution functions of the actual and reference chips, respectively.
We used the RMA-algorithm to calculate averages between probes in a probe set in order to get a summary of intensities. BioConductor “AffyRNADegradation” package was used for RNA-degradation analysis and to measure the quality of RNA in these samples [18, 19]. In this study, we identified DEGs in each dataset from genomic experiments by pairwise comparison  and the Benjamini–Hochberg method was used for multiple testing corrections . DEG sand duplicates pots along with the measurement of quality weights were shortlisted. The moderated statistics were calculated, genes were ranked and p-values were calculated. FDR < 0.05 (false discovery rate), p-value ≤ 0.05, Average Expression Level (AEL) ≥ 40% and absolute log fold change (logFC) > 1 were set as significant cut-off values .
Gene overlap significance analysis
Common genes among ranked DEGs (p-value ≤ 0.05) of each microarray dataset included in this study were identified using Compare Two Lists (Bioinformatics & Research Computing, Whitehead Institute http://barc.wi.mit.edu/tools/).
Subcellular localization and identification of secreted proteins
We predicted the subcellular localization of biomarker candidate proteins. We predicted the subcellular localization of alopecia-related significant DEGs using Signal P4.1  and CELLO v.2.5  to find the cellular compartments and categorize the selected proteins for effective applications. In most of the cases, the secretory and extracellular proteins of several cells are important sources of biomarker discovery as they indicate different states of the cells at a real-time . These extracellular proteins that are secreted into the blood or plasma could be a convenient and easy way for patient clinical screening. The abnormal blood and plasma levels of such secretory proteins in various conditions would relate to the pathological conditions. Amino acid sequences as FASTA input format were used. SignalP4.1server was used to predict classical secretory proteins with D-value > 0.45 and SecretomeP2.0 server for non-classical secretory proteins with a neural network (NN) score = 0.5 . Trans-membrane proteins were predicted using TMHMMv.2. with default options . For biomedical text mining, the predicted genes were mapped with alopecia using the Comparative Toxicogenomics Database (CTD) to filter the information and related literature for the association of our selected differentially expressed genes with alopecia .
Expression profile and functional enrichment analysis
We performed expression profile analysis based on average values of expression in each sample of each dataset in order to determine variations in gene expression among different sample sets [29, 30]. The functional annotation, gene ontology (GO) and pathway enrichment analysis [31, 32] of alopecia-related genes helped us to reveal biological functions and they were performed using the web-based FunRich , DAVID and Enrichr  annotation tools.
Physicochemical and functional proteomic analysis
We analyzed the physicochemical properties of selected biomarker proteins by ExPASy ProtParam, ProtScale and Peptide Mass tools . For empirical investigation and validation of the studies of two-dimensional (2-D) gels and mass spectrometry, these tools compute various physical and chemical parameters of proteins. ProtParam was used to calculate various physicochemical parameters, ProtScale for hydrophobicity analysis and Peptide Mass was used for calculating the masses of possible cleavable peptides and any known post-translational modifications. For further structural and functional analysis, PDBePISA interactive tool  was employed for the exploration of macromolecular interfaces. We identified all known motifs in our protein samples by Motif Scan web server . Large proteins that make up structural and functional units have evolved by duplication of internal sequence repeats. Therefore, we identified gapped approximate repeats and short composition biased in protein sequences using RADAR web server [38, 39].
The protein–protein interactions (PPIs) reveal the topology and functional interaction of proteins that are useful to assess biological and pathological conditions [40, 41]. We can infer the functional relationship of proteins through genomic associations. In the PPI network, each protein is considered to have a relationship with one or more gene-sets connected with biological or molecular functions . Functions of this biological network may show great differences of activity in the disease state as compared to normal. The functional interactors of source genes were retrieved with a high confidence score (0.999) from the STRING version-10  and HAPPI databases . For visualization and analysis of molecular networks, Cytoscape (version 3.2.1) was used  to identify the role of each seeder and target genes in alopecia and hair loss. In the network, the source nodes were categorized according to the degree of association with alopecia and any disturbance in the functionality of genes contributes to a disease phenotype. In Cytoscape, Network Analyzer was used to calculate topological properties of the network.
Pathway-molecular mechanisms analysis in alopecia
The pathway analysis could be an essential tool for biomarker discovery to better understand the underlying biochemical mechanisms. KEGG, Reactome and Wiki pathways were used to curate and map the candidate biomarkers. We constructed an integrated metabolic network of alopecia-related potential biomarkers using PathVisio version-3 tool . In this integrated network, the potential mechanism of each genetic marker in each pathway was studied based on data mining.
We performed multiple testing to evaluate the significance of candidate biomarkers. Average expressions of genes between two phenotypic classes were compared, for gene g. The two-sample t-statistic was calculated. The false discovery rate (FDR)  is commonly used in the multiple testing of high-dimensional genomic data as a criterion for controlling false positives . FDR = V/R is defined as the expected proportion of false positives among the genes declared significant. Since no null hypothesis was rejected, V/R is considered to be 0 when R = 0. The determination of the number of biological replicates is one of the current methods for controlling true positives. The variance was used to analyze the importance of differences between these groups. Data were expressed as mean ± SD. We performed all statistical analysis using the Limma-package of the R-platform version 3.1.3. The decision rule was set to p-value < 0.05.
cDNA datasets and normalization
We accessed eight GEO datasets related to alopecia, hair loss, and scalp hair follicles cases. The AffyBatch object has the size of the arrays: 1164 × 1164, 732 × 732, 712 × 712, and 448 × 448 features with related Affyids (Table 1). Normalized distances between arrays of DNA chip showed the median expression level and the quantile normalization of the probes. The MA-plots indicated the quality of the individual array of each dataset after normalization (Fig. 1). The gene–gene covariance matrix across all arrays in each dataset by ignoring missing values was computed following a log-transformation of the arrays to make sure they were on the same scale. Decreased amount of RNA hybridization renders low-quality results and lowers the total signal strength. The intensity gradient 3′/5′ depends on the degree of competitive binding of specific and non-specific targets to a particular probe. The RNA degradation-plot of each dataset shows short probe sets near the 3′-end of the transcripts indicating that the 3′/5′ intensity gradient decreases by increasing degrees of saturation (Fig. 2). A single summary statistic for each array in the batch provides an assessment of the severity of RNA-degradation level (Additional file 2: Table S1). The list of tools, databases and software used in this study are shown in Additional file 3: Table S2.
Identifying DEGs and screening for potential biomarkers
We identified a total of 100 DEGs in each microarray dataset by pairwise comparison between biologically comparable groups. Among these DEGs, the top 20 genes in each dataset were ranked and selected based on FDR (< 0.05), p-value (≤ 0.05) and logFC (> 1) parameters. From the list of ranked genes, 12 common genes (as potential biomarkers) of each dataset were identified (Additional file 4: Table S3). The sub-cellular localization of these genes indicates that WIF1, SPON1, and LYZ are extracellular, GPRC5B is membrane bound, PTPRE, VAV3, and HBB are cytoplasmic, ZFP36L2, PHF15, LMCD1, and KRT35 are nuclear while HBB is a mitochondrial-associated protein. We determined the transmembrane helices of these proteins based on a Hidden Markov Model (Additional file 5: Fig. S2). For disease-gene mapping, we found the role of these genes in alopecia and hair-loss and observed that SPON1 is the enriched term (Fig. 3).
Expression profiling and functional enrichment analysis
We analyzed the expression level of 12 candidate biomarkers to observe the global picture of cellular functions between the biological samples of each dataset. This profiling showed the variation in expression levels and the relative activity of these genes in hair loss. In the heat map, the major variations in the expression levels of PHF15, KRT35, HBB, LMCD1, and WIF1 of each dataset are observed (Fig. 4). Pathway-enriched terms indicate a significant association of DEGs in biological pathways related to alopecia (Fig. 5). The pathway enrichment analysis of these genes uncovered important background information about cell growth, energy pathways, metabolism, and signal transduction (Table 2).
Identifying physicochemical and functional properties
Proteins possess different functional properties due to their dynamic physical, chemical, structural and amphiphilic nature. To explore the relationship of these properties, we investigated the physicochemical properties of selected potential biomarkers to facilitate the validation of studies using ProtParam, ProtScale, and Peptide Mass tools (Table 3). The total number of amino acid (aa) residues of biomarker proteins including WIF1 (379 aa), SPON1 (807 aa), PTPRE (700 aa) and LYZ (148 aa) showed a significant charge density while the hydrophobicity range at the C and N-terminus enabled an insight of the Îs-subunits of these proteins. The theoretical-pI indicated the solubility of these proteins at a given pH while the instability index showed that HBB and LYZ are more stable in vitro as compared to other proteins.
For functional proteomic analysis, we summarized and visualized the macromolecular interfaces of these proteins. The known amino acid residues of these proteins in the interface contact matrix are used to describe the molecular surface at the interface. The sequence motif indicated the pattern of biological significance involved in the shape of proteins and binding sites (Additional file 6: Fig. S3). We analyzed the motif sites of these proteins and identified the gapped approximate repeats and composition of residues for structural significance.
To evaluate the topology and functional annotation of proteins, a protein–protein interaction (PPI) network was constructed. The network involved 111 nodes and 104 edges (nodes represent proteins and edges denote interaction), containing high scoring interaction partners (confidence score: > 0.9). The PPI network was largely categorized into three neighborhoods: light yellow and red nodes indicate the secretory and cell-associated potential biomarkers while the remaining blue nodes represent the other target proteins. We found that these candidate biomarkers interact functionally with other important protein targets including APCDD1 , WNT1, WNT31 , SHH , ESRI, TGFB1  and APP . This functional relationship shows the genomic connectivity of these proteins to disease phenotypes (Fig. 6). Of these proteins, SPON1 (spondin1, extracellular matrix protein) is principally interacting with SHH (sonic hedgehog protein facilitates androgen metabolic processes) and WIF1 (WNT inhibitory factor1) while WNT3A (Wnt-3a protein involved in WNT signaling pathway and hair development) is also interacting with WIF1. Similarly, PHF15 (PHD finger protein-15) is associated with the family of PHF11 and PHF12 contributing to clinical phenotype. We assessed the topological properties of the network using Network Analyzer to classify and optimize the network performance. After disease-gene mapping of these network genes using CTD, we observed more than 90 target genes having a functional connection with candidate biomarkers (source/seeder genes) in alopecia. Among them, RAC1, LPAR1, MAPK1, ITM2B, LTF are the enriched terms (Additional file 7: Fig. S4).
Studying pathway models and associated mechanisms
We studied the role of potential biomarkers in associated pathways to explore the molecular mechanisms of these molecules in alopecia and hair loss. These proteins are critically involved in the regulation of integrated pathways including protein metabolism, signal transduction, Wnt, BMP, Eda, Notch, and Shh pathways. The genetic ontology of these pathways is related to hair follicle differentiation, hair follicle morphogenesis and hair shaft pigmentation (Fig. 7a). Therefore, hair growth and shape depend on these pathways which interact with epithelial and mesenchymal cells. Integration of Wnt with the Eda pathway upregulates the expression of Shh ligand. The cascade of reaction is unified with BMP signaling and regulation of epithelial Notch expression . The levels of β-catenin and Nf-kappa B , cytokines , retinoic acid  and the interaction cycle catalyzed by WIF1, SPON1, SHH, WNT3A, GPCR5, and LYZ are critical for modulating the signaling of Wnt, Shh, Notch and allied pathways. Regular expression patterns of these genes would affect the synthesis of these bio-molecules and dysregulation of these pathways lead to certain abnormalities of hair growth. In Fig. 7b, a circos graph represents the expression levels and interconnection of differentially expressed genes in alopecia.
We cross-validated our analysis by statistical algorithms. The significance of biomarker proteins in alopecia and hair loss between two phenotypes was tested based on logFC (> 1), Ave Expr (AEL ≥ 40%), p-value (< 0.05) and t-test statistics (Prob > |t|: 0.92944) (Fig. 8). We determined the false discovery rate (FDR) as a criterion for controlling false positives.
In recent years, the advances in technological developments have opened up a new era in systems biology, structural, functional and clinical proteomics. Specifically, functional proteomics helps us to understand the fundamental mechanisms of various pathophysiological states and reveals new opportunities for biomarker discovery. The current study critically explores the variations of gene expression levels in alopecia and hair loss. The meta-analysis provides a list of differentially expressed genes of scalp epidermis, hair follicle, dermal papilla and correlates with metabolic pathways. The expression profiling of these genes is indicative of the obvious differences among different tissue samples of clinical phenotypes.
We used various bioinformatics software, tools, and databases to systematically investigate expression data in order to discover specific protein biomarkers. The computational framework employed provides a logical step of the meta-analysis of cDNA microarray datasets associated with the exploration of DEGs. The steps implemented in this study are consistent and complement those technological pipelines described earlier [54, 55].
We filtered 12 common genes as potential biomarkers, namely WIF1, SPON1, LYZ, GPRC5B, PTPRE, ZFP36L2, HBB, PHF15, LMCD1, KRT35, HBB, and VAV3 (p < 0.05), from the list of 100 DEGs based on physicochemical and functional analysis. The dysregulation and functional role of these differentially expressed genes have also been studied in hair physiology [35, 49, 53]. In the protein–protein interaction network, we identified that these possible biomarkers are functionally associated with other interacting protein targets including APCDD1, WNT1, WNT31 , SHH , ESRI, TGFB1  and APP . The APCDD1 gene inhibits the WNT signaling pathway (WIF1), ultimately interfering with BMP, Eda, and Notch pathways, and this aberration causes alopecia and hair loss . We observed more than 90 target proteins having a functional connection with candidate source genes in alopecia.
These candidate biomarkers are involved in the regulation of biological pathways including protein metabolism, signal transduction, Wnt, BMP, Eda, Notch, and Shh pathways [35, 53, 56]. The gene ontology (GO) is related to hair follicle differentiation, hair follicle morphogenesis and hair shaft pigmentation. Therefore, hair growth and shape depend on these important pathways that interact with epithelial and mesenchymal cells. WNT starts signaling through the signal mediator EDA/EDAR/NF-κB, i.e. nuclear factor kappa-light-chain-enhancer of activated B-cells, by upregulating Shh ligand expression. The pathway cycle is integrated with BMP signaling and upregulation of epithelial Notch expression . Therefore, abnormalities of these pathways lead to irregularities of hair growth. The levels of β-catenin and Nf-kappa B are critical to modulate the signaling of WNT, SHH, and allied pathways . Therefore, mutations in WIF1 would affect this level and hair growth. The cytokines  and retinoic acid are known to change skin and hair growth . It has been observed that SPON1, GPC5B, and LYZ metabolize the proteins to synthesize retinoic acid and cytokines. The irregular expression pattern of these genes would change the normal level of these biomolecules affecting hair follicle function. The pathway modeling and integrated network-based analysis showed that these potential biomarkers are important for hair follicle development and variations in β-catenin, Nf-kappa B, cytokines and retinoic acid can be considered as diagnostic indicators of alopecia and hair loss. The protein–protein interaction network provides testable assumptions, albeit validation can only be justified through experimental studies.
The differential expression of WIF1 , PTPRE, HBB, and LMCD1 is associated with hair loss, morphogenesis, and alopecia. SPON1 and GPRC5B catalyze post-translational modifications and are integrated with WNT, Eda, Notch, Shh, and RAIG1 pathways. SPON1 is over-expressed in growing hair follicles [58, 59]. GPRC5 is associated with the differentiation of cells involved in the production of hard keratin and cortical cells of the hair shaft  and its dysregulation cause hair loss. LYZ-Ndufs4 showed a similar phenotype and disease conditions with significant attenuation of the alopecia . Protein tyrosine phosphatase (PTPRE) and PHD finger protein 15 (PHF15) are differentially expressed in bald individuals responsible for androgenetic alopecia . In animals, it has been reported that the down-regulation of ZFP36L2 gene that synthesizes a proline-rich zinc finger protein causes patchy alopecia . The differential expression of LIM and cysteine-rich domains 1 (LMCD1) causes the aberration of Notch and Hedgehog signaling pathways leading to alopecia and hair-loss . Under-expression of KRT35 and VAV3 genes level down the keratins and vav-3 guanine nucleotide exchange factor respectively, affecting the signaling pathways and hair growth [64, 65]. This study suggests that differential expression of these genes in both normal and mild state may be used as clinical biomarkers. We propose that the combination of several indicators may be more useful in diagnostic sensitivity and specificity instead of using a single biomarker.
Systems biology studies of cDNA datasets helped us to identify potential DEGs of alopecia and hair loss. The methodological framework used revealed genome to phenome association in alopecia and hair loss. Our prioritization approach has found 12 potential physiological biomarkers of alopecia connected with other vital proteins including SHH and APCDD1. However, further molecular studies and research should be considered to authenticate the role of these genes in alopecia for effective treatment.
Safavi KH, Muller SA, Suman VJ, Moshell AN, Melton LJ III. Incidence of alopecia areata in Olmsted County, Minnesota, 1975 through 1989. Mayo Clin Proceed. 1995;70:628–33.
Everts HB, King LE Jr, Sundberg JP, Ong DE. Hair cycle-specific immunolocalization of retinoic acid synthesizing enzymes Aldh1a2 and Aldh1a3 indicate complex regulation. J Invest Dermatol. 2004;123:258–63.
Harries MJ, Sun J, Paus R, King LE Jr. Management of alopecia areata. BMJ. 2010;341:c3671.
Tosti A, Duque-Estrada B. Treatment strategies for alopecia. Exp Opin Pharmacother. 2009;10:1017–26.
Jabbari A, Cerise JE, Chen JC, Mackay-Wiggan J, Duvic M, Price V, et al. Molecular signatures define alopecia areata subtypes and transcriptional biomarkers. EBioMedicine. 2016;7:240–7.
Bhatt AN, Mathur R, Farooque A, Verma A, Dwarakanath B. Cancer biomarkers-current perspectives. Indian J Med Res. 2010;132:129–49.
Anderson NL. The roles of multiple proteomic platforms in a pipeline for new diagnostics. Mol Cell Proteom. 2005;4:1441–4.
Paulovich AG, Whiteaker JR, Hoofnagle AN, Wang P. The interface between biomarker discovery and clinical validation: the tar pit of the protein biomarker pipeline. Proteomics Clin Appl. 2008;2:1386–402.
Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol. 2006;24:971–83.
Choi JK, Yu U, Kim S, Yoo OJ. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003;19:i84–90.
Hong F, Breitling R. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics. 2008;24:374–82.
Klee EW, Finlay JA, McDonald C, Attewell JR, Hebrink D, Dyer R, et al. Bioinformatics methods for prioritizing serum biomarker candidates. Clin Chem. 2006;52:2162–4.
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–5.
Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–93.
Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F. A model-based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc. 2004;99:909–17.
Fujita A, Sato JR, de Oliveira Rodrigues L, Ferreira CE, Sogayar MC. Evaluating different methods of microarray data normalization. BMC Bioinform. 2006;7:469.
Obenchain V, Lawrence M, Carey V, Gogarten S, Shannon P, Morgan M. VariantAnnotation: a bioconductor package for exploration and annotation of genetic variants. Bioinformatics. 2014;30:2076–8.
Affymetrix Microarray Suite User Guide version 5.0. Santa Clara, CA. 2000. https://bioinformatics.cs.vt.edu/~easychair/AFFYMETRIX/MicroarraySuite.pdf.
Affymetrix Mircoarray Suite User Guide version 5.0. Santa Clara, CA. 2001. https://bioinformatics.cs.vt.edu/~easychair/AFFYMETRIX/MicroarraySuite.pdf.
Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001;98:5116–21.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.
Liu X, Wu J, Zhang D, Bing Z, Tian J, Ni M, et al. Identification of potential key genes associated with the pathogenesis and prognosis of gastric cancer based on integrated bioinformatics analysis. Front Genet. 2018;9:265.
Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8:785–6.
Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, et al. Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics. 2004;20:547–56.
Stastna M, Van Eyk JE. Secreted proteins as a fundamental source for biomarker discovery. Proteomics. 2012;12:722–35.
Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel. 2004;17:349–56.
Möller S, Croning MDR, Apweiler R. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics. 2001;17:646–53.
Clematide S, Rinaldi F. Ranking relations between diseases, drugs and genes for a curation task. J Biomed Semantics. 2012;3:S5. https://doi.org/10.1186/2041-1480-3-s3-s5.
Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–8.
Gellert P, Jenniches K, Braun T, Uchida S. C-It: a knowledge database for tissue-enriched genes. Bioinformatics. 2010;26:2328–33.
Nam D, Kim S-Y. Gene-set approach for expression pattern analysis. Brief Bioinform. 2008;9:189–97.
Muhammad SA, Fatima N, Syed NIH, Wu X, Yang XF, Chen JY. MicroRNA expression profiling of human respiratory epithelium affected by invasive Candida infection. PLoS ONE. 2015;10:e0136454.
Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, et al. FunRich: an open access standalone functional enrichment and interaction network analysis tool. Proteomics. 2015;15:2597–601.
Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 2013;14:128.
Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–97.
Pagni M, Ioannidis V, Cerutti L, Zahn-Zabal M, Jongeneel CV, Hau J, et al. MyHits: improvements to an interactive resource for analyzing protein sequences. Nucleic Acids Res. 2007;35:W433–7.
Heger A, Holm L. Rapid automatic detection and alignment of repeats in protein sequences. Proteins. 2000;41:224–37.
McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 2013;41:W597–600.
Li F, Long T, Lu Y, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci USA. 2004;101:4781–6.
Muhammad SA, Ahmed S, Ali A, Huang H, Wu X, Yang XF, et al. Prioritizing drug targets in Clostridium botulinum with a computational systems biology approach. Genomics. 2014;104:24–35.
Rachlin J, Cohen DD, Cantor C, Kasif S. Biological context networks: a mosaic view of the interactome. Mol Syst Biol. 2006;2:66.
Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–8.
Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics. 2009;10:S16.
Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2:2366–82.
Kutmon M, van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3: an extendable pathway analysis toolbox. PLoS Comput Biol. 2015;11:e1004085.
Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007;99:147–57.
Nagy N, Farkas K, Kinyó Á, Fazekas B, Szabó KÁ, Kollár E, et al. A synonymous polymorphism of APCDD1 affects translation efficacy and is associated with androgenic alopecia. J Life Sci. 2014;8:106–14.
Rishikaysh P, Dev K, Diaz D, Qureshi WMS, Filip S, Mokry J. Signaling involved in hair follicle morphogenesis and development. Int J Mol Sci. 2014;15:1647–70.
Hammerschmidt B, Schlake T. Localization of Shh expression by Wnt and Eda affects axial polarity and shape of hairs. Dev Biol. 2007;305:246–61.
Foitzik K, Spexard T, Nakamura M, Halsner U, Paus R. Towards dissecting the pathogenesis of retinoid-induced hair loss: all-trans retinoic acid induces premature hair follicle regression (catagen) by upregulation of transforming growth factor-β2 in the dermal papilla. J Invest Dermatol. 2005;124:1119–26.
Kwak Y-D, Marutle A, Dantuma E, Merchant S, Bushnev S, Sugaya K. Involvement of notch signaling pathway in amyloid precursor protein induced glial differentiation. Eur J Pharmacol. 2011;650:18–27.
Zhang Y, Tomann P, Andl T, Gallant NM, Huelsken J, Jerchow B, et al. Reciprocal requirements for EDA/EDAR/NF-kB and Wnt/b-catenin signaling pathways in hair follicle induction. Dev Cell. 2009;17:49–61.
Gregoriou S, Papafragkaki D, Kontochristopoulos G, Rallis E, Kalogeromitros D, Rigopoulos D. Cytokines and other mediators in alopecia areata. Mediat Inflamm. 2010;2010:928030.
Awan FM, Naz A, Obaid A, Ali A, Ahmad J, Anjum S, et al. Identification of circulating biomarker candidates for hepatocellular carcinoma (HCC): an integrated prioritization approach. PLoS ONE. 2015;10:e0138913.
Addona TA, Shi X, Keshishian H, Mani DR, Burgess M, Gillette MA, et al. A pipeline that integrates the discovery and verification of plasma protein biomarkers reveals candidate markers for cardiovascular disease. Nat Biotechnol. 2011;29:635–43.
Li J, Tzu J, Chen Y, Zhang YP, Nguyen NT, Gao J, et al. Laminin-10 is crucial for hair morphogenesis. EMBO J. 2003;22:2400–10.
Kim YJ, Yoon B, Han K, Park BC. Comprehensive transcriptome profiling of balding and non-balding scalps in trichorhinophalangeal syndrome type i patient. Ann Dermatol. 2017;29:597–601.
Donati G, Proserpio V, Lichtenberger BM, Natsuga K, Sinclair R, Fujiwara H, et al. Epidermal Wnt/β-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors. Proc Natl Acad Sci USA. 2014;111:E1501–9.
Inoue S, Nambu T, Shimomura T. The RAIG family member, GPRC5D, is associated with hard-keratinized structures. J Invest Dermatol. 2004;122:565–73.
Jin Z, Wei W, Yang M, Du Y, Wan Y. Mitochondrial complex I activity suppresses inflammation and enhances bone resorption by tipping the balance of macrophage-osteoclast polarization. Cell Metab. 2014;20:483–98.
Chew EGY, Tan JHJ, Bahta AW, Ho BS, Liu X, Lim TC, et al. Differential expression between human dermal papilla cells from balding and non-balding scalps reveals new candidate genes for androgenetic alopecia. J Invest Dermatol. 2016;136:1559–67.
Sanduja S, Blanco FF, Young LE, Kaza V, Dixon DA. The role of tristetraprolin in cancer and inflammation. Front Biosci (Landmark Ed). 2012;17:174–88.
Shi FT, Yu M, Zloty D, Bell RH, Wang E, Akhoundsadegh N, et al. Notch signaling is significantly suppressed in basal cell carcinomas and activation induces basal cell carcinoma cell apoptosis. Mol Med Rep. 2017;15:1441–54.
Pratt CH, King LE Jr, Messenger AG, Christiano AM, Sundberg JP. Alopecia areata. Nat Rev Dis Primers. 2017;3:17011.
Eriksson N, Tung JY, Kiefer AK, Hinds DA, Francke U, Mountain JL, et al. Novel associations for hypothyroidism include known autoimmune risk Loci. PLoS ONE. 2012;7:e34442.
Kim S, Dix DJ, Thompson KE, Murrell RN, Schmid JE, Gallagher JE, et al. Gene expression in head hair follicles plucked from men and women. Ann Clin Lab Sci. 2006;36:115–26.
Garza LA, Yang CC, Zhao T, Blatt HB, et al. Bald scalp in men with androgenetic alopecia retains hair follicle stem cells but lacks CD200-rich and CD34-positive hair follicle progenitor cells. J Clin Invest. 2011;121:613–22.
Garza LA, Liu Y, Yang Z, Alagesan B, et al. Prostaglandin D2 inhibits hair growth and is elevated in bald scalp of men with androgenetic alopecia. Sci Transl Med. 2012;4:126ra34.
Higgins CA, Chen JC, Cerise JE, Jahoda CA, et al. Microenvironmental reprogramming by three dimensional culture enables dermal papilla cells to induce de novo human hair-follicle growth. Proc Natl Acad Sci USA. 2013;110:19679–88.
Xing L, Dai Z, Jabbari A, Cerise JE, et al. Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition. Nat Med. 2014;20:1043–9.
Gazel A, Blumenberg M. Transcriptional effects of inhibiting epidermal growth factor receptor in keratinocytes. Dermatologica Sinica. 2012. https://doi.org/10.1016/j.dsi.2012.11.003.
Ohyama M, Kobayashi T, Sasaki T, Shimizu A, et al. Restoration of the intrinsic properties of human dermal papilla in vitro. J Cell Sci. 2012;1:4114–25.
SM and JC supervised the study. HS, FN, and PR collected the data. SM, JC, and AA analyzed the data. JC and AA provided guidance with study design and data analysis. All authors contributed to manuscript writing and edition. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
The data have been presented with the article.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A framework of our study designed to identify physiological biomarkers in alopecia.
The function summaryAffyRNAdeg of Bioconductor package produced a single summary-statistic for each array in the batch dataset.
List of Databases, Software, and Tools used in this study.
Preliminary investigation of common and ranked differentially expressed genes of each microarray dataset.
Prediction of transmembrane helices in selected potential biomarker proteins using TMHMM Server v. 2.0.
Structural and functional properties of potential biomarker candidates. Studies of macromolecular interfaces using PDBePISA interactive tool. Motif scan in protein sequences were studied using Motif tool. Identified gapped approximate repeats and complex repeat architectures using RADAR (Rapid Automatic Detection and Alignment of Repeats) tool.
Data mapping: The role of differentially expressed genes in alopecia was mapped using the Comparative Toxicogenomics Database (CTD).
About this article
Cite this article
Muhammad, S.A., Fatima, N., Paracha, R.Z. et al. A systematic simulation-based meta-analytical framework for prediction of physiological biomarkers in alopecia. J of Biol Res-Thessaloniki 26, 2 (2019). https://doi.org/10.1186/s40709-019-0094-x
- Functional genomics
- PPI networks