Introduction to molecular pathological epidemiology (MPE)

Molecular pathological epidemiology (MPE), which incorporates molecular pathology into epidemiologic research, has emerged as a transdisciplinary field in population health science [1–3]. Conventional epidemiology primarily investigates the relationship between an exposure and a disease entity in population-based cohorts (Fig. 1). This conventional approach assumes that people diagnosed with similar symptoms or disease manifestations represent a homogeneous group (i.e., a single disease entity) and share similar etiologies. However, this reductionist approach might have led to biased [3] or paradoxical findings [4] in which a well-known risk factor was apparently associated with a better prognosis.

Fig. 1

The paradigm of molecular pathological epidemiology (MPE) research. (a) Scheme of a conventional epidemiology study. The overall association of an exposure with risk of disease X appears to be weak. (b) Scheme of an MPE study. By categorizing disease X into subgroups (A and B) based on molecular pathological features, the significant association of the exposure with risk of subtype A can be revealed. Note that, although we present an example of two disease subgroups for simplicity, more than two disease subgroups can be evaluated in MPE research. Typically, the primary hypothesis in MPE research tests for a difference between the associations of the exposure with subtypes classified by molecular features. MPE, molecular pathological epidemiology
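The dilution depicted in Fig. 1 can be made concrete with a small calculation. The sketch below is illustrative only: the function name and all numbers are hypothetical and not taken from any study cited here. It assumes mutually exclusive subtypes, so that the risk of disease X is the sum of the subtype-specific risks.

```python
def pooled_relative_risk(baseline_risks, subtype_rrs):
    """Relative risk for the pooled disease entity.

    baseline_risks: risk of each subtype among unexposed people (hypothetical)
    subtype_rrs:    relative risk of each subtype for exposed vs. unexposed
    Assumes subtypes are mutually exclusive, so risks add up.
    """
    risk_exposed = sum(r0 * rr for r0, rr in zip(baseline_risks, subtype_rrs))
    risk_unexposed = sum(baseline_risks)
    return risk_exposed / risk_unexposed


# Hypothetical scenario: the exposure doubles the risk of subtype A
# but has no effect on the more common subtype B.
baseline = [0.004, 0.006]   # subtype A, subtype B (unexposed)
rrs = [2.0, 1.0]            # subtype-specific relative risks

print(round(pooled_relative_risk(baseline, rrs), 2))  # about 1.4
```

Under these assumed numbers, a twofold association confined to subtype A appears as a pooled relative risk of roughly 1.4 for disease X as a whole, which is the attenuation that subtype-specific analysis is meant to uncover.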

In contrast, MPE, by means of applying molecular pathology diagnostics to a disease classification, aims to address inherent heterogeneity in a single traditional disease entity [1, 2]. The MPE paradigm is founded on “the unique disease principle” [5] (or “the unique tumor principle” [6, 7]) and “the disease continuum theory” [3]. To elaborate, the unique disease principle [5] posits that, while people diagnosed with the same disease entity share some similarities, each individual has a unique pathologic process driven by a complex interaction between molecular alterations in cells and the surrounding microenvironment. At the same time, the disease continuum theory [3] asserts that people diagnosed with different diseases can have overlapping etiologies and pathogenesis. Moreover, a wide spectrum of inherent factors (e.g., germline genetic variations, sex, ethnicity) and acquired or exogenous factors (e.g., acquired genetic and epigenetic alterations, diet, lifestyle, smoking, medications, microorganisms) can affect the disease process. As a result, significant interpersonal heterogeneity exists in the disease process including initiation, evolution, and progression [3, 5]. In order to address the disease heterogeneity, MPE utilizes molecular pathological signatures that can categorize patients into subgroups [1, 2], so that people in each subgroup share a more homogeneous etiology and pathogenic process. Through this paradigm shift, MPE enables us to explore whether an exposure forms a differential relationship with disease subgroups classified by molecular biomarkers. Thus, findings from MPE research can provide biological evidence to enhance our understanding of etiologies and pathogenesis of diseases, strengthening evidence for causal relationships [1, 2, 7, 8]. The concept of this unified field of MPE has gained considerable popularity in the literature [9–30].

Based on the growing popularity of molecular pathology assays and increasing recognition of the importance of the precision medicine concept [31, 32], molecular classifications of diseases are widely used in clinical practice. Furthermore, biomedical big data have opened new opportunities to enhance our understanding of disease heterogeneity in humans. Owing to the advent of high-throughput sequencing techniques and analytical methods of genomic data, large-scale biomedical data are increasingly available. Thus, there is a pressing need to establish analytical frameworks to synthesize appropriate knowledge from available big data. Such a need is reflected in the aims of The Big Data to Knowledge (BD2K) and Precision Medicine Initiatives launched by the U.S. National Institutes of Health (NIH), which are in line with the core framework of MPE [31, 32]. The strengths of the unique MPE approach have been well recognized by international symposia [33–35]. The International MPE Meeting Series was established in 2013; the Second International MPE Meeting was held in 2014 [36], and the Third International MPE Meeting was held in 2016 as an NIH-supported meeting (R13 CA203287, funded by National Cancer Institute, National Human Genome Research Institute, and National Institute of Environmental Health Sciences).

The purposes of this article are to introduce the paradigm of MPE and to summarize recent advances in MPE. A major strength of MPE is its flexibility: it can integrate other scientific fields such as social science [37], lifecourse epidemiology [38], pharmacoepidemiology, immunology, and microbiology.

Framework of MPE research

The outline of MPE research is illustrated in Fig. 1. MPE has evolved through cancer epidemiology research, owing to early recognition of molecular classification systems and wide availability of tumor tissue specimens. In particular, colorectal cancer has served as a practical model for MPE research [1, 2]. The model of accumulation of genetic and epigenetic alterations was established to explain colorectal carcinogenesis [39]. It has been increasingly evident that colorectal cancer represents a considerably heterogeneous group of neoplasms arising from a unique sequence of genetic and epigenetic alterations in each individual [6, 40–42]. Of note, there is a wide range of etiologic factors for colorectal cancer such as genetic factors, aging, smoking, excessive alcohol consumption, obesity, physical inactivity, dietary factors, diabetes mellitus, inflammatory bowel diseases, and possibly intestinal microbiome [43–46]. To understand specific pathogenic pathways linking diverse etiologic factors to colorectal cancer, it is critical to fully consider potential etiologic heterogeneity. Furthermore, the MPE methodology is readily applicable to research of any human neoplastic [5, 47] and nonneoplastic [48–51] diseases possessing substantial disease heterogeneity [3].

Typical MPE studies are risk analyses or survival analyses that attempt to address heterogeneous relationships between an exposure and disease incidence or prognosis according to molecular subtypes (Fig. 2) [2]. Based on a hypothesis test for heterogeneity in exposure-disease associations across subgroups defined by molecular markers [3], MPE studies can provide not only risk estimates of incidence, recurrence, or progression tailored to specific subgroups, but also insights into diverse pathogenic pathways [1, 2, 5]. Therefore, findings from MPE studies serve as critical evidence to support the need for personalized management in guiding disease prevention, screening, and treatment [52–59].
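As a rough illustration of what a heterogeneity test involves, the sketch below compares two subtype-specific log relative risks with a two-sided Wald test. It is a simplified stand-in, not the method used in the cited studies: it treats the two estimates as approximately normal and independent, and the function name and input numbers are hypothetical.

```python
import math


def heterogeneity_wald_test(log_rr_a, se_a, log_rr_b, se_b):
    """Two-sided Wald test of H0: the exposure has the same log relative
    risk for subtype A and subtype B.

    Simplifying assumption: the two estimates are independent and
    approximately normally distributed.
    """
    diff = log_rr_a - log_rr_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical subtype-specific estimates (relative risk, standard error
# of the log relative risk); not taken from any cited study.
z, p = heterogeneity_wald_test(math.log(0.70), 0.10, math.log(1.02), 0.12)
print(f"z = {z:.2f}, P for heterogeneity = {p:.3f}")
```

The quantity of interest is the difference between the subtype-specific associations, not whether either association is individually significant; a significant result in one subtype and a null result in another does not by itself establish heterogeneity.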

Fig. 2

Risk analysis and survival analysis in molecular pathological epidemiology (MPE) research. MPE risk analysis examines differential associations of a prediagnosis exposure with incidence of disease subtypes defined by molecular features. MPE survival analysis examines differential associations of a prediagnosis or postdiagnosis exposure with prognosis of disease subtypes defined by molecular features. Arrows indicate disease process with time. MPE, molecular pathological epidemiology

An MPE study on obesity/physical activity and colorectal cancer incidence by tumor CTNNB1 (beta-catenin) expression status illustrated a critical role of MPE in enhancing our understanding of host-tumor interactions in the WNT-CTNNB1 signaling pathway [60]. The WNT signaling pathway, when activated, induces the translocation of CTNNB1 into the nucleus, where CTNNB1 promotes the transcription of various genes including growth-promoting genes [61]. The study showed that obesity and low physical activity were associated with a higher risk of CTNNB1-negative colorectal cancer, but not with CTNNB1-positive cancer risk [60]. This finding suggests that colorectal carcinogenesis related to energy imbalance is likely less dependent on the activation of the WNT-CTNNB1 pathway. Consistently, another MPE study found that postdiagnosis physical activity was associated with longer survival only in CTNNB1-negative cancer [62]. In that study, CTNNB1-positive tumors, likely developing due to factors other than energy imbalance, were associated with longer survival only among obese patients [62]. These data imply that the association between energy imbalance-related factors and colorectal cancer survival may vary by activation status of WNT-CTNNB1 signaling. Considering that physical activity and obesity are modifiable lifestyle factors, weight management strategies and exercise programs after colorectal cancer diagnosis could be further tailored according to tumor CTNNB1 expression level to improve cancer survival more effectively.

Medications have great potential as chemopreventive agents, and further integration of pharmacoepidemiology into MPE has provided novel insights into mechanistic pathways between common medications and risk of specific colorectal cancer subtypes; this area of investigation has recently been coined “pharmaco-MPE” [3]. Emerging evidence indicates that the pathogenic processes of neoplastic and non-neoplastic diseases are influenced not only by signaling molecules but also by host immune response and microbiota [63]. Thus, disease categorization by factors associated with the immune status and microorganisms has opened new opportunities to examine the disease heterogeneity (immuno-MPE [3] and microbial MPE). In fact, the importance of accounting for the interplay of tumor molecular features, the gut microbiome, and host factors (e.g., diet, immunity, inflammation) in the carcinogenic process is in line with “the colorectal continuum theory,” which proposes that certain molecular features of colorectal cancer may change gradually from the rectum to the ascending colon rather than having an abrupt transition at the splenic flexure [64, 65]. In the following sections, we present recent progress and promise of the emerging subfields of MPE (pharmaco-MPE, immuno-MPE, and microbial MPE; Fig. 3). It is the versatile nature of MPE, one of its major strengths, that has enabled the development of these subfields.

Fig. 3

Further integration of several disciplines into molecular pathological epidemiology (MPE). Pharmaco-MPE integrates pharmacoepidemiology into MPE, where we evaluate differential associations of a medication as an exposure with disease subgroups. Immuno-MPE and microbial MPE integrate MPE with immunology and microbiology, respectively. Diseases are categorized into subtypes by parameters of disease immunity status or microbial profile. Arrows indicate disease process with time. MPE, molecular pathological epidemiology

Pharmaco-MPE: integration of pharmacoepidemiology into MPE

Pharmacoepidemiology investigates effects of drugs on disease outcomes and their potential side effects in human populations. Evidence from pharmacoepidemiology research serves as a foundation for chemoprevention and drug therapy. Integration of pharmacoepidemiology and MPE, pharmaco-MPE [3], examines the relationship of a drug with disease incidence or survival according to molecular markers of a disease (Fig. 3). Pharmaco-MPE has particular clinical relevance. No drugs are free from adverse events, and thus it is clinically important to identify target individuals who most likely benefit from use of a particular drug. In fact, pharmaco-MPE has made a striking contribution to not only revealing novel insights into the etiologies and pathogenesis of diseases but also potentially identifying such target populations [52–54, 56, 66–69].

Aspirin, a commonly used nonsteroidal anti-inflammatory drug (NSAID), has been regarded as a promising chemopreventive agent against colorectal cancer incidence and mortality [52–54, 56, 66, 69–75]. In the 2016 recommendation statement, the U.S. Preventive Services Task Force recommends the use of low-dose aspirin for primary prevention of colorectal cancer among adults with a substantial cardiovascular risk [76]. Pharmaco-MPE further refined the inverse relationship between aspirin and colorectal cancer risk, by showing that the association was more evident in tumors with PTGS2 (cyclooxygenase-2) overexpression [52]. This pharmaco-MPE finding suggests that aspirin, as a PTGS (cyclooxygenase) inhibitor, may exert antitumor effects by inhibiting PTGS2 during carcinogenesis. A subsequent study observed that aspirin use was associated with lower incidence of BRAF-wild-type colorectal cancer, but not with BRAF-mutant cancer risk [56]. This finding led to a hypothesis that BRAF-mutated neoplastic cells, by upregulating the MAPK (mitogen-activated protein kinases) pathway, might have resistance to the antitumor effects of aspirin. As illustrated here, the pharmaco-MPE approach has substantially enhanced our understanding of molecular mechanisms underlying the chemopreventive action of aspirin. Another pharmaco-MPE study, by showing that a reduced risk of colorectal cancer associated with aspirin use was limited to individuals with high expression of HPGD [hydroxyprostaglandin dehydrogenase 15-(NAD), or 15-PGDH], the primary enzyme catabolizing prostaglandins produced by PTGS2 [66], supported potential use of HPGD expression level in normal colorectal mucosa to predict those who would benefit from aspirin chemoprevention.

In survival analyses, pharmaco-MPE provides insights into molecular pathways that modify the effect of a drug on disease progression. Studies have shown that the survival benefit associated with aspirin use may be stronger in cancers with PTGS2 overexpression [53] or PIK3CA mutation [54, 69], indicating interactions of aspirin with the prostaglandin or PI3K (phosphatidylinositol-4,5-bisphosphate 3-kinase) signaling pathways in tumor progression. Taken together, the risk and survival analyses incorporating the pharmaco-MPE approach have opened new opportunities to refine regimens for aspirin use in the prevention and treatment of colorectal cancer. It is worth noting that risk factors for cancer incidence are not necessarily consistent with prognostic factors for cancer mortality. Tumor cells continuously interact with the local tumor microenvironment, which consists of extra-cellular matrix, microbiome, and non-neoplastic host cells including inflammatory or immune cells [63]. During tumor progression from earlier to later phases, colonic cells accumulate genomic and epigenomic alterations [63], manifesting different profiles of molecular alterations. It has been shown that neoantigens produced by colorectal cancer cells correlate with T lymphocytic immune response in the tumor microenvironment [77]. Thus, depending on the dominant molecular alterations at a given stage of carcinogenesis, an interaction between an exposure and tumor molecular markers in the host tumor microenvironment may vary.

Pharmaco-MPE studies have also examined potential heterogeneity in the associations between statins (HMG-CoA reductase inhibitors that lower blood cholesterol levels) and colorectal cancer incidence or survival according to tumor molecular subtypes. While statin use was associated with a reduced risk of colorectal cancer with KRAS mutation [67], colorectal cancer survival was not related to statin use regardless of KRAS mutation status [68].

With the growing popularity of high-throughput sequencing, genome-wide association studies (GWAS) can further enrich pharmaco-MPE studies [2, 78–82]. That is, combining knowledge of susceptibility alleles identified by GWAS with that of molecular alterations allows us to examine potential heterogeneity in the drug-disease associations in a more refined manner, providing deeper insights on causality. For instance, in a case–control study, the inverse association between aspirin use and colorectal cancer risk was observed in individuals with the TT genotype in rs2965667, but not in those with the TA or AA genotypes [79]. Further incorporating molecular markers, the relationship between aspirin use and colorectal cancer risk was examined according to markers jointly defined by rs6983267 genotype and CTNNB1 (beta-catenin) expression [80]. An inverse association between regular aspirin use and colorectal cancer incidence observed among individuals with the protective T allele (TT or GT vs. GG genotype) of rs6983267 was further confined to cancer with positive nuclear CTNNB1 expression.

Immuno-MPE: integration of immunology into MPE

Immunology is the study of the immune system and related diseases. Immuno-MPE has been derived by integrating immunology into MPE with the purpose of addressing disease heterogeneity by host immune response (Fig. 3) [3]. Innate and adaptive immunity is a host defense system, and accumulating evidence suggests host immune dysregulation as an underlying etiology for a wide spectrum of human diseases including several types of cancer [63]. In oncology, host immune response to tumors as well as tumor molecular features influences tumor behavior, and serves as an informative biomarker [83–86]. During cancer evolution, cancer cells continuously interact with the microenvironment, which is characterized by a complex network across extra-cellular matrix, vascular endothelial cells, and non-neoplastic host cells including immune cells [63]. Therefore, cancer immunology is an interdisciplinary field that requires integrated analyses on host factors, tumor factors, and their interaction [63]. Emerging evidence suggests that activation of immune cells in the tumor microenvironment can be a promising strategy to treat different types of cancer [87–91]. In particular, T cell-mediated immunotherapy has made a breakthrough in cancer treatment by targeting the immune checkpoint pathways related to the PDCD1 (programmed cell death 1, PD-1), CD274 (PDCD1 ligand 1, PD-L1), or CTLA4 proteins [87–91]. As immunotherapy modulates host factors, it is less likely to lead to resistance due to tumor mutations. Along with immunotherapy strategies in cancer treatment, immune modulation can be an attractive strategy for cancer prevention (immunoprevention) [92–94]. A better understanding of host-tumor interactions in the tumor microenvironment would help develop immunoprevention strategies, improve the effectiveness of immunotherapy, and identify patients likely to benefit from immunotherapy and immunoprevention [95, 96].

In immuno-MPE research exploring potential heterogeneity in associations between etiologic factors and disease outcomes by immune parameters, it is of importance to identify etiologic factors capable of influencing host immune response and to define immune parameters. Accumulating evidence suggests that the immune status may be modulated by a wide variety of epidemiologic factors including diet (e.g., ω-3 polyunsaturated fatty acid [PUFA]), smoking, alcohol, physical activity, obesity, vitamins (e.g., vitamin D), hormones, and common medications (e.g., aspirin, statin) [43, 94, 97–99]. Considering that these factors are readily modifiable, they can be used as immunoprevention strategies. With regard to host immune parameters to sub-classify a particular disease, a single immune cell or a combination of diverse immune cells can be used: e.g., T cells (helper-, memory-, regulatory-, cytotoxic-, or suppressor-T cell), B cell, natural killer cell (NK cell), myeloid-derived suppressor cell, macrophage, neutrophil, eosinophil, and dendritic cell. Additionally, with increasing popularity of immunotherapy in various types of cancer, it is worth investigating other potential markers including the immune checkpoint molecules (e.g., CD274 [PD-L1], PDCD1LG2 [PDCD1 ligand 2, PD-L2], PDCD1 [PD-1], CTLA4 [87–89, 100–103]) and metabolic enzymes (e.g., ARG1, IDO1, TDO2 [104, 105]).

While immuno-MPE is likely to shed light on etiologies and pathogenesis of diseases, there are several challenges. First, epidemiologic studies are limited due to the lack of a large database with comprehensive information on pathological examinations, tumor molecular markers, and immune parameters [63]. Second, while pathological and immunohistochemical examinations of tumor-infiltrating immune cells in tissue sections permit a reliable assessment of host antitumor immune reactivity, pathological methods have not been standardized in terms of specimen types (whole-tissue section or tissue microarray), methods of tissue coring, antibodies for immunohistochemistry, or analytical methods (pathologist’s interpretation, or computer-assisted image analysis) [63].

Nevertheless, several immuno-MPE studies have been conducted in relation to colorectal cancer [106–108], identifying potential immunomodulators for cancer immunoprevention [106, 107]. For instance, high plasma 25-hydroxyvitamin D [25(OH)D] level, an indicator of adequate vitamin D status, was associated with a lower risk of colorectal cancer with high-level histopathological immune response, but not with risk of cancer with low-level immune response [106]. This finding supports the hypothesis that antitumor effects of vitamin D may be in part mediated by immune cells that can enzymatically convert 25(OH)D to a bioactive form, 1,25-dihydroxyvitamin D (also known as calcitriol) [106]. Similarly, a reduction in colorectal cancer risk associated with higher marine ω-3 PUFA intake was greater for colorectal cancer with higher FOXP3+ T cell infiltrates [107]. It is speculated that ω-3 PUFA may inhibit regulatory T cell function, and exert antitumor effects to prevent FOXP3+ T cell-rich cancer [107].

In colorectal cancer, high-level microsatellite instability (MSI) status due to mismatch repair deficiency was characterized by increased neoantigen load, which elicits intense host immune response in the tumor microenvironment [13, 85, 86, 95, 109]. Recently, genomic features of colorectal tumors were shown to be linked to antitumor immunity status. With increasing availability of tumor immunity markers, immuno-MPE research is expected to yield new insights into cancer pathogenesis in the context of host-tumor interactions [77].

Microbial MPE: integration of microbiology into MPE

Microbiology is the study of microorganisms, such as bacteria, viruses, archaea, fungi, and protozoa. The colorectum is the most microorganism-rich organ in the human body. To maintain intestinal homeostasis, a complex microflora ecosystem must be under control, and the dysregulation of the intestinal microbial communities may contribute to impaired immunity, chronic inflammation, and carcinogenesis in the colorectum. Indeed, compelling evidence suggests that the gut microbiome is involved in the pathogenesis of various benign and malignant diseases [110–112] including inflammatory bowel diseases and colorectal cancer [113, 114]. Thus, in epidemiologic studies on colorectal cancer, it is important to account for the complex network of the microbiome, intestinal epithelium, and the immune system. Microbial MPE addresses etiologic heterogeneity according to subgroups of colorectal cancer classified by tumor tissue microbial profiling (Fig. 3).

Fusobacterium nucleatum (F. nucleatum) has recently gained attention for a potential role in the initiation and progression of colorectal cancer [115–118]. Studies have shown that F. nucleatum might be associated with molecular features in colorectal adenoma and cancer, including high-level microsatellite instability (MSI) and high-level CpG island methylator phenotype (CIMP) [117–120], as well as with suppression of T cells in the tumor microenvironment [118]. Furthermore, the gut microbiome might also influence the effectiveness of T cell-mediated immunotherapy [121, 122]. Therefore, it is of interest to examine the association between environmental factors (e.g., diet, medications) and colorectal cancer by tumor F. nucleatum status. Investigations of viruses and other bacteria, such as Bifidobacterium, Bacteroides, Escherichia coli, and Campylobacter, are also warranted in the future. Although technically challenging at this time, comprehensive assessments of the human microbiome ecosystem along with immune status throughout the body (in relation to disease etiologies and molecular pathologic signatures) will further improve our understanding of disease pathogenesis and evolution.

Future perspectives and conclusions

Based on the unique disease principle [5] and the disease continuum theory [3], MPE has established itself as an evolving research area in epidemiology, enabling us to address potential heterogeneity of the conventional exposure-disease relationship by molecular pathological, immune, or microbial markers of diseases [1–3, 63]. Providing insights into the etiologies and pathogenesis underlying heterogeneous exposure-disease relationships, MPE research serves as a basis for tailored strategies for early detection, prevention, and treatment of diseases [1–3]. Thus, the MPE paradigm is in line with the aim of the NIH Precision Medicine Initiative [31, 32]. The concept and methodology of MPE have been increasingly adopted by cohort studies in various settings [40, 52–58, 60, 62, 69, 123–137].

Currently, biobank/biorepository networks and worldwide collaborative databases are increasingly available for population-based research [8]. In parallel with this trend, there have been great advances in the framework of computational biology, bioinformatics, and genomic medicine. To optimize expanding biomedical data (e.g., genomics, epigenomics, transcriptomics, proteomics, metabolomics, microbiome), it is essential to consider the disease heterogeneity [3]. By categorizing a disease into distinct subgroups based on the unique disease principle, MPE can be a powerful tool to gain novel pathogenic insights and to infer causality from a rich resource of biomedical data.

Integration of several disciplines into MPE has further led to a few evolving subfields within MPE, including pharmaco-MPE, immuno-MPE, and microbial MPE. Further integration across MPE subfields is feasible and expected to become a promising research area in the future. For example, several medications can serve as immunomodulators, altering levels of infiltrating lymphocytes in the tumor microenvironment. Indeed, the integration of pharmaco-MPE and immuno-MPE is one of the greatest achievements in recent MPE research. A recent MPE study examined the association between aspirin use and colorectal cancer incidence by levels of lymphocytic reactions to cancer cells in the tumor microenvironment [138]. Similarly, medications could also influence the gut microbiome through direct antimicrobial effects or alterations in other factors [139, 140]. Therefore, a further integration of pharmaco-MPE and microbial MPE could provide a promising research framework.

Despite all of the abovementioned strengths, there are some challenges in MPE. First, the generation and maintenance of comprehensive tumor molecular databases that are a prerequisite for MPE research demand much effort. As a consequence, MPE analyses limited to individuals with tissue specimens tend to have small sample sizes. To obtain adequate statistical power, large sample sizes of parent cohorts and efforts to obtain as many tissue specimens as possible in the parent cohorts are mandatory [3]. Second, in examining the disease heterogeneity, multiple hypothesis testing is inevitable, which increases false-positive findings. Therefore, it is of particular importance in MPE analyses to form a priori hypotheses based on earlier exploratory findings or on plausible biological mechanisms, and to interpret the results accounting for multiple comparisons [2]. Additionally, unique statistical methods to address the disease heterogeneity have been developed for MPE research [141–147]. Finally, due to the interdisciplinary nature of MPE, one of the most profound challenges in MPE is the paucity of professionals with multidisciplinary expertise across molecular pathology, epidemiology, and biostatistics [148–151]. Multidisciplinary education programs for MPE research in universities and academic institutions could be part of a solution.
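To illustrate what accounting for multiple comparisons can look like in practice, the sketch below applies two widely used adjustments, the Bonferroni correction and the Benjamini-Hochberg false discovery rate procedure, to a set of heterogeneity P values. The choice of these two procedures is an assumption made for illustration; the text above does not prescribe a specific method, and the P values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (controls the false discovery rate)."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for heterogeneity from four tumor markers.
p_heterogeneity = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(p_heterogeneity))
print(benjamini_hochberg(p_heterogeneity))
```

Bonferroni is the more conservative of the two; the Benjamini-Hochberg procedure is often preferred in exploratory analyses where many markers are screened and some false positives can be tolerated pending replication.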

In conclusion, the paradigm shift from conventional epidemiology to MPE has opened new opportunities to address the disease heterogeneity, and to provide epidemiologic evidence for molecular pathogenic mechanisms. The MPE research framework is in parallel with the NIH Precision Medicine Initiative, which has emphasized personalized prevention and treatment [31, 32]. Given increasing availability of biomedical data, the disease heterogeneity should be appropriately addressed in order to extract insights into disease etiologies and pathogenesis from invaluable data. The evolving field of MPE can be a core field in the era of big-data health science and precision medicine.