Introduction

While cancer therapies continue to progress at an impressive pace, many patients are unfortunately diagnosed at advanced stages of cancer, and thus face prognoses far worse than had they been diagnosed at earlier stages. This is particularly the case with gastric cancer (GC), which remains one of the top three most common causes of cancer death worldwide [1]. In the United States, where stomach cancer is often diagnosed symptomatically at late stages, the 5-year relative survival rate for distant cancer is profoundly low at approximately 5%, according to public data from the NIH Surveillance, Epidemiology, and End Results Program (SEER) [2]. However, that rate improves dramatically to roughly 69% for those diagnosed and treated at the localized stage [2]. This large survival gap between early and late stage diagnosed GC represents a significant opportunity for reduction of morbidity and mortality, one which improved screening methods can help address.

Currently, upper endoscopy is considered the gold standard for GC screening and diagnosis, but a variety of other screening methods exist and are being further researched or refined. These include radiography; serum biomarkers such as pepsinogen, H. pylori, and gastrin 17; confocal laser endomicroscopy and endocytoscopy; microRNAs; tumor autoantibodies; breath-exhaled marker detection using chromatography, mass spectrometry, or nanosensor technology; and even whole-genome bisulfite sequencing for DNA methylation [3,4,5,6]. Yet even with the best technologies, knowing whom to screen is a key part of the equation, and this depends on each individual's baseline risk. Weighing the risk factors for GC, which are broad and include genetics, H. pylori, smoking, alcohol, diet, obesity, pernicious anemia, GERD, and EBV infection, as well as sex and ethnicity [1], can help formulate more personalized, nuanced approaches to screening in the future, even in countries where overall GC incidence rates are lower.

Screening and surveillance recommendations are well established in the U.S. for colorectal cancer and intestinal metaplasia (IM) of the esophagus (Barrett’s), but not for IM of the stomach leading to GC, despite the latter likely having a higher incidence of progression from IM to cancer compared to IM of the esophagus [7]. Improved screening with endoscopic submucosal dissection (ESD) and endoscopic mucosal resection (EMR) for such patients could catch GC earlier on and prevent some types of GC from developing to later stages [7]. Though routine GC screening is not recommended in the U.S. due to inadequate evidence of benefit [8], national screening programs do exist in certain countries outside the U.S., such as Japan and South Korea, which have higher burdens of GC disease [9]. In an ideal world, inexpensive and perfectly sensitive and specific screening tests for GC could be widely employed to diagnose the disease at earlier stages. In the meantime, understanding the various health and economic factors that affect screening cost-effectiveness is critical in helping to guide successful screening programs that can be applied to more people.

Methods

We compared screening strategies with intervals of 1, 2, 5, and 10 years, initial screening of all individuals followed by surveillance of only high-risk individuals every 2 years, as well as initial screening followed by surveillance of high- and low-risk patients at intervals of 2 and 5, 2 and 10, and 5 and 10 years, respectively, all relative to a baseline of no screening for males and females in the United States, Brazil, France, Japan, and Nigeria. These screening intervals and countries were selected to obtain a broad array of timeframes within diverse health and economic settings. We then performed a sensitivity analysis to test different ranges of each variable and determine each variable’s influence on the overall model. All analyses were performed via Markov models using TreeAge Pro Healthcare 2020 (TreeAge Software Inc., Williamstown, MA, USA).

In each country, the target population was defined as all individuals aged 40, who started out asymptomatic but then developed GC according to population incidences as they progressed through 35 successive 1 year cycles of the model until age 75. This age range was chosen to reflect the increased risk of developing GC above age 40, which is also the age when screening in Korea (and formerly Japan, which now screens at age 50) begins [10, 11]. In each strategy, all individuals at age 40 were screened during the first year, and at age 75, all iterations through the model concluded and costs and QALYs were added and compared for each strategy. Screening strategies were designed to test individuals at different intervals of time, with the idea that as the length of screening interval increased, the risk of missing early stage GC would increase, while screening costs would decrease. The model incorporated local, regional, and distant stages of disease. Each branch corresponds to a different potential outcome with built-in probabilities; Markov cycles start at the beginning of each tree for any of the screening strategies (purple circles with an M; see Appendix figures). Each tree ultimately ends in a terminal state (red triangle), which determines where those individuals will begin at the start of the next cycle. These terminal states include local, regional, and distant GC for those diagnosed with GC. For those not diagnosed with GC, they include target population in screening or interval year in the screening and surveillance strategies, and, in the no screening or annual screening scenarios, simply target population. To keep track of the overall toll of GC, a death state named GC Deaths was designated to accumulate all deaths due to GC, separate from deaths due to baseline mortality, which terminated in the health state Dead. Cycle lengths were each 1 year long. 
To enable us to test for proper validation of our model, we defined the risk of a 40 year old being diagnosed with GC after 35 years as the sum of the final probabilities, at the end of the 35 cycle simulation, of being in any of the following terminal health states: local, regional, or distant GC, as well as GC deaths.
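The cohort logic described above can be illustrated with a minimal sketch. It is not the TreeAge model used in this study: the state names follow the text, but every probability below except the no-screening stage split (taken from the Health data section) is a hypothetical placeholder, and incidence and baseline mortality are held constant with age purely for brevity.

```python
STATES = ["target", "local", "regional", "distant", "gc_death", "dead"]

def build_transition_matrix(p_gc, stage_dist, gc_mort, base_mort):
    """Return a row-stochastic annual transition matrix (hypothetical inputs)."""
    n = len(STATES)
    P = [[0.0] * n for _ in range(n)]
    # Healthy target population: baseline death, new GC diagnosis, or stay healthy.
    P[0][5] = base_mort
    for j, share in enumerate(stage_dist, start=1):
        P[0][j] = (1 - base_mort) * p_gc * share
    P[0][0] = 1.0 - sum(P[0])
    # Each GC stage: GC death, baseline death, or remain in the same stage.
    for j, m in enumerate(gc_mort, start=1):
        P[j][4] = m
        P[j][5] = (1 - m) * base_mort
        P[j][j] = 1.0 - P[j][4] - P[j][5]
    # "GC Deaths" and "Dead" are absorbing states.
    P[4][4] = 1.0
    P[5][5] = 1.0
    return P

def run_cohort(P, cycles=35):
    """Propagate a cohort that starts entirely in the target population."""
    n = len(P)
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(cycles):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

# Placeholder values, NOT the study's inputs.
P = build_transition_matrix(
    p_gc=0.0002,                     # hypothetical annual GC incidence
    stage_dist=(0.31, 0.29, 0.40),   # no-screening stage split quoted in the text
    gc_mort=(0.07, 0.20, 0.45),      # hypothetical annual GC mortality by stage
    base_mort=0.005,                 # hypothetical, age-independent
)
final = run_cohort(P)
# Validation quantity from the text: local + regional + distant + GC deaths.
cumulative_gc_risk = sum(final[1:5])
```

In this simplified version, people who are diagnosed with GC and later die of baseline causes leave the summed states, so the quantity slightly understates cumulative diagnoses; whether the original model treats them the same way is not stated in the text.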

Health data

Incidence of GC for each country in this study was adopted from the International Agency for Research on Cancer (IARC) GLOBOCAN 2018 online database, which provides estimates of incidence for 34 cancer types by sex, age group, and country for the year 2018 [12]. GC mortality data were adopted from the United States National Cancer Institute’s Surveillance, Epidemiology, and End Results Program (SEER) [2]. 5-year mortality estimates were converted to 1-year mortality estimates using the following formula [13]:

$$p_{\text{annual}} = 1 - \left(1 - p_{\text{other year}}\right)^{1/\text{other year}}$$
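As a worked illustration of this conversion, the sketch below assumes the exponent is the reciprocal of the number of years covered by the multi-year estimate (5 here) and that the annual probability is constant over that period. The function name is hypothetical, and the inputs simply treat the approximate 5-year relative survival figures from the Introduction (5% distant, 69% localized) as 5-year mortality of 95% and 31%; they are not the model's actual inputs.

```python
def annual_probability(p_multi_year: float, years: float) -> float:
    """Convert a cumulative probability over `years` into a constant annual probability."""
    if not 0.0 <= p_multi_year <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - p_multi_year) ** (1.0 / years)

# Illustration only: 5-year mortality implied by the survival figures in the Introduction.
annual_distant = annual_probability(0.95, 5)   # roughly 0.45 per year
annual_local = annual_probability(0.31, 5)     # roughly 0.07 per year
```

Compounding the annual value back over 5 years recovers the original 5-year probability, which is a quick check that the conversion is applied in the right direction.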

Because GC screening is not routinely performed in the U.S., we used the U.S. SEER data to determine what the baseline distribution of local, regional, and distant cases would be at diagnosis in a population without any widespread screening, namely, 31% local, 29% regional, and 40% distant. To capture the effect of screening on this baseline distribution of local, regional, and distant GC cases, data from the Korea Central Cancer Registry (2014) [14] were used, as Korea employs biennial GC screening [15]. Thus, the population-wide cancer stage distribution in Korea was used for the cancer stage distribution with a 2 year screening interval program in place in our model. This results in a shift from later to earlier stage cancers at diagnosis, reflecting the beneficial effects of early screening. The strength of that shift was then augmented for annual screening, and depreciated by the same amount for each subsequent year, such that a screening interval of 20 years would be equivalent to no screening. In our surveillance scenarios, the probability of premalignant GC was estimated at 13.5% of the target population [16], which was used to represent the proportion of the target population at higher risk of developing GC and thus more likely to benefit from more frequent surveillance. Realistically, it could be assumed that any low-risk individuals found to have premalignancy in subsequent years could then move into the high-risk pool of the population; though we do not explicitly account for this in our model, we do account for people moving out of these pools and into the cancer pools. While we acknowledge H. pylori to be a risk factor for gastric cancer, we assume that this is already accounted for in the different gastric cancer incidence rates for the countries in our model. Finally, baseline annual mortality by age was obtained from public data sources, including the CDC Census Data for the U.S. (2017) [17], the Ministry of Health, Labour, and Welfare for Japan (2017) [18], and the World Health Organization (WHO) for Brazil, France, and Nigeria (2016) [19]. For countries other than the U.S., which provided population mortality data for males and females but not the overall population, we assumed a 50–50 male–female demographic distribution and averaged the mortality data from each sex at every age to get general population estimates. For WHO data, where mortality rates were presented in 5 year intervals, calculated as the number of deaths for an age group between x and x + 5 divided by the population of that age group, we assumed equivalent death rates and population sizes for each year within that 5 year interval. GC incidence and overall mortality data were tailored to each country as described above; all other clinical data were uniformly applied to each country, as the difficulty of finding exact estimates and the quality of such estimates varied considerably.
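The interval-dependent stage shift described above can be sketched as a linear interpolation. This is one reading of the text, not the authors' implementation: the shift observed at a 2 year interval is given a weight of 1, the weight falls by the same amount each year until it reaches 0 at 20 years, and it exceeds 1 for annual screening. The biennial distribution below is a hypothetical placeholder, since the Korean registry values are not reported in this section.

```python
NO_SCREENING = (0.31, 0.29, 0.40)            # local, regional, distant (from the text)
BIENNIAL_HYPOTHETICAL = (0.60, 0.25, 0.15)   # placeholder, NOT the Korean registry values

def stage_distribution(interval_years, baseline=NO_SCREENING,
                       biennial=BIENNIAL_HYPOTHETICAL, no_benefit_interval=20):
    """Linearly scale the biennial stage shift so it vanishes at `no_benefit_interval`."""
    if interval_years < 1:
        raise ValueError("interval must be at least 1 year")
    # Weight is 1 at a 2-year interval, 0 at 20 years, and above 1 for annual screening.
    weight = max(0.0, (no_benefit_interval - interval_years) / (no_benefit_interval - 2))
    return tuple(b + weight * (k - b) for b, k in zip(baseline, biennial))

for interval in (1, 2, 5, 10, 20):
    local, regional, distant = stage_distribution(interval)
    print(f"{interval:>2} years: local={local:.3f} regional={regional:.3f} distant={distant:.3f}")
```

Because both input distributions sum to 1, every interpolated distribution does as well. With a more extreme biennial distribution, the extrapolation to annual screening could push a share below 0, so a real implementation would need to guard against that.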

Cost data

Costs were calculated in U.S. dollars ($) and given in 2020 prices. Prices from previous years were adjusted for inflation, and a discount rate of 3% was incorporated for both costs and utilities when adjusting future projections back to the present in our model. Endoscopy costs were estimated from national data when possible; this was estimated at $761 per recent Medicare data for the U.S. [20], and $137 (average of 10,000–20,000 yen, converted to 2020 U.S. dollars) per recent Japanese Health Insurance data for Japan [21]. For Nigeria, endoscopy cost was estimated at $148 according to an article from a university hospital in Nigeria [22]. Because Brazil and France have government-backed health insurance like Japan, the Japanese cost of endoscopy was applied to these two countries, as well. Treatment costs were adopted from a study by Zhou et al. [16] which looked at endoscopic surveillance for GC in a population of Singaporean Chinese men and women aged 50–69 years, and tabulated medical service costs specific to each clinical stage (I–IV) of GC according to 2012 hospital charges from the National University Hospital. Because our model records cancer stages via the local, regional, and distant schemes, we averaged stage I and II costs and assigned that to local, stage III to regional, and stage IV to distant cancer costs to most closely match TNM staging [23]. We then converted the costs to 2020 dollars.
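To make the 3% discounting concrete, the sketch below computes the present value of a stream of yearly costs; the same function applies to yearly QALYs. It assumes that the first cycle is undiscounted and that each later cycle t is divided by (1 + 0.03)^t, which is a common convention but is not confirmed by the text. The screening schedule is hypothetical and uses only the U.S. endoscopy cost quoted above.

```python
def discounted_total(values_per_cycle, rate=0.03):
    """Sum a stream of yearly costs or QALYs, discounting cycle t by (1 + rate) ** t."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(values_per_cycle))

# Hypothetical schedule: one endoscopy at the U.S. cost every 5 years over 35 cycles.
ENDOSCOPY_COST_US = 761.0
schedule = [ENDOSCOPY_COST_US if year % 5 == 0 else 0.0 for year in range(35)]

undiscounted = sum(schedule)               # 7 endoscopies at face value
present_value = discounted_total(schedule)
print(f"undiscounted: ${undiscounted:,.0f}, discounted at 3%: ${present_value:,.0f}")
```

Discounting lowers the weight of later endoscopies, which is one reason longer screening intervals look cheaper in present-value terms than a simple count of procedures would suggest.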

Utility data

Health utilities were measured by quality adjusted life year (QALY) and were adopted from a study by Lee et al. [24] which surveyed 326 male and female participants in Korea from various ages and educational backgrounds, and used the standard gamble method to elicit ratings of 9 different health states including various scenarios for early, advanced, and metastatic or recurrent GC. We took the mean of the medians for the early, advanced, and metastatic/recurrent GC health states (0.9000, 0.6500, and 0.3875, respectively) and assigned these utilities to local, regional, and distant GC, respectively. We assigned a utility of 1 to the healthy state, which included all non-cancerous states.

Cost-effectiveness analysis

The primary outcome measure in our model was the incremental cost-effectiveness ratio (ICER) of each screening strategy relative to the baseline no screening strategy, calculated as the difference in accrued costs between the two strategies divided by the difference in accrued QALYs. Willingness-to-pay for each country was set at twice its 2018 gross national income (GNI) per capita [25], per World Bank data [26], adjusted for inflation to 2020 dollars (Table 1). Any strategy with an ICER below this threshold was considered cost-effective for its respective country. Sensitivity analysis was performed on each variable to determine how it impacted the overall model, with the exception of the utility of the healthy state, which was excluded since it is by definition equal to 1, and of GC incidence and overall mortality data, both of which varied with age. Utilities were increased and decreased by 0.1, and all other variables were increased and decreased by 50% of their original values, to determine how sensitive the primary outcome, ICER, was to that variable’s new range of values.
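The decision rule and sensitivity ranges described above can be sketched as follows. The function names are hypothetical, the numbers in the usage example are invented for illustration and are not results from this study, and the rule that a variable counts as a utility when its name starts with "utility" is purely a convention of this sketch.

```python
def icer(cost_strategy, qaly_strategy, cost_baseline, qaly_baseline):
    """Incremental cost per QALY gained relative to the baseline strategy."""
    delta_qaly = qaly_strategy - qaly_baseline
    if delta_qaly <= 0:
        raise ValueError("strategy gains no QALYs over baseline; ICER is not meaningful")
    return (cost_strategy - cost_baseline) / delta_qaly

def is_cost_effective(icer_value, gni_per_capita, multiplier=2.0):
    """Compare an ICER against a willingness-to-pay of `multiplier` x GNI per capita."""
    return icer_value < multiplier * gni_per_capita

def one_way_range(name, value):
    """Sensitivity range from the text: utilities +/- 0.1, everything else +/- 50%."""
    if name.startswith("utility"):
        return (value - 0.1, value + 0.1)
    return (0.5 * value, 1.5 * value)

# Invented example: a strategy costing $500 more and gaining 0.02 QALYs per person.
example_icer = icer(1500.0, 20.02, 1000.0, 20.00)          # about $25,000 per QALY
print(is_cost_effective(example_icer, gni_per_capita=40000.0))
print(one_way_range("utility_distant_gc", 0.3875))
print(one_way_range("cost_endoscopy", 761.0))
```

Re-running the model at each end of a variable's range and recording the resulting ICER is what produces the bars of a tornado diagram.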

Table 1 Gastric cancer incidence and economic data by country

Assumptions

We make several assumptions in this analysis. We assume 100% adherence to screening by the target population, when, in reality, the number is likely to be considerably lower. However, before delving into the specifics of a mass screening program, we believe that there is value to at least understanding what the best-case scenario could be. We also assume that the beneficial effect of screening in terms of shifting the distribution of GC to earlier stages erodes linearly until, at a screening interval of 20 years, the distribution is equivalent to no screening. Though the true natural history of GC is likely to differ in several ways in different populations facing different lifestyles and risks, we do think this assumption is helpful in terms of capturing the decreasing value of screening as time between endoscopies increases. Furthermore, we used the same death rate for premalignant and non-premalignant healthy patients, and for different ages of patients with the same stage of GC. We also held premalignancy prevalence constant across countries at 0.135, and assume that the initial screening at age 40 prior to subsequent surveillance will help identify all high-risk individuals. In reality, these assumptions are likely to differ between different countries for a variety of reasons, including differences in H. pylori prevalence, diet, and smoking patterns, differences which are partly captured in the differing incidences of GC for each country in our model [1, 27]. Finally, we kept QALYs and treatment costs the same across different countries, as finding robust and consistent data was very challenging; though these may differ considerably in different countries, they are also likely to differ considerably within countries, and thus, keeping these constant throughout the model avoids biasing overall cost-effectiveness too strongly in one direction for any given country.

Results

The results of the cost-effectiveness simulation are displayed in Fig. 1 and detailed in Table 2. The baseline model of no screening projected the 35-year risk of a 40-year-old developing GC in the overall U.S. population as 0.42%; this is comparable to SEER statistics showing the 30-year risk of a 40-year-old being diagnosed with GC as 0.32%. Projected mortality from GC in the overall population after 35 years ranged from 0.14% in Nigeria with annual screening to 2.28% in Japan with no screening (Table 3). In all countries, for men, women, and the general population, the most cost-effective screening option was screening every 10 years, while the least cost-effective option was annual screening. Only in Japan, where GC incidence is the highest (ASR 27.5) of all countries in this model, were any of the screening options below the willingness-to-pay threshold. In particular, the cost-effective options in the Japanese general population were screening every 5 years, screening every 10 years, and performing surveillance in high-risk individuals every 5 years and low-risk individuals every 10 years. Amongst Japanese males, surveillance in high-risk individuals every 2 years and low-risk individuals every 5 or 10 years was also cost-effective, in addition to these options.

Fig. 1 Cost-effectiveness curves for combined male and female target populations in Brazil (a), France (b), Japan (c), Nigeria (d), and the U.S. (e)

Table 2 ICERs for each screening strategy relative to no screening in general populations (a), males (b), and females (c) of all 5 countries
Table 3 Projected 35 year mortality from gastric cancer for each strategy in Brazil, France, Japan, Nigeria, and the U.S.

For Japan, which had the lowest ICERs and where multiple screening options were cost-effective, we detail the sensitivity analysis in the general population in Fig. 2 to show which variables had the most impact in the most cost-effective strategy, screening every 10 years. The five most important ones were starting age of screening, probability of GC being local at diagnosis, cost of local GC treatment, cost of endoscopy, and utility of the distant GC health state. In the U.S., which had the highest ICERs and where no screening options were cost-effective, the same sensitivity analysis showed that the five most important variables were probability of GC being local at diagnosis, starting age of screening, cost of endoscopy, annual mortality of local GC, and utility of the distant GC health state. Other sensitivity analysis results are detailed in Appendix Figure 1.

Fig. 2 Sensitivity analyses in Japanese general population (a) and U.S. general population (b) when screening every 10 years. Blue and red bars represent decreases and increases, respectively, in the value of the relevant variable. Variables ending with “disc” designations are discounted at 3% over time. prob probability, GC gastric cancer, diag diagnosis, tx treatment, EV expected (original) ICER value, WTP willingness-to-pay

Discussion

Our model shows that only in Japan, where GC incidence is by far the highest of all five countries in this analysis, do any of the screening strategies have ICERs low enough to be considered cost-effective. On the other hand, ICERs were highest in the U.S., which had the highest cost of endoscopy and the second lowest incidence of GC amongst all countries in this analysis. These results support the idea that endoscopic GC screening is most cost-effective in populations with high incidence of this disease and low screening costs [28, 29]. Nonetheless, even in countries where no screening strategies had ICERs below the willingness-to-pay threshold, the different ICERs can help tease apart the strategies that are most likely to be cost-effective. Similarly, because males had a higher incidence of GC in each country compared to females, the screening strategies amongst them were more cost-effective. This highlights the fact that even in a country with low incidence of GC and thus a higher ICER for screening, it may still be cost-effective to screen certain individuals who are at higher risk than the general population, such as immigrants from high-risk countries or individuals with known premalignancy or any other risk factors including family history, race, or H. pylori infection [30, 31]. For example, in the U.S., GC incidence varies widely from state to state (Fig. 3), and thus screening in higher risk individuals could make screening more cost-effective despite low average rates nationally [32].

Fig. 3 Distribution of gastric cancer incidence in the U.S. by state. Source: National Cancer Institute (NCI) Cancer Atlas [33]

This analysis also highlights the importance of improving screening technologies beyond endoscopy, which would increase diagnostic yield and identification of higher risk individuals, improve convenience for and thus adoption by patients, and lower costs to make screening programs more cost-effective. This would increase the number of individuals who can benefit from screening without having to save costs through less ideal practices, such as delaying the starting age of screening, which may lower the screening ICER, as depicted in Fig. 2. In addition to highlighting the importance of endoscopy cost, sensitivity analysis also showed that the starting age of endoscopy is a significant factor. This reflects the known trade-offs involved with shifting when people get screened. At younger ages, GC incidence is lower, and thus screening is less efficient and relatively more costly, while at older ages, screening may catch more cancers and help avert high-risk surgeries, but at the cost of subjecting older patients, who have higher baseline health risks, to more procedures. Finally, sensitivity analysis showed that the probability of GC being local at baseline (with no screening in place) has a strong effect on the ICER of a screening strategy for a given population and correlates positively with it (and thus negatively with overall cost-effectiveness). This makes sense, since the value of screening is to diagnose GC early on, and thus shift the population-wide distribution of GC from later to earlier stages, when intervention is more likely to help patients avert higher treatment costs, morbidity, and mortality. The more cases that are already local without screening, the less room there is for screening to shift that distribution.

There are several limitations to this study. Our model does not include complications of endoscopy, disutilities from days lost for procedures or treatments, cost data related to scheduling and travelling to an endoscopy, and several other variables that would presumably affect the implementation of a mass screening program. While some of these variables are likely to make screening even less cost-effective, others, such as the productivity losses of patients living with late-stage cancer and of their families caring for them, or from lives lost due to higher mortality from late-stage cancer, are likely to make screening more cost-effective. Thus, we believe that this simplification avoids overfitting the model while allowing us to broaden its applicability to multiple countries. Each health system will undoubtedly face different challenges and needs requiring unique adaptations of these frameworks. Furthermore, though we believe that this model provides a useful framework for comparing various screening strategies within and across countries with different health and economic characteristics, we recognize that a variety of other factors can make direct comparisons between countries challenging. For example, two countries may have similar rates of GC incidence and similar cost structures, but one health system may have greater availability of and more widespread training in endoscopic detection of premalignant and malignant GC. However, the cost of endoscopy partly factors this in. Similarly, while rates of GC premalignancy and other pathologies necessitating endoscopy may be similar in different populations, differences in health regulations, access, literacy, and culture may affect the likelihood of individuals presenting for endoscopy. This can affect the ability to stratify patients according to GC risk, and thus influence cost-effectiveness of GC surveillance in a population. 
However, on a large scale, we think that this is likely to be a similar source of bias in all countries in our model. These limitations also highlight the importance of alternative screening technologies that can improve access to screening. Finally, cancer staging systems may differ between countries, both in terms of classification types and differing standards of inclusion into shared types. However, we applied a common system of local, regional, and distant GC distribution and mortality in all countries to minimize this potential bias.

To the best of our knowledge, this is the only cost-effectiveness model exploring multiple endoscopic screening strategies across a mix of countries with varying health and economic profiles. We show that screening is more likely to be cost-effective in populations with higher incidence of GC, at lower costs of endoscopy, at later starting ages of screening, and whenever screening has a greater opportunity to shift the distribution of GC from later to earlier stages. We also show that regardless of these variables, any screening strategy is only as likely to be cost-effective as the underlying willingness-to-pay of the society where it is implemented. More robust data and clinical trials, advances in technologies, and lowering of costs will strengthen these types of cost-effectiveness analyses in the future, as well as help us avoid the unfortunate trade-offs that we currently face in deciding when to screen. This will likely improve earlier GC detection, so that more treatment options, with greater likelihoods of success, will be available to as many people as possible.