Introduction

Lung infection in people with cystic fibrosis (CF) is polymicrobial, comprising a mixture of bacteria, fungi, viruses, and other microorganisms [1, 2]. After conventional diagnosis of infection using culture-based microbiology, only specific bacterial and fungal CF pathogens are treated using antibiotics or antifungals, respectively. Multiple CF pathogens are intrinsically resistant to antibiotics, and within the environment of the CF lung, they escape killing by these antimicrobials, leading to the establishment of chronic infections. The continuous or repeated use of antibiotics to suppress chronic infection may lead to increased pathogen antimicrobial resistance (AMR), and frequently limits available therapeutic options for people with CF. In addition, pathogen adaptation to this antimicrobial-rich lung environment promotes diversification and phenotypic variation. The microbial molecular epidemiology of CF lung infection has been comprehensively discussed by LiPuma [2, 3•] and is audited regularly by organisations such as the UK Cystic Fibrosis Trust within its Data Registry [4].

But from the diverse groups of isolates associated with bacterial CF lung infections, how do researchers or companies developing therapeutics select appropriate strains for analysis?

Testing of novel anti-infective agents for the treatment of CF infections should consider using microbial strains that are well characterised in the context of the respiratory disease, as well as being representative of the genetic and phenotypic diversity of the species being targeted. The definition of a bacterial strain has been discussed widely, and strains can now be defined at several levels of resolution based on whole genome sequences [5]. For the purpose of this review, a bacterial CF strain will be taken as an isolate that is distinct from other isolates of the same species based on a genotypic feature. This is important in relation to chronic CF lung infection because sequential isolates of the same strain are frequently collected over time, and hence will bias research if not accounted for. Here, we provide a summary of strain resources for problematic bacterial CF pathogens which are treated with antibiotics in the context of the respiratory disease (Fig. 1): Pseudomonas aeruginosa, Burkholderia cepacia complex (Bcc) and Burkholderia gladioli, Mycobacterium abscessus complex, multi-drug-resistant Gram-negative species (Achromobacter, Stenotrophomonas and others), Staphylococcus aureus, and Haemophilus influenzae. From this knowledge, systematic criteria are also proposed to help the future selection of strains for research and testing.
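The point about sequential isolates can be made concrete with a short sketch. The following is a minimal illustration, assuming a hypothetical isolate table in which each record carries a patient identifier, a genotype label (for example an MLST sequence type), and an ISO-format collection date. The field names and the rule of keeping the earliest isolate are assumptions made for illustration only, not a method described in the literature reviewed here.

```python
def one_isolate_per_strain(isolates):
    """Keep the earliest isolate for each (patient, genotype) pair.

    Each isolate is a dict with the hypothetical keys 'id', 'patient',
    'genotype', and 'date' (ISO 8601 string, so string order is date order).
    """
    earliest = {}
    for iso in isolates:
        key = (iso["patient"], iso["genotype"])
        # Replace the stored record only if this isolate was collected earlier.
        if key not in earliest or iso["date"] < earliest[key]["date"]:
            earliest[key] = iso
    # Return a stable, readable ordering: by patient, then collection date.
    return sorted(earliest.values(), key=lambda i: (i["patient"], i["date"]))
```

Applied to a collection in which one individual contributed several isolates of the same genotype over time, the function returns a single representative for that strain, while the same genotype recovered from a different individual is retained as a separate record.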

Fig. 1

Bacterial pathogens, fungal pathogens, and the CF lung microbiota. Bacteria and fungi that are identified as pathogens and therapeutically treated in CF lung disease are listed. The wider microbiota are also illustrated, outlining prevalent bacteria (mainly anaerobes) which are not currently treated as lung pathogens. The current knowledge relating to taxonomic identity, strain definitions, and genomic resources is described. Figure graphics created with BioRender.com

Pseudomonas aeruginosa

P. aeruginosa is arguably the most widely studied CF pathogen and is also a World Health Organisation Priority 1 (Critical) bacterium in terms of AMR and an urgent need for new treatments [6]. P. aeruginosa continues to be a dominant cause of chronic CF lung infection, with a prevalence of greater than 50% in large populations such as the USA [3•] and UK [4]. Aggressive antibiotic eradication therapy has been successful at limiting P. aeruginosa infection in children, but chronic infection eventually occurs in adolescents and adults with CF [3•, 4]. Historically, P. aeruginosa strains which have been widely studied in terms of antimicrobial resistance and molecular pathogenesis are:

  • P. aeruginosa PAO1 (ATCC 15692), a human wound isolate and arguably the prototypic P. aeruginosa reference strain studied for the last 30 years [7••]

  • P. aeruginosa UCBPP-PA14, a human burn isolate [7••]

  • P. aeruginosa PAK, a non-CF human isolate [7••]

However, none of the above were originally isolated from CF lung infections, representing a key knowledge gap in terms of therapeutic testing and modelling disease-appropriate P. aeruginosa strains. In the last decade, one CF strain has been studied in more detail, providing insights into the different behaviour of isolates from chronic lung infection compared to the widely studied reference strains:

  • P. aeruginosa LESB58, a chronic lung infection isolate from an adult with CF which is representative of a major epidemic strain, the Liverpool Epidemic Strain (LES) [8].

The International P. aeruginosa Reference Panel

In 2013, Anthony De Soyza and colleagues coordinated the development of an international P. aeruginosa reference panel in an attempt to address the issue of studying disease-relevant strains [7••] (Table 1). The panel of 43 strains was developed to be representative of P. aeruginosa using a systematic range of clinical and microbiological criteria. Full historical details and source references for each panel strain were provided, and the strains were selected to be genetically diverse in terms of the known population biology of P. aeruginosa at the time [7••]. The P. aeruginosa strain panel included all four well-studied reference strains (PAO1, UCBPP-PA14, PAK, and LESB58) and expanded on CF strains as follows:

  • 8 isolates representative of transmissible CF strains including LES and from Denmark and Australia

  • 8 isolates from chronic infection that were sequential to cover evolution within an individual’s lung infection (3 from an adult and 5 from a childhood CF respiratory infection)

  • 8 additional CF strains with specific phenotypic, virulence, or serotype characteristics [7••].

Table 1 Strain and genomic level resources for bacterial CF pathogens

The wider panel encompassed 15 other P. aeruginosa strains from non-CF infections (burns, wounds, keratitis, acute intensive care unit infection, and chronic obstructive pulmonary disease), as well as 4 strains isolated from the environment [7••]. All the strains within the panel were deposited within an internationally recognised strain repository, the Belgian Coordinated Collections of Microorganisms/Laboratory for Microbiology Gent (BCCM/LMG; http://bccm.belspo.be/about-us/bccm-lmg). This enabled researchers to reproducibly access the P. aeruginosa strains from a single validated source at low cost [7••].

Phenotypic Data on the International P. aeruginosa Reference Panel

To follow up the definition of the panel, Cullen and colleagues [9••] examined multiple phenotypic properties for 42 of the original 43 strains (strain NN2 was withdrawn from the analysis due to inconsistencies over its taxonomic identity). The phenotypes determined included growth rates, antibiotic susceptibility, motility, Galleria mellonella infection modelling, mucoidy, pyocyanin and alginate production, lipopolysaccharide properties, biofilm formation, urease activity, and antimicrobial and phage susceptibility [9••]. A key feature of the phenotypic testing was its coordination across multiple research laboratories in different geographic locations, with several phenotypes such as antibiotic susceptibility reproduced in different centres [9••]. This accounted for the variable nature of laboratory testing, and the fact that P. aeruginosa is a phenotypically diverse bacterium capable of significant intra-strain diversity [10]. By using the findings of the P. aeruginosa panel phenotype paper [9••], researchers can rapidly identify strains with the relevant virulence or antimicrobial resistance traits they wish to model.

A Genomic Understanding of the P. aeruginosa Reference Panel

The original P. aeruginosa strain panel [7••] accounted for the genetic diversity of P. aeruginosa by selecting strains based on their Clondiag Array Tube (AT) genotype [11]. The AT-genotyping database covered the diversity of greater than 1000 P. aeruginosa strains at the time, but as a molecular strain typing method, it has now been superseded by whole genome sequence analysis. In 2018, Freschi and colleagues [12•] determined the genome sequences for 33 of the strains from the P. aeruginosa panel [7••] and compared them to 7 reference genomes (Table 1). A key observation from this analysis of 40 strain genomes was that P. aeruginosa separates into two major evolutionary lineages, designated genomic Groups 1 and 2. Well-characterised reference strains such as LESB58, PAO1, and PAK are in Group 1, while other reference strains such as UCBPP-PA14 are in Group 2.

Furthermore, genomic analysis of 1311 P. aeruginosa genomes identified a total of 5 genomic groups within the species [13]. This confirmed the presence of Group 1 (n = 986 isolates) and Group 2 (n = 297 isolates) and demonstrated that they are the most common P. aeruginosa lineages (collectively 98% of the isolates) [13]. Three additional groups, 3, 4, and 5, were defined, with Group 3 being significantly distant in terms of its average nucleotide identity (ANI; 93–94% compared to all the other P. aeruginosa genomes analysed) [13]. The Group 3 isolates (14 in total) included the well-studied multidrug-resistant P. aeruginosa PA7 strain [13]. Given the current genomic taxonomy boundary of 95% ANI used to define the majority of bacterial species [14••], strain PA7 and Group 3 as a whole could be considered a novel genomic species. This suggests that strain PA7 is less relevant as a model strain when comparing to the more common and taxonomically validated P. aeruginosa lineages.
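The arithmetic behind these statements is simple enough to check directly. The sketch below recomputes the combined share of Groups 1 and 2 from the counts given above and applies the 95% ANI boundary to the 93–94% range reported for Group 3. The function names are illustrative, and treating the boundary as a hard cut-off is a simplifying assumption (in practice, species delineation draws on additional evidence).

```python
SPECIES_ANI_BOUNDARY = 95.0  # percent; the boundary cited in the text


def percent_of_total(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def within_species_boundary(ani_percent, boundary=SPECIES_ANI_BOUNDARY):
    """True if an ANI value meets or exceeds the species boundary."""
    return ani_percent >= boundary


# Counts taken from the text: Group 1 = 986, Group 2 = 297, total = 1311.
share_groups_1_and_2 = percent_of_total(986 + 297, 1311)
print(f"Groups 1 and 2: {share_groups_1_and_2:.1f}% of genomes")

# ANI range reported for Group 3 against the other genomes analysed.
for ani in (93.0, 94.0):
    print(ani, within_species_boundary(ani))
```

Both ends of the Group 3 ANI range fall below the boundary, which is the basis for the suggestion that this group could be regarded as a separate genomic species.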

A study of 103 P. aeruginosa genomes [15] confirmed the Groups 1 and 2 population biology as encompassing the majority of strains [13], and also defined the preservative tolerance of multiple panel strains. P. aeruginosa is intrinsically resistant to both antibiotics and other antimicrobials such as preservatives, disinfectants, and biocides [15]. This analysis also showed that preservative-tolerant strains from the contamination of nonsterile industrial products possessed genomes that were larger (7 Mb) than clinical strains (6.6 Mb), primarily due to the presence of a 0.5-Mb megaplasmid. The same family of megaplasmids was subsequently identified in multidrug-resistant non-CF clinical P. aeruginosa isolates from Thailand [16]. Overall, given the clear presence of 2 common genomic lineages within P. aeruginosa, researchers should seek to understand how strains from each behave in relation to novel therapeutics. At minimum, studies should include P. aeruginosa PAO1 (Group 1) and UCBPP-PA14 (Group 2) as references for each lineage, and potentially expand analysis to include wider CF or other strains from the sequenced panel [12•], or from more recent genetic studies which characterise strains carrying AMR megaplasmids [15, 16].

Burkholderia cepacia Complex Bacteria and Burkholderia gladioli

Burkholderia are intrinsically antimicrobial-resistant Gram-negative bacteria that emerged as highly virulent and transmissible CF pathogens in the 1980s [2, 17]. From the onset of their recognition as CF pathogens, there were difficulties associated with their identification and taxonomy, with 5 species groups shown to make up isolates of B. cepacia [18]. Hence, to help researchers study them, they were collectively designated as the B. cepacia complex (Bcc) [18], with a representative strain panel assembled in 2000 to facilitate comparative research [19] (Table 1). The original Bcc panel included 30 strains from the 5 species known at the time, and accounted for basic genetic strain diversity as determined by macrorestriction and pulsed-field gel electrophoresis fingerprinting [19]. Multiple follow-up publications have examined these original panel strains, and in 2003, an expansion to the panel was made to include B. dolosa (designated genomovar VI at the time), B. ambifaria, B. anthina, and B. pyrrocinia [20] (Table 1). As with the P. aeruginosa panel [7••], all strains within the B. cepacia complex panels are deposited within the BCCM/LMG (http://bccm.belspo.be/about-us/bccm-lmg) to enable access.

With ongoing taxonomic classification, there were 17 Bcc species defined by 2010 [2]. Using whole genome sequencing, a total of 22 known species and 14 further novel genomic species have recently been identified [20]. In a more extensive analysis of 4000 Burkholderia genomes, 26 novel Bcc genomic species groups have recently been defined [21•], indicating that the taxonomic complexity of this group of bacteria will continue to grow. In the context of CF lung infections, B. multivorans and B. cenocepacia are the dominant species encountered in people with CF [3•, 22], and hence should be the focus of therapeutic development. Although the prevalence of B. cepacia complex infections in CF is low (between 2 and 5% in different populations [3•, 22]), they are important as a target for therapeutic development for a number of reasons. Firstly, there are limited therapeutic options to suppress or eradicate infection. Secondly, as a result of the unpredictable clinical outcome and severe lung disease associated with Bcc CF lung disease, infected individuals are frequently excluded from clinical trials of other novel drugs and from lung transplantation, further limiting the options available for this highly vulnerable group.

B. cenocepacia

This species name was proposed in 2003 [23] and is known to encompass considerable CF strain diversity. At least four genetic lineages of B. cenocepacia were defined based on the recA gene sequence (III-A, III-B, III-C, and III-D) [23, 24]. B. cenocepacia isolates from the ET12 strain have been widely studied and include model strains J2315 [25] and K56-2 [19]. They are representative of the highly virulent and transmissible CF B. cenocepacia recA III-A lineage [26]. B. cenocepacia recA III-B lineage strains are prevalent within the US CF population [2]. Within the original strain panel [19], strain PC184 is a suitable recA III-B strain and member of the US Mid-West epidemic clone.

Recent genomic analysis by Wallner and colleagues [27•] validated the major genetic split within B. cenocepacia originally observed by the analysis of the recA gene [24]. They also proposed that the III-B grouping represents a novel genomic species, suggesting the name Burkholderia servocepacia for this group [27•]. This new species name has not been taxonomically validated. It is also the subject of controversy because it portrays this B. cenocepacia genomic species as encompassing mainly plant-associated strains [27•]. This contrasts sharply with the known CF infection epidemiology, with both III-A and III-B B. cenocepacia strains causing devastating CF lung infections and epidemic outbreaks of infection [2]. Future studies on B. cenocepacia as a CF pathogen should examine strains from both the III-A and III-B genetic lineages. Also, with genomic characterisation of multiple B. cenocepacia isolates completed [21•], systematic selection of future strain panels for this species is fully enabled.

B. multivorans

Unfortunately, there is no well-characterised CF strain of this species, even though it is now the most dominant Bcc species seen in CF [2, 3•, 22]. The soil isolate B. multivorans ATCC 17616 has been well studied [19], with multilocus sequence typing (MLST) identifying 2 CF strains of the same sequence type (ST-21) as this environmental isolate [28]. Hence, strain ATCC 17616 can be considered a model that is genetically representative of isolates that are capable of causing CF infection.

There have been two detailed studies following the genomic evolution of single B. multivorans strains during chronic CF lung infection. Silva et al. [29] followed the evolution of a single strain through a switch from a mucoid to a nonmucoid phenotype associated with 20 years of chronic CF infection. Caballero et al. [30] performed another genomic study of the evolution of a single B. multivorans strain which chronically infected a CF adult for 10 years prior to lung transplantation. The isolate possessed the MLST sequence type ST-783 [30], and current analysis of this sequence type at the MLST database shows that it is shared by other CF (US infection), non-CF (Belgium), and environmental isolates (Belgium). Given the depth of clinical and genomic information that accompanies these 2 B. multivorans CF strains [29, 30], they are worthy of inclusion within future test panels.

B. gladioli

This species is not a member of the Bcc, but it is the third most common Burkholderia species seen in the US [2] and UK CF population [22]. Clinical outcomes of infection with B. gladioli may also be variable and problematic as seen with the Bcc. A useful model strain for this group is B. gladioli BCC0238 (LMG-P 26202), a CF isolate recovered in 1996 from a paediatric patient in Minnesota, USA [31]. The complete genome sequence for B. gladioli BCC0238 is available [31, 32], and although it has not been characterised in detail as a CF pathogen, the isolate has been studied because it produces multiple antibiotics, such as gladiolin which is active against tuberculosis [31, 33].

The population biology of B. gladioli has recently been determined [34•] by genomic comparison of 206 isolates, of which the majority were from CF infection (n = 194). The analysis demonstrated the presence of 3 distinct genomic groups and 5 evolutionary clades, with strain BCC0238 residing in B. gladioli group 3 [34•]. Strains from CF infection were found within all 5 evolutionary clades, and the study also uniquely identified that 13% of the B. gladioli CF isolates can produce the toxin, bongkrekic acid. This virulence factor could potentially drive clinical disease in CF as it is known to be lethal to non-CF individuals following ingestion of bongkrekic acid via food that had been contaminated with B. gladioli [34•]. The antibiotic trimethoprim suppressed B. gladioli bongkrekic acid production in vitro, and hence could be useful to prevent rapid clinical decline due to toxin poisoning in CF [34•]. B. gladioli should be included as a test organism for CF therapeutics that target Burkholderia CF infection, with BCC0238 or other genomically characterised strains selected as models [34•].

Mycobacterium abscessus Complex

A range of nontuberculous mycobacteria (NTM) may cause lung infection in people with CF, including isolates from the M. abscessus complex and M. avium complex [2]. Currently, the most problematic and prevalent group within a number of global CF populations is M. abscessus. In 2013, Bryant and colleagues [35] published the first extensive genomic evidence that shared strains exist and that transmission of isolates may occur between individuals with CF. Over 4 years, they assembled a collection of 168 isolates from 31 individuals attending a single UK CF treatment centre, and used genome sequencing and phylogenomic analysis to characterise this collection [35].

In the last decade, it has become clear from genomic studies characterizing over 1000 clinical isolates that M. abscessus CF infections are acquired frequently through transmission leading to the emergence of globally dominant clones [36]. Furthermore, high-resolution genomic analyses of 1173 isolates from 526 CF patients were used to demonstrate the presence of 3 dominant circulating clones of M. abscessus, and model their pathogenic evolution within selected isolates such as strain BIR1049 [37•]. These large collections of well-characterised M. abscessus strains [35, 36, 37•] provide highly valuable resources including information on antibiotic resistance and molecular pathogenesis in CF. A sub-selection of limited numbers of model strains such as BIR1049, and those which account for each of the dominant circulating clones, will be required to enable straightforward CF therapeutic testing against M. abscessus.

Multiresistant Gram-Negative CF Pathogens

In addition to P. aeruginosa and Burkholderia species, multiple intrinsically antibiotic-resistant Gram-negative species have emerged as CF pathogens [2], which can also dominate the lung microbiota in CF adults [38]. These include Ralstonia species, Pandoraea species, Stenotrophomonas maltophilia, Achromobacter xylosoxidans, and Inquilinus limosus [2]. There are currently no highly characterised strains, but a small panel of representative species was assembled as part of a European Union funded project, EuroCareCF. The panel of CF isolates and other reference strains (Table 2; n = 25) was used for a quality assurance trial of bacteriology laboratories in different countries [39]. This EuroCareCF panel is also available from the BCCM/LMG (http://bccm.belspo.be/about-us/bccm-lmg), and although biased towards Burkholderia species (14 of 25 strains), it forms a systematic collection of other multiresistant species useful for therapeutic testing (Ralstonia, n = 3; Pandoraea, n = 4; and one isolate each of S. maltophilia, A. xylosoxidans, Cupriavidus (Ralstonia) respiraculi, and Inquilinus limosus; Table 2).

Table 2 The EuroCareCF panel of Gram-negative CF pathogens

Since the assembly of the EuroCareCF strain panel [39], it has become clear that multiple species of Achromobacter may infect people with CF [40], including A. xylosoxidans, A. ruhlandii, A. insuavis, A. aegrifaciens, A. dolens, A. insolitus, and other novel genomic species [41, 42]. Two recent genomic studies of 101 [41] and 54 [42] CF isolates of Achromobacter provide excellent collections from which to draw diverse isolates for testing. S. maltophilia is another multiresistant Gram-negative pathogen of increasing concern in CF lung infection, which in the USA had an overall prevalence of 13% in 2012 [3•]. Few genomic studies of S. maltophilia CF isolates exist, but Esposito et al. [43] characterised a collection of 91 isolates from 10 individuals with CF, looking at their phenotypic and genomic evolution over a 12-year period. This study showed that the S. maltophilia strains infecting the 10 individuals were genetically diverse, with many of them representing novel MLST sequence types [43].

Staphylococcus aureus

As a Gram-positive bacterium, S. aureus is as synonymous with CF lung infection as P. aeruginosa. S. aureus is found in CF patients of all ages but is at high prevalence in the paediatric CF population (often greater than 60%) [3•]. The increased isolation of methicillin-resistant S. aureus (MRSA) in CF is also worrying [3•], as it is a WHO ‘Priority 2 (high)’ AMR pathogen [6]. The prevalence of chronic MRSA lung infection is increasing in several CF populations [3•, 4, 44]. Outside of CF lung infection, S. aureus strain collections and knowledge of genomic diversity are considerably advanced, primarily because it is a major AMR human pathogen [6]. An extensive database of over 36,000 isolates and 26,000 genomes is maintained at the S. aureus MLST database [45••]. The recently established Staphopia analysis pipeline, database, and web application programming interface also provide a unique suite of tools to assemble and analyse S. aureus genomes [46]. The developers of Staphopia used the platform to analyse over 43,000 publicly available S. aureus genomes, exploring the genetic diversity of the species and selecting high-quality genomes with robust metadata as a reference subset [46].

In the context of CF, Bernardy et al. [47] recently established a collection of 64 lung infection isolates, mapped a range of their phenotypes, and used genome sequencing and comparison via Staphopia to place them within the known S. aureus population biology. The CF strains were mapped to 8 of the 66 clonal complexes defined in the Staphopia study, with the most common being CC5 and CC8 MRSA strains [47]. The study also compared the CF strains to the well-known non-CF reference MRSA strain, S. aureus JE2, which is representative of a globally distributed strain known as USA300 [47]. S. aureus phenotypes such as toxin production and nonmucoid variants were shown to be retained during chronic lung infection in the MRSA and antibiotic-susceptible CF strains that were characterised [47]. This study [47], together with greater use of representative clonal complex strains within the MLST [45••] (Table 1) or Staphopia databases [46], can provide an excellent basis to select S. aureus strain panels for therapeutic testing.

Haemophilus influenzae

Nontypeable H. influenzae isolates are common colonisers of children with CF, with an overall prevalence of 15% in the USA, that peaks at 32% among children aged 2 to 5 years [3•]. Although frequently treated as a pathogen in CF, H. influenzae represents a poorly studied bacterial infection in this context. In contrast, as an invasive pathogen of children that is capable of causing meningitis, extensive isolate (> 6000) and genomic collections (> 2000) exist for H. influenzae at the MLST database (Table 1) [45••]. Searching the MLST database shows that a small proportion of isolates may represent CF infection but proving this is limited by the fact that the source for H. influenzae isolates is generically recorded as sputum (Table 1). Without robust sample metadata, it is unknown whether the sputum is from an individual with CF or another form of respiratory disease. In addition, CF may actually be under-represented given most infection occurs in children, who generally cannot expectorate sputum.

Ebbing et al. [48] characterised the antibiotic resistance of both H. influenzae and H. parainfluenzae in isolates collected over 15 years within the Australian CF community. H. parainfluenzae is considered part of the normal microbiota within the human oral and laryngeal cavity but can occasionally cause lung infections and endocarditis. With 518 H. influenzae and 1020 H. parainfluenzae isolates analysed, this represents one of the largest studies in the context of CF. The isolates were representative of the infection seen in 349 CF individuals, and it was demonstrated that overall antibiotic resistance increased by 46% for H. influenzae and 61% for H. parainfluenzae over the 15-year study. The striking frequency at which Haemophilus species are isolated during childhood respiratory infection [3•], their increasing antimicrobial resistance in CF [48], and the status of H. influenzae as a WHO ‘Priority 3 (Medium)’ pathogen warrant much more systematic study of this species in CF. Given the assumption that microbial species that infect the CF lung early, and any antibiotics administered against them, will alter the lung environment, it is likely that early H. influenzae infections may play a significant role in influencing the future course of lung disease.

Systematic Criteria for the Selection of Bacterial CF Pathogen Strains

It is clear from the current state of knowledge that the ability to systematically select bacterial strains for testing novel therapeutics varies for different CF pathogens. However, multiple core principles come through in considering how strains for testing and reproducible CF research should be selected in the future. In particular, the development of the P. aeruginosa international strain panel [7••] and its characterisation over the last decade [9••, 12•] prompts the following selection criteria and key questions for consideration when selecting strains in future CF research (Fig. 2):

  I. A systematic strain selection process. Initially, there was a systematic selection of P. aeruginosa strains based on relevant strain diversity, underlying disease and other criteria [7••]. These criteria and the strains selected were discussed widely among multiple groups of interdisciplinary researchers [7••].

  II. Detailed phenotypic characterisation. Phenotypic analysis of P. aeruginosa strains was subsequently performed, reproduced, and validated to understand their behaviour across a range of standard laboratory models including antibiotic susceptibility analysis [9••]. A clear understanding of multidrug resistance within a testing strain panel is also highly relevant in the context of new therapeutics being able to overcome existing AMR [6].

  III. Understanding of the species and strain population biology. The panel strains were initially placed within the overall population biology of P. aeruginosa as a bacterial species using AT-genotyping [7••]. This accounted for their genetic diversity and representation of P. aeruginosa as a pathogen in CF, other diseases, and as a free-living environmental bacterial species. The ultimate resolution of genetic analysis, whole genome sequencing, and the comparison of isolates at a phylogenomic scale revealed a further split into two major evolutionary groups [13]. Although we do not fully understand the clinical or biological significance of P. aeruginosa genomic Group 1 or Group 2 strains, both should be included within future research to identify whether systematic differences exist.

  IV. Understanding within-strain evolution and adaptation to CF lung infection. The within-strain variation of P. aeruginosa as a CF pathogen has also been extensively characterised [10, 49, 50]. For example, the international P. aeruginosa reference panel includes mucoid and nonmucoid isolates of the same strain (IST 27) to account for this important CF phenotype [7••]. The transition to a hypermutator for P. aeruginosa [50], or a small colony variant in S. aureus [51], are also important sub-strain phenotypes in CF. Consideration should be given to which types of phenotypic strain derivatives should be included in any therapeutic testing, particularly if new compounds are directed, for example, at virulence factors.

  V. Accessibility within a public collection. A final key consideration is that any strains that are used widely in therapeutic testing must be made available from curated and accessible microbial collections. The establishment of the international P. aeruginosa reference panel and its central deposition in a recognised microbial resource repository [7••] enabled testing by multiple laboratories to ensure data validation and reproduction [9••]. Given the within-strain phenotypic and genetic variance of multiple CF pathogens, using a single curated source rather than passing a strain from research laboratory to research laboratory is vital to obtain robust data.

Fig. 2

Core principles and relevant questions to aid the design of systematic CF pathogen strain panels. Consideration of the design and testing of strains within the international P. aeruginosa panel, and the current state of knowledge of multiple other bacterial CF pathogens was made. The five core principles (boxes) and key questions within them (ovals) that derive from considering what is needed in future studies are illustrated. Figure graphics created with BioRender.com
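As an illustration of how criteria II to V might be applied in practice, the sketch below screens a draft panel for gaps. It is a minimal example under stated assumptions: the `PanelStrain` record, its fields, and the `panel_gaps` function are hypothetical names introduced here, and any strain records passed in would be placeholders supplied by the user rather than published data. Criterion I (a systematic, widely discussed selection process) is a matter of study design and is deliberately not encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PanelStrain:
    """Hypothetical record describing one candidate panel strain."""
    name: str
    genomic_group: str            # e.g. "1" or "2" for P. aeruginosa
    phenotyped: bool              # criterion II: phenotypic data available
    has_variant_pair: bool        # criterion IV: e.g. mucoid/nonmucoid pair
    repository_id: Optional[str]  # criterion V: public collection accession


def panel_gaps(panel: List[PanelStrain], required_groups: List[str]) -> List[str]:
    """Return human-readable gaps against criteria II to V."""
    gaps = []
    unphenotyped = [s.name for s in panel if not s.phenotyped]
    if unphenotyped:
        gaps.append("II: no phenotypic data for " + ", ".join(unphenotyped))
    covered = {s.genomic_group for s in panel}
    missing = sorted(set(required_groups) - covered)
    if missing:
        gaps.append("III: no strain from genomic group(s) " + ", ".join(missing))
    if not any(s.has_variant_pair for s in panel):
        gaps.append("IV: no within-strain variant pair included")
    undeposited = [s.name for s in panel if not s.repository_id]
    if undeposited:
        gaps.append("V: not in a public collection: " + ", ".join(undeposited))
    return gaps
```

An empty list indicates that the draft panel has no gaps against the encoded checks. It does not establish that the panel is representative, which still depends on the selection process described under criterion I.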

Strain Collections, Useful Databases, and Prospective Clinical Isolates

Multiple internationally recognised strain repositories exist from which reference strains may be ordered, enabling greater reproducibility across global studies. The BCCM/LMG collection (http://bccm.belspo.be/about-us/bccm-lmg) already forms an excellent repository for multiple CF pathogen species as outlined (Table 2). The Bacterial Diversity Metadatabase (BacDive; > 81,000 strains) is also a highly useful online resource for researchers and companies to explore culture collections [52•]. BacDive enables rapid searches of taxonomy, physiology, isolation source and location, phenotypic data, and genomic resources for multiple bacterial strains [52•]. The database is very useful for providing unifying information on bacterial strains which may have been published under multiple pseudonyms (e.g., ATCC, LMG, or other identifiers; see Table 2).

Multiple researchers and companies developing therapeutics also use currently circulating clinical isolates to test novel anti-infectives. This approach is relevant if the basic strain selection can account for the first four criteria noted above. Working directly with CF treatment centres may also be beneficial because of the relative sparsity of clinical data accompanying strains requested from curated culture collections. CF lung infection also represents a highly complex disease, where several factors impact the success of therapeutic approaches and there are multiple knowledge gaps (Table 3). Clinical CF centres can provide rich clinical data to accompany bacterial isolates, such as the rates of decline in lung function, frequency of exacerbations, co-infecting pathogens, and other drug treatments.

Table 3 Knowledge gaps in relation to the selection and testing of relevant CF pathogen strains

The last few decades have witnessed improved health outcomes and decreasing rates of chronic infection with several priority CF pathogens [4]. The revolutionary CFTR modulator drugs can restore a substantial amount of CFTR function, and are now available for the vast majority of the CF population in many countries [53]. Registry data have demonstrated a modest impact on infection prevalence for certain bacterial pathogens, even in populations commencing treatment in adulthood. It is anticipated that the impacts on infection acquisition and evolution to chronicity will be even more marked in children with CF who commence CFTR modulators before airway disease is established. With this changing landscape in bacterial CF lung infections, there is a strong case to continue prospective collection and characterisation of CF infection isolates to join up current and historical information, and to account for the clear differences in the clonal population biology of problematic pathogens such as P. aeruginosa, Bcc, and M. abscessus.

How can researchers help provide context to clinical isolates or other testing strains with limited genotypic characterisation?

Given that genome sequencing is now cost-effective, carrying this out for uncharacterised testing isolates is highly recommended. A draft bacterial genome can be uploaded to the SpeciesID tool (https://pubmlst.org/species-id) available at the PubMLST database, which hosts strain genotyping resources for multiple CF and other pathogens (Table 1) [45••]. The tool will carry out ribosomal multilocus sequence typing (rMLST) [45••] directly on the draft genome and provide both taxonomic and strain level matches to the query isolate sequence against the databases. The provenance data on closely related strains within PubMLST (Table 1) can then be used to provide additional context to the uncharacterised testing isolate. If the genome sequence of the isolate is subsequently deposited in the DNA archives, this will help researchers with future comparative analysis and set a baseline for which AMR factors may be present in the testing isolate.
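For laboratories processing many isolates, the same species identification step could be scripted rather than performed through the web form. The following is a minimal sketch only, and not a method described in the cited work: the REST endpoint, the JSON request fields (base64, details, sequence), and the taxon_prediction response key are assumptions that should be checked against the current PubMLST API documentation before use, and the function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint of the PubMLST rMLST species identification service;
# confirm against the current PubMLST API documentation before relying on it.
SPECIES_ID_URL = (
    "https://rest.pubmlst.org/db/pubmlst_rmlst_seqdef_kiosk/schemes/1/sequence"
)


def build_payload(fasta_text: str) -> bytes:
    """Wrap a FASTA-formatted draft genome in a base64-encoded JSON body."""
    encoded = base64.b64encode(fasta_text.encode("utf-8")).decode("ascii")
    # Field names are assumptions about the request format.
    body = {"base64": True, "details": True, "sequence": encoded}
    return json.dumps(body).encode("utf-8")


def identify_species(fasta_path: str, timeout: float = 300.0) -> list:
    """Submit a draft genome and return the taxon predictions (possibly empty)."""
    with open(fasta_path, "r", encoding="utf-8") as handle:
        payload = build_payload(handle.read())
    request = urllib.request.Request(
        SPECIES_ID_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # "taxon_prediction" is the assumed key holding the taxonomic matches.
    return result.get("taxon_prediction", [])


# Example use (hypothetical file name; requires network access):
# for match in identify_species("isolate_draft.fasta"):
#     print(match.get("rank"), match.get("taxon"), match.get("support"))
```

Separating the payload construction from the network call allows the request format to be checked offline before any genome data are submitted.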

Conclusions

The criteria and core principles on strain selection (Fig. 2) can be applied to multiple priority CF pathogens beyond P. aeruginosa. Phenotypic and genomic knowledge of Bcc species, B. gladioli, M. abscessus, S. aureus, and Achromobacter species in CF is reaching a point where there is now sufficient understanding to enable systematic strain selection (Table 1). In light of this, an additional consideration becomes:

What is the optimum number of strains of a given CF pathogen species that is required for any testing panel to be relevant?

Given that multiple strain lineages and core genomic groups are known to exist within P. aeruginosa [13] and B. cenocepacia [27•], initial testing of at least 5 to 10 strains should enable basic coverage of such population biology diversity. Going beyond this will depend on multiple variables including the logistics of the testing required, the cost, how many well-characterised strains are easily accessible, and whether panels are representative of current and emerging AMR. The strain coverage within the P. aeruginosa international reference panel and its use [9••] suggests that 40 to 50 strains is a manageable and relevant number for investigations on multiple traits, at least for broader screening.
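One way to turn this guidance into a reproducible selection step is sketched below. It is an illustrative assumption rather than a procedure from the reference panel studies: the isolate identifiers, strain types, and lineage labels are hypothetical, and the round-robin rule is simply one possible way to spread a small panel across lineages. Sequential isolates sharing a strain type are collapsed to a single representative first, in keeping with the strain definition used in this review.

```python
from collections import defaultdict


def select_panel(isolates, panel_size=10):
    """Pick up to panel_size isolates, spreading picks across lineages.

    isolates: iterable of (isolate_id, strain_type, lineage) tuples.
    """
    seen_strain_types = set()
    by_lineage = defaultdict(list)
    for isolate_id, strain_type, lineage in isolates:
        # Keep only the first isolate of each strain type so that sequential
        # isolates of the same strain do not bias the panel.
        if strain_type in seen_strain_types:
            continue
        seen_strain_types.add(strain_type)
        by_lineage[lineage].append(isolate_id)

    # Round-robin over lineages (in sorted label order) until the panel is
    # full or no candidates remain.
    queues = [ids for _, ids in sorted(by_lineage.items())]
    panel = []
    while len(panel) < panel_size and any(queues):
        for queue in queues:
            if queue and len(panel) < panel_size:
                panel.append(queue.pop(0))
    return panel


# Hypothetical example data: two isolates share strain type "ST-A".
example_isolates = [
    ("iso1", "ST-A", "lineage1"),
    ("iso2", "ST-A", "lineage1"),
    ("iso3", "ST-B", "lineage1"),
    ("iso4", "ST-C", "lineage2"),
    ("iso5", "ST-D", "lineage3"),
]
print(select_panel(example_isolates, panel_size=3))
```

In practice, the ordering within each lineage could be weighted by AMR profile, accessibility from a culture collection, or available clinical data, which this sketch does not attempt.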

This review of strain choice for bacterial pathogens also highlights multiple knowledge gaps for CF lung infection which researchers should seek to fill (Table 3): fungal CF pathogens, H. influenzae, emerging AMR Gram-negative bacterial CF pathogens, anaerobic CF bacteria, the wider lung microbiome, and CF relevant models for therapeutic testing. Outlining relevant bacterial strain resources for the major well-characterised CF pathogens (Tables 1 and 2) and core principles for selecting relevant strains (Fig. 2) provides a basis from which to consider the challenge of filling these knowledge gaps (Table 3) in future.