On the Statistics of Identifying Candidate Pathogen Effectors

  • Protocol
Plant-Pathogen Interactions

Part of the book series: Methods in Molecular Biology (MIMB, volume 1127)

Abstract

High-throughput sequencing is an increasingly accessible tool for cataloging gene complements of plant pathogens and their hosts. It has had a great impact in plant pathology, enabling rapid acquisition of data for a wide range of pathogens and hosts and leading to the selection of novel candidate effector proteins and/or associated host targets (Bart et al., Proc Natl Acad Sci U S A doi:10.1073/pnas.1208003109, 2012; Agbor and McCormick, Cell Microbiol 13:1858–1869, 2011; Fabro et al., PLoS Pathog 7:e1002348, 2011; Kim et al., Mol Plant Pathol 12:715–730, 2011; Kimbrel et al., Mol Plant Pathol 12:580–594, 2011; O’Brien et al., Curr Opin Microbiol 14:24–30, 2011; Vleeshouwers et al., Annu Rev Phytopathol 49:507–531, 2011; Sarris et al., Mol Plant Pathol 11:795–804, 2010; Boch and Bonas, Annu Rev Phytopathol 48:419–436, 2010; McDermott et al., Infect Immun 79:23–32, 2011).

Identification of candidate effectors from genome data is no different from classification in any other high-content or high-throughput experiment. The primary aim is to discover a set of qualitative or quantitative sequence characteristics that discriminate, with a defined level of certainty, between proteins that have previously been identified as either “effector” (positive) or “not effector” (negative). Combining these characteristics in a mathematical model, or classifier, enables prediction of whether a protein is or is not an effector, again with a defined level of certainty. High-throughput screening of the gene complement is then performed to identify candidate effectors. This may seem straightforward, but it is unfortunately very easy to identify seemingly persuasive candidate effectors that are, in fact, entirely spurious.
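
As a rough illustration of how persuasive but spurious candidates can arise, the sketch below computes the expected fraction of true effectors among the proteins a classifier flags as positive. The function name and all numbers (a gene complement of 5,000 proteins, 1% true effectors, 90% sensitivity, 95% specificity) are assumptions chosen for illustration, not values taken from this chapter.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of predicted positives that are true positives (Bayes' rule).

    sensitivity: P(predicted effector | true effector)
    specificity: P(predicted non-effector | true non-effector)
    prevalence:  fraction of the gene complement that are true effectors
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screen: all values below are illustrative assumptions.
n_proteins = 5000
prevalence = 0.01      # assume 1% of proteins are true effectors
sensitivity = 0.90
specificity = 0.95

expected_true_pos = sensitivity * prevalence * n_proteins
expected_false_pos = (1.0 - specificity) * (1.0 - prevalence) * n_proteins
ppv = positive_predictive_value(sensitivity, specificity, prevalence)

print(f"expected true positives:  {expected_true_pos:.1f}")   # about 45
print(f"expected false positives: {expected_false_pos:.1f}")  # about 247.5
print(f"positive predictive value: {ppv:.3f}")                # about 0.154
```

Under these assumed values, roughly five of every six candidates would be false positives even though the classifier looks accurate on paper, which is why the certainty attached to a prediction depends on how rare effectors are in the screened set.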

The main sources of danger in this area of statistical modeling are not entirely independent of each other, and include: inappropriate choice of classifier model; poor selection of reference sequences (known positive and negative examples); poor definition of classes (what is, and what is not, an effector); inadequate training sample size; poor model validation; and lack of adequate model performance metrics (Xia et al., Metabolomics doi:10.1007/s11306-012-0482-9, 2012). Many studies fail to take these issues into account, and thereby fail to discover anything of true significance or, worse, report spurious findings that are impossible to validate. Here we summarize the impact of these issues and present strategies to assist in improving design and evaluation of effector classifiers, enabling robust scientific conclusions to be drawn from the available data.
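
One of the pitfalls listed above, inadequate performance metrics, can be illustrated with a minimal sketch. When effectors are rare, raw accuracy rewards a classifier that never predicts an effector, whereas the area under the ROC curve (AUC) measures how well classifier scores rank positives above negatives. The function names and toy data below are hypothetical, and the pairwise formula is one standard way to compute AUC, not necessarily the approach used in the chapter.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy imbalanced reference set (assumed): 1 effector among 20 proteins.
labels = [1] + [0] * 19

# A "classifier" that calls every protein a non-effector.
always_negative = [0] * 20
constant_scores = [0.0] * 20

print(accuracy(always_negative, labels))   # 0.95, despite finding nothing
print(roc_auc(constant_scores, labels))    # 0.5, i.e. no discrimination

# A scorer that ranks most, but not all, positives above negatives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 positive/negative pairs ordered correctly
print(round(toy_auc, 3))
```

In this toy case the uninformative classifier scores 95% accuracy but an AUC of 0.5, so reporting accuracy alone would overstate its value.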

References

  1. Bart R, Cohn M, Kassen A, McCallum EJ, Shybut M et al (2012) High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance. Proc Natl Acad Sci U S A. doi:10.1073/pnas.1208003109

  2. Agbor TA, McCormick BA (2011) Salmonella effectors: important players modulating host cell function during infection. Cell Microbiol 13:1858–1869. doi:10.1111/j.1462-5822.2011.01701.x

  3. Fabro G, Steinbrenner J, Coates M, Ishaque N, Baxter L et al (2011) Multiple candidate effectors from the oomycete pathogen Hyaloperonospora arabidopsidis suppress host plant immunity. PLoS Pathog 7:e1002348. doi:10.1371/journal.ppat.1002348

  4. Kim J-G, Taylor KW, Mudgett MB (2011) Comparative analysis of the XopD type III secretion (T3S) effector family in plant pathogenic bacteria. Mol Plant Pathol 12:715–730. doi:10.1111/j.1364-3703.2011.00706.x

  5. Kimbrel JA, Givan SA, Temple TN, Johnson KB, Chang JH (2011) Genome sequencing and comparative analysis of the carrot bacterial blight pathogen, Xanthomonas hortorum pv. carotae M081, for insights into pathogenicity and applications in molecular diagnostics. Mol Plant Pathol 12:580–594. doi:10.1111/j.1364-3703.2010.00694.x

  6. O'Brien HE, Desveaux D, Guttman DS (2011) Next-generation genomics of Pseudomonas syringae. Curr Opin Microbiol 14:24–30. doi:10.1016/j.mib.2010.12.007

  7. Vleeshouwers VGAA, Raffaele S, Vossen JH, Champouret N, Oliva R et al (2011) Understanding and exploiting late blight resistance in the age of effectors. Annu Rev Phytopathol 49:507–531. doi:10.1146/annurev-phyto-072910-095326

  8. Sarris PF, Skandalis N, Kokkinidis M, Panopoulos NJ (2010) In silico analysis reveals multiple putative type VI secretion systems and effector proteins in Pseudomonas syringae pathovars. Mol Plant Pathol 11:795–804. doi:10.1111/j.1364-3703.2010.00644.x

  9. Boch J, Bonas U (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48:419–436. doi:10.1146/annurev-phyto-080508-081936

  10. McDermott JE, Corrigan A, Peterson E, Oehmen C, Niemann G et al (2011) Computational prediction of type III and IV secreted effectors in gram-negative bacteria. Infect Immun 79:23–32. doi:10.1128/IAI.00537-10

  11. Xia J, Broadhurst DI, Wilson M, Wishart DS (2012) Translational biomarker discovery in clinical metabolomics: an introductory tutorial. Metabolomics. doi:10.1007/s11306-012-0482-9

  12. Cornelis GR (2006) The type III secretion injectisome. Nat Rev Microbiol 4:811–825. doi:10.1038/nrmicro1526

  13. Whisson SC, Boevink PC, Moleleki L, Avrova AO, Morales JG et al (2007) A translocation signal for delivery of oomycete effector proteins into host plant cells. Nature 450:115–118. doi:10.1038/nature06203

  14. Löwer M, Schneider G (2009) Prediction of type III secretion signals in genomes of gram-negative bacteria. PLoS ONE 4:e5917. doi:10.1371/journal.pone.0005917

  15. Arnold R, Brandmaier S, Kleine F, Tischler P, Heinz E et al (2009) Sequence-based prediction of type III secreted proteins. PLoS Pathog 5:e1000376. doi:10.1371/journal.ppat.1000376

  16. Sui T, Yang Y, Wang X (2013) Sequence-based feature extraction for type III effector prediction. Int J Biosci Biochem Bioinforma 3:246–251. doi:10.7763/IJBBB.2013.V3.206

  17. Liu C, Che D, Liu X, Song Y (2013) Applications of machine learning in genomics and systems biology. Comput Math Methods Med 2013:587492. doi:10.1155/2013/587492

  18. Broadhurst D, Kell DB (2006) Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2:171–196

  19. O'Brien HE, Thakur S, Gong Y, Fung P, Zhang J et al (2012) Extensive remodeling of the Pseudomonas syringae pv. avellanae type III secretome associated with two independent host shifts onto hazelnut. BMC Microbiol 12:141

  20. McNally RR, Toth IK, Cock PJA, Pritchard L, Hedley PE et al (2012) Genetic characterization of the HrpL regulon of the fire blight pathogen Erwinia amylovora reveals novel virulence factors. Mol Plant Pathol 13:160–173. doi:10.1111/j.1364-3703.2011.00738.x

  21. Arnold DL, Jackson RW (2011) Bacterial genomes: evolution of pathogenicity. Curr Opin Plant Biol 14:385–391. doi:10.1016/j.pbi.2011.03.001

  22. Haas BJ, Kamoun S, Zody MC, Jiang RHY, Handsaker RE et al (2009) Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature 461:393–398. doi:10.1038/nature08358

  23. Win J, Morgan W, Bos JIB, Krasileva KV, Cano LM et al (2007) Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. Plant Cell 19:2349–2369. doi:10.1105/tpc.107.051037

  24. Bhattacharjee S, Hiller NL, Liolios K, Win J, Kanneganti T-D et al (2006) The malarial host-targeting signal is conserved in the Irish potato famine pathogen. PLoS Pathog 2:e50. doi:10.1371/journal.ppat.0020050

  25. Petnicki-Ocwieja T, Schneider DJ, Tam VC, Chancey ST, Shan L et al (2002) Genomewide identification of proteins secreted by the Hrp type III protein secretion system of Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci U S A 99:7652–7657. doi:10.1073/pnas.112183899

  26. Greenberg JT, Vinatzer B (2003) Identifying type III effectors of plant pathogens and analyzing their interaction with plant cells. Curr Opin Microbiol 6(1):20–28

  27. Bogdanove AJ, Schornack S, Lahaye T (2010) TAL effectors: finding plant genes for disease and defense. Curr Opin Plant Biol 13:394–401. doi:10.1016/j.pbi.2010.04.010

  28. Boch J, Scholze H, Schornack S, Landgraf A, Hahn S et al (2009) Breaking the code of DNA-binding specificity of TAL-type III effectors. Science. doi:10.1126/science.1178811

  29. Yang Y (2012) Identification of novel type III effectors using latent Dirichlet allocation. Comput Math Methods Med 2012:696190. doi:10.1155/2012/696190

  30. Wang Y, Zhang Q, Sun M-A, Guo D (2011) High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles. Bioinformatics 27:777–784. doi:10.1093/bioinformatics/btr021

  31. Macho AP, Ruiz-Albert J, Tornero P, Beuzón CR (2009) Identification of new type III effectors and analysis of the plant response by competitive index. Mol Plant Pathol 10:69–80. doi:10.1111/j.1364-3703.2008.00511.x

  32. Xu S, Zhang C, Miao Y, Gao J, Xu D (2010) Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif. BMC Genomics 11(Suppl 3):S1. doi:10.1186/1471-2164-11-S3-S1

  33. Jehl M-A, Arnold R, Rattei T (2010) Effective – a database of predicted secreted bacterial proteins. Nucleic Acids Res. doi:10.1093/nar/gkq1154

  34. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  35. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517. doi:10.1093/bioinformatics/btm344

  36. Eriksson L, Johansson E, Kettaneh-Wold N, Wold S (2001) Multi- and megavariate data analysis: principles and applications. Umetrics AB, Umea

  37. Brereton RG (2003) Chemometrics: data analysis for the laboratory and chemical plant. Wiley, Chichester UK

  38. Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92:548–560. doi:10.1080/01621459.1997.10474007

  39. Obuchowski NA, Lieber ML, Wians FH (2004) ROC curves in clinical chemistry: uses, misuses, and possible solutions. Clin Chem 50:1118–1125. doi:10.1373/clinchem.2004.031823

  40. Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39(4):561–577

  41. Lasko TA, Bhagwat JG, Zou KH (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform 38(5):404–415

Corresponding author

Correspondence to Leighton Pritchard.

Copyright information

© 2014 Springer Science+Business Media, New York

About this protocol

Cite this protocol

Pritchard, L., Broadhurst, D. (2014). On the Statistics of Identifying Candidate Pathogen Effectors. In: Birch, P., Jones, J., Bos, J. (eds) Plant-Pathogen Interactions. Methods in Molecular Biology, vol 1127. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-986-4_4

  • DOI: https://doi.org/10.1007/978-1-62703-986-4_4

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-985-7

  • Online ISBN: 978-1-62703-986-4

  • eBook Packages: Springer Protocols
