
Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.)



The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP-markers.
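The comparison of PIC and π values above can be made concrete with a short calculation. The sketch below assumes the commonly used definitions, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j² and an unbiased per-site diversity n/(n−1)·(1 − Σp_i²) averaged over the sequence length; the exact formulas used in the paper are not shown on this page, and the function names, allele calls, and the 400-bp length are hypothetical illustrations rather than data from the study.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return allele -> frequency for a list of observed alleles (one per variety)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def pic(alleles):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed definition)."""
    freqs = list(allele_frequencies(alleles).values())
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


def site_diversity(alleles):
    """Unbiased diversity at one site: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = allele_frequencies(alleles).values()
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(sites, sequence_length):
    """pi per nucleotide: summed per-site diversity divided by sites compared."""
    return sum(site_diversity(site) for site in sites) / sequence_length


# Hypothetical example: one SNP scored in 8 varieties within a 400-bp amplicon.
snp = ["A", "A", "A", "A", "G", "G", "G", "G"]
print(round(pic(snp), 3))                          # 0.375
print(round(nucleotide_diversity([snp], 400), 5))  # 0.00143
```

With two alleles at equal frequency, PIC reaches its biallelic maximum of 0.375, so SNPs pre-selected for balanced allele frequencies across varieties would be expected to score higher on both statistics than SNPs found by random sequencing.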






We are grateful to Patrick Hayes for providing the DH lines of the barley mapping populations Steptoe × Morex and Oregon Wolfe Dom × Oregon Wolfe Rec. The technical assistance of Ulrike Beier is gratefully acknowledged. This work was funded by the German Federal Ministry of Education and Research in conjunction with the GABI program (BMBF Grants 0312270/4, 0312271A and 0312278C).

Author information

Correspondence to A. Graner.

Additional information

The first two authors contributed equally to this work.

Communicated by R. Hagemann


About this article

Cite this article

Kota, R., Rudd, S., Facius, A. et al. Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.). Mol Gen Genomics 270, 24–33 (2003). https://doi.org/10.1007/s00438-003-0891-6



  • Single-nucleotide polymorphisms (SNPs)
  • Expressed sequence tags (ESTs)
  • Denaturing high-performance liquid chromatography (DHPLC)
  • Data mining
  • Bioinformatics