Data Mining to Detect Common, Unique, and Polymorphic Simple Sequence Repeats

  • Aditi Kapil
  • C. K. Jha
  • Asheesh Shanker


Computational mining of biological data is essential for discovering patterns in the large datasets generated by sequencing and related efforts, and the extracted information can provide new insights into the organism under study. Simple sequence repeats (SSRs) are tandem repeats of motifs 1–6 nucleotides long; they can be characterized in the wet laboratory or mined through computational approaches. These repeats are useful in genetic mapping, breeding experiments, and phylogenetic studies, and can be used to develop molecular markers. Because of this usefulness, various specialized biological databases of SSRs have been developed. This chapter presents a case study that used in silico mined nucleotide sequence data to detect putative polymorphic, common, and unique SSRs in chloroplast genomes of the genus Triticum. SSRs have previously been detected in several organisms; however, in silico detection of unique, common, and putative polymorphic SSRs is a recent development with several applications, including the identification of species.
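
As a rough illustration of what in silico SSR mining and the common/unique/polymorphic distinction might look like, the following Python sketch finds perfect repeats with a regular expression and then compares motifs across sequences. It is not the pipeline used in the chapter. The repeat thresholds in MIN_REPEATS, the function names find_ssrs and classify, and the motif-level definitions of "common", "unique", and "polymorphic" are all assumptions made for this example; a real analysis would typically need locus-level evidence, such as similarity of the regions flanking each repeat, before calling two repeats the same SSR.

```python
import re
from collections import defaultdict

# Hypothetical minimum repeat counts per motif length (1..6 nt). The abstract
# does not state thresholds, so these values are illustrative only.
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return perfect SSRs in seq as sorted (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT");
            # the pass for the shorter motif size reports those.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def classify(genomes):
    """Group motifs across genomes (dict: name -> sequence).

    Assumed toy definitions, by motif only:
      unique      - motif found in exactly one genome
      polymorphic - motif found in several genomes with differing
                    longest repeat counts
      common      - motif found in every genome with the same longest count
    Motifs shared by some but not all genomes at equal length are left out.
    """
    longest = defaultdict(dict)  # motif -> {genome name: longest repeat count}
    for name, seq in genomes.items():
        for _, motif, n in find_ssrs(seq):
            longest[motif][name] = max(n, longest[motif].get(name, 0))

    result = {"common": [], "unique": [], "polymorphic": []}
    for motif, per_genome in sorted(longest.items()):
        if len(per_genome) == 1:
            result["unique"].append(motif)
        elif len(set(per_genome.values())) > 1:
            result["polymorphic"].append(motif)
        elif len(per_genome) == len(genomes):
            result["common"].append(motif)
    return result


if __name__ == "__main__":
    # Made-up toy sequences, not real chloroplast data.
    base = "GGC" + "AT" * 7 + "CCG" + "A" * 11 + "TTG"
    toy = {
        "sp1": base,
        "sp2": "GGC" + "AT" * 9 + "CCG" + "A" * 11 + "TTG",
        "sp3": base + "GAC" * 5 + "T",
    }
    print(find_ssrs(toy["sp1"]))
    print(classify(toy))
```

The sketch treats each motif string literally, so cyclic shifts (AT versus TA) and reverse complements are counted as different motifs, and imperfect repeats are not detected. Dedicated SSR mining tools handle these cases.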


Keywords: Simple sequence repeats; Chloroplast; Triticum; Primer



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Aditi Kapil (1)
  • C. K. Jha (2)
  • Asheesh Shanker (1, 3)

  1. Department of Bioscience and Biotechnology, Banasthali Vidyapith, Rajasthan, India
  2. Department of Computer Science, Banasthali Vidyapith, Rajasthan, India
  3. Department of Bioinformatics, Central University of South Bihar, Gaya, India
