Abstract
This study developed a mathematical method for detecting tandem repeats in a DNA sequence. A multiple alignment of periods is calculated by direct optimization of the position-weight matrix (PWM), without using pairwise alignments or searching for similarity between periods. Random PWMs were used to develop a new algorithm for periodicity search. The algorithm was applied to the DNA sequences of the C. elegans genome, where it found 25,360 regions having a periodicity with a period length of 2 to 50 bases. On average, a periodicity of ~4000 nucleotides was found to be associated with each region. A significant portion of the revealed regions have periods of 10 and 11 nucleotides, periods that are multiples of 10 nucleotides, and periods in the vicinity of 35 nucleotides. Only ~30% of the periods found had been discovered earlier. The origin of periodicity with insertions and deletions is discussed.
This work was supported by the Russian Science Foundation.
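The idea of optimizing a PWM directly against a sequence, as described in the abstract, can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it ignores insertions and deletions, uses plain hill climbing as a stand-in for whatever optimization procedure the paper employs, and constrains the matrix (zero-mean columns, unit Euclidean norm) purely so that scores are comparable between sequences. All function names and parameters here are hypothetical.

```python
import math
import random

BASES = "ACGT"

def normalize(pwm):
    """Center each column to zero mean, then scale the matrix to unit Euclidean norm."""
    period = len(pwm[0])
    for j in range(period):
        mean = sum(pwm[b][j] for b in range(4)) / 4.0
        for b in range(4):
            pwm[b][j] -= mean
    norm = math.sqrt(sum(w * w for row in pwm for w in row))
    return [[w / norm for w in row] for row in pwm]

def random_pwm(period, rng):
    """Draw a random 4 x period weight matrix and normalize it."""
    return normalize([[rng.gauss(0.0, 1.0) for _ in range(period)] for _ in BASES])

def count_matrix(seq, period):
    """Count each base at each phase (i mod period); no indels are modeled."""
    counts = [[0] * period for _ in BASES]
    for i, base in enumerate(seq):
        counts[BASES.index(base)][i % period] += 1
    return counts

def score(counts, pwm):
    """Sum of PWM weights over the sequence laid cyclically across the columns."""
    return sum(c * w for crow, wrow in zip(counts, pwm) for c, w in zip(crow, wrow))

def optimize_pwm(seq, period, steps=3000, seed=0):
    """Hill-climb from a random PWM (assumed stand-in for the paper's optimizer)."""
    rng = random.Random(seed)
    counts = count_matrix(seq, period)
    best = random_pwm(period, rng)
    best_score = score(counts, best)
    for _ in range(steps):
        cand = [row[:] for row in best]
        # Perturb one randomly chosen weight, then restore the constraints.
        cand[rng.randrange(4)][rng.randrange(period)] += rng.gauss(0.0, 0.1)
        cand = normalize(cand)
        cand_score = score(counts, cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best, best_score

# Toy comparison: a perfect period-8 repeat versus the same letters shuffled.
periodic = "ACGTTGCA" * 25
letters = list(periodic)
random.Random(1).shuffle(letters)
shuffled = "".join(letters)

_, periodic_score = optimize_pwm(periodic, period=8)
_, shuffled_score = optimize_pwm(shuffled, period=8)
print(periodic_score, shuffled_score)
```

In this toy setting the optimized score of the periodic sequence should come out well above that of the shuffled control, which is the kind of contrast a periodicity search relies on. A realistic implementation would also need to align the sequence to the matrix with insertions and deletions and to assess statistical significance, neither of which is attempted here.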
Acknowledgements
This work was supported by the Competitiveness Growth Program of the Federal Autonomous Educational Institution of Higher Professional Education National Research Nuclear University MEPhI (Moscow Engineering Physics Institute).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Korotkov, E.V., Korotkova, M.A. (2017). Search of Regions with Periodicity Using Random Position Weight Matrices in the Genome of C. elegans. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_40
DOI: https://doi.org/10.1007/978-3-319-56154-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)