Statistical Identification of Uniformly Mutated Segments within Repeats

  • S. Cenk Ṣahinalp
  • Evan Eichler
  • Paul Goldberg
  • Petra Berenbrink
  • Tom Friedetzky
  • Funda Ergun
Conference paper

DOI: 10.1007/3-540-45452-7_21

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2373)
Cite this paper as:
Ṣahinalp S.C., Eichler E., Goldberg P., Berenbrink P., Friedetzky T., Ergun F. (2002) Statistical Identification of Uniformly Mutated Segments within Repeats. In: Apostolico A., Takeda M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg

Abstract

Given a long string of characters from a constant size (w.l.o.g. binary) alphabet we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible k-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the k coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability of a coin change is much lower than the probability of using the same coin repeatedly. We present a statistical test procedure which, for any given S, determines whether the a posteriori probability for k = 1 is higher than for any other k > 1. Our algorithm runs in time O(l^4 log l), where l is the length of S, through a dynamic programming approach which exploits the convexity of the a posteriori probability for k.
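The k-coin generative model described above can be illustrated with a short simulation. This is only a minimal sketch of the model, not the statistical test or the dynamic program from the paper. The function name `generate_k_coin_string` and the parameters `biases` and `switch_prob` are hypothetical, and the rule that a coin change moves to a uniformly chosen different coin is an assumption, since the abstract does not specify the random walk in detail.

```python
import random


def generate_k_coin_string(biases, switch_prob, length, seed=None):
    """Simulate a k-coin model (illustrative; names are assumptions).

    biases      -- biases[i] is the probability that coin i yields a 1
    switch_prob -- probability of changing coins before a toss; the model
                   assumes this is small relative to 1 - switch_prob
    length      -- number of bits to generate
    Returns (bits, coins), where coins[j] is the coin used for bits[j].
    """
    rng = random.Random(seed)
    k = len(biases)
    coin = rng.randrange(k)  # start on an arbitrary coin
    bits, coins = [], []
    for _ in range(length):
        if k > 1 and rng.random() < switch_prob:
            # Assumed walk: jump to a uniformly chosen different coin.
            coin = rng.choice([c for c in range(k) if c != coin])
        bits.append(1 if rng.random() < biases[coin] else 0)
        coins.append(coin)
    return bits, coins


if __name__ == "__main__":
    bits, coins = generate_k_coin_string(
        [0.1, 0.6], switch_prob=0.01, length=200, seed=0
    )
    print("".join(map(str, bits)))
    changes = sum(a != b for a, b in zip(coins, coins[1:]))
    print("coin changes:", changes)
```

With k = 1 the output is a plain i.i.d. bit string. With k > 1 and a small `switch_prob` it consists of long runs drawn from one coin at a time, which is the situation the test in the abstract is meant to distinguish from the single-source case.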

The problem we consider arises from two critical applications in analyzing long alignments between pairs of genomic sequences. A high alignment score between two DNA sequences usually indicates an evolutionary relationship, i.e. that the sequences have been generated as a result of one or more copy events followed by random point mutations. Such sequences may include functional regions (e.g. exons) as well as nonfunctional ones (e.g. introns). Functional regions with critical importance exhibit much lower mutation rates than non-functional DNA (or DNA

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • S. Cenk Ṣahinalp (1)
  • Evan Eichler (2)
  • Paul Goldberg (3)
  • Petra Berenbrink (4)
  • Tom Friedetzky (5)
  • Funda Ergun (6)

  1. Dept of EECS, Dept of Genetics and Center for Computational Genomics, CWRU, USA
  2. Dept of Genetics and Center for Computational Genomics, CWRU, USA
  3. Dept of Computer Science, University of Warwick, UK
  4. School of Computing, Simon Fraser University, Canada
  5. Pacific Institute of Mathematics, Simon Fraser University, Canada
  6. NEC Research Institute and Dept of EECS, CWRU, USA
