Segmentation by Maximal Predictive Partitioning According to Composition Biases

  • Laurent Guéguen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2066)

Abstract

We present a method for segmenting qualitative sequences according to a type of composition criterion whose definition and evaluation are founded on the notions of predictors and additive prediction. Given a set of predictors, a partition of a sequence can be precisely evaluated. We present a language for declaring predictors. One problem is to optimize the partition of a sequence into a given number of segments; the other is to obtain a suitable number of segments for partitioning the sequence. We present an algorithm which, given a sequence and a set of predictors, successively computes the optimal partitions of the sequence for growing numbers of segments. The time and space complexity of the algorithm are linear in the length of the sequence and the number of predictors. Experimentally, the computed partitions are highly stable with regard to the number of segments, and we present an application of this approach to determining the origins of replication of bacterial chromosomes.
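To make the idea of additive prediction over a partition concrete, here is a minimal Python sketch. It is not the paper's implementation and does not use its predictor declaration language. It assumes, hypothetically, that a predictor is a function giving a numeric score to each symbol, that a segment's score is the sum of those scores under the single predictor that fits it best, and that a partition's score is the sum of its segment scores. The names `best_partitions`, `g_rich`, and `c_rich` are illustrative only. Under these assumptions, a dynamic program can compute the best partitions for 1, 2, ..., k segments in turn, with each pass costing time proportional to the sequence length times the number of predictors.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical predictor type: maps one symbol to an additive score.
Predictor = Callable[[str], float]


def best_partitions(
    seq: str, predictors: List[Predictor], max_segments: int
) -> Dict[int, Tuple[float, List[int]]]:
    """Return {k: (best score, segment start indices)} for k = 1..max_segments."""
    n = len(seq)
    neg = float("-inf")
    max_segments = min(max_segments, n)
    best_prev = [neg] * n   # best score of seq[:i+1] split into k-1 segments
    cuts: List[List[int]] = []  # cuts[k-1][i]: start of the last of k segments
    results: Dict[int, Tuple[float, List[int]]] = {}

    for k in range(1, max_segments + 1):
        best_cur = [neg] * n
        cut = [-1] * n
        score = [neg] * len(predictors)  # per predictor, state at position i-1
        start = [-1] * len(predictors)   # start of the current last segment
        for i in range(k - 1, n):
            # Score available when a new segment is opened at position i.
            if k == 1:
                opening = 0.0 if i == 0 else neg
            else:
                opening = best_prev[i - 1]
            for p, predict in enumerate(predictors):
                gain = predict(seq[i])
                if score[p] >= opening:
                    score[p] += gain            # extend the last segment
                else:
                    score[p] = opening + gain   # open a new segment at i
                    start[p] = i
                if score[p] > best_cur[i]:
                    best_cur[i] = score[p]
                    cut[i] = start[p]
        cuts.append(cut)

        # Recover the segment starts by walking back through the cut tables.
        starts = []
        i = n - 1
        for kk in range(k, 0, -1):
            s = cuts[kk - 1][i]
            starts.append(s)
            i = s - 1
        results[k] = (best_cur[n - 1], starts[::-1])
        best_prev = best_cur
    return results


# Toy predictors (assumed for illustration): one favours G, the other C.
def g_rich(symbol: str) -> float:
    return {"G": 1.0, "C": -1.0}.get(symbol, 0.0)


def c_rich(symbol: str) -> float:
    return {"C": 1.0, "G": -1.0}.get(symbol, 0.0)


example = best_partitions("GGGGCCCC", [g_rich, c_rich], 3)
print(example[2])  # best two-segment partition: score and segment starts
```

On this toy sequence the two-segment partition splits at the point where the composition switches from G to C, and asking for a third segment does not raise the score. That is one simple way such scores could inform the choice of the number of segments, though the paper's own criterion may differ.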

Keywords

Linear Discriminant Analysis; Sequence Vector; Optimal Partition; Bacterial Chromosome; Optimal Prediction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Laurent Guéguen (1)
  1. CEB-LIS, UPMC Paris VI, France
