
Generalized Pattern Matching Statistics

  • Conference paper
Mathematics and Computer Science II

Part of the book series: Trends in Mathematics (TM)

Abstract

In pattern matching algorithms, a characteristic parameter is the number of occurrences of a given pattern in a random text of length n generated by a source. We generalize the pattern matching problem in two ways. First, we deal with a generalized notion of pattern that encompasses classical patterns as well as “hidden patterns”. Second, we consider a quite general probabilistic model of sources that may possess a high degree of correlation. Such sources are built from dynamical systems and are called dynamical sources. We determine the mean and the variance of the number of occurrences in this generalized pattern matching problem, and establish a concentration property for its distribution. These results are obtained via combinatorics, formal language techniques, and methods of analytic combinatorics based on generating operators and generating functions. The generating operators come from the dynamical system framework and themselves give rise to generating functions. The motivation for studying this problem comes from an attempt to find a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology.
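To make the counted parameter concrete, the following minimal sketch compares two ways of counting occurrences of a pattern in a fixed text. It is an illustration only and is not taken from the paper: it assumes that a classical pattern must appear as a contiguous block and that a fully unconstrained hidden pattern may appear as a subsequence with arbitrary gaps between its letters. The function names `count_classical` and `count_hidden` are hypothetical, and the sketch does not model dynamical sources or intermediate gap constraints.

```python
def count_classical(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` as a contiguous block (overlaps allowed)."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)


def count_hidden(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` as a subsequence (assumed: no gap constraints)."""
    # ways[j] = number of ways to embed pattern[:j] in the part of the text read so far.
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Go through j in decreasing order so each text letter is used at most once per embedding.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(pattern)]


if __name__ == "__main__":
    text = "abab"
    print(count_classical(text, "ab"))  # contiguous occurrences
    print(count_hidden(text, "ab"))     # subsequence occurrences
```

On the text "abab" the pattern "ab" occurs twice as a block but three times as a subsequence (positions 1-2, 1-4, and 3-4), which suggests why hidden-pattern counts can grow much faster with n than classical counts.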




Copyright information

© 2002 Springer Basel AG

About this paper

Cite this paper

Bourdon, J., Vallée, B. (2002). Generalized Pattern Matching Statistics. In: Chauvin, B., Flajolet, P., Gardy, D., Mokkadem, A. (eds) Mathematics and Computer Science II. Trends in Mathematics. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8211-8_15


  • DOI: https://doi.org/10.1007/978-3-0348-8211-8_15

  • Publisher Name: Birkhäuser, Basel

  • Print ISBN: 978-3-0348-9475-3

  • Online ISBN: 978-3-0348-8211-8

  • eBook Packages: Springer Book Archive
