Hidden Pattern Statistics

  • Conference paper
Automata, Languages and Programming (ICALP 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2076)

Abstract

We consider the sequence comparison problem, also known as the “hidden pattern” problem, where one searches for a given subsequence in a text (rather than for a string, understood as a sequence of consecutive symbols). A characteristic parameter is the number of occurrences of a given pattern w of length m as a subsequence in a random text of length n generated by a memoryless source. To define valid occurrences, spacings between letters of the pattern may be either constrained or unconstrained. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating functions and convergence of moments. The motivation for studying this problem comes from an attempt to find a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology.
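
As an illustration of the quantity studied in the abstract, the following minimal sketch counts occurrences of a pattern as a subsequence in the unconstrained case (no limit on spacings) and compares the exact mean over all texts with the closed form C(n, m) · p(w), which follows from linearity of expectation for a memoryless source. It is not taken from the paper: the function names and the probability dictionary `probs` are hypothetical, and the constrained case, the variance, and the Gaussian limit law are not covered.

```python
from itertools import product
from math import comb, prod


def count_occurrences(text, pattern):
    """Count occurrences of `pattern` as a subsequence of `text` (gaps unconstrained)."""
    # ways[j] = number of ways to match pattern[:j] within the text read so far
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Scan right to left so one text symbol fills at most one pattern position
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(pattern)]


def expected_occurrences(n, pattern, probs):
    """Closed-form mean for a memoryless source: C(n, m) times the pattern probability."""
    return comb(n, len(pattern)) * prod(probs[c] for c in pattern)


def brute_force_mean(n, pattern, probs):
    """Exact mean obtained by enumerating every text of length n (small n only)."""
    total = 0.0
    for letters in product(probs, repeat=n):
        weight = prod(probs[c] for c in letters)  # probability of this text
        total += weight * count_occurrences(letters, pattern)
    return total


if __name__ == "__main__":
    probs = {"a": 0.3, "b": 0.7}  # assumed letter probabilities, for illustration
    print(count_occurrences("aabb", "ab"))
    print(expected_occurrences(5, "aba", probs))
    print(brute_force_mean(5, "aba", probs))
```

The enumeration grows exponentially in n and serves only as a sanity check on the closed form for short texts.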

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Flajolet, P., Guivarc’h, Y., Szpankowski, W., Vallée, B. (2001). Hidden Pattern Statistics. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds) Automata, Languages and Programming. ICALP 2001. Lecture Notes in Computer Science, vol 2076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48224-5_13

  • DOI: https://doi.org/10.1007/3-540-48224-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42287-7

  • Online ISBN: 978-3-540-48224-6

  • eBook Packages: Springer Book Archive
