Hidden Pattern Statistics

Flajolet, Philippe; Guivarc’h, Yves; Szpankowski, Wojciech; Vallée, Brigitte

doi:10.1007/3-540-48224-5_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2076)

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

Abstract

We consider the sequence comparison problem, also known as the “hidden pattern” problem, where one searches for a given subsequence in a text (rather than a string understood as a sequence of consecutive symbols). A characteristic parameter is the number of occurrences of a given pattern w of length m as a subsequence in a random text of length n generated by a memoryless source. Spacings between letters of the pattern may either be constrained or not in order to define valid occurrences. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating functions and convergence of moments. The motivation to study this problem comes from an attempt at finding a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology.
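To make the counted quantity concrete, the following is a minimal Python sketch for the unconstrained case only (no restriction on the spacings between pattern letters). It is an illustration and not the authors' method: the function names `count_subsequence_occurrences` and `expected_occurrences` are hypothetical, the counting routine is a standard dynamic program, and the mean $\binom{n}{m}\prod_{i=1}^{m} p(w_i)$ is the elementary linearity-of-expectation value for a memoryless source, stated here as an assumption about what "the mean" refers to in the unconstrained setting.

```python
from math import comb, prod


def count_subsequence_occurrences(text: str, pattern: str) -> int:
    """Count index tuples i1 < ... < im with text[ij] == pattern[j].

    Unconstrained case: any gaps between matched positions are allowed.
    """
    m = len(pattern)
    # ways[j] = number of ways to match pattern[:j] in the text read so far
    ways = [1] + [0] * m
    for ch in text:
        # Iterate j downwards so one text symbol extends each partial
        # match at most once.
        for j in range(m, 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[m]


def expected_occurrences(n: int, pattern: str, probs: dict) -> float:
    """Mean count in a random text of length n from a memoryless source.

    Assumption: symbols are i.i.d. with probabilities `probs`; each of the
    C(n, m) position sets then matches with probability prod p(w_i).
    """
    return comb(n, len(pattern)) * prod(probs[c] for c in pattern)


if __name__ == "__main__":
    print(count_subsequence_occurrences("abab", "ab"))
    print(expected_occurrences(4, "ab", {"a": 0.5, "b": 0.5}))
```

The dynamic program runs in O(nm) time, which avoids enumerating the $\binom{n}{m}$ candidate position sets explicitly. Handling constrained spacings would need a different recurrence and is not covered by this sketch.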


Author information

Authors and Affiliations

Algorithms Project, INRIA-Rocquencourt, 78153, Le Chesnay, France
Philippe Flajolet
IRMAR, Université de Rennes I, F-35042, Rennes Cedex, France
Yves Guivarc’h
Dept. Computer Science, Purdue University, W. Lafayette, IN, 47907, USA
Wojciech Szpankowski
GREYC, Université de Caen, F-14032, Caen Cedex, France
Brigitte Vallée

Editor information

Editors and Affiliations

Departament de Llenguatges i Sistemes Informátics, Univ. Politècnica de Catalunya, C/Jordi Girona Salgado 1-3, 08034, Barcelona, Spain
Fernando Orejas
Computer Technology Institute (CTI), University of Patras, 61 Riga Feraiou street, 26221, Patras, Greece
Paul G. Spirakis
Institute of Information and Computing Sciences, Utrecht University, Padualaan 14, 3584 CH, Utrecht, The Netherlands
Jan van Leeuwen

About this paper

Cite this paper

Flajolet, P., Guivarc’h, Y., Szpankowski, W., Vallée, B. (2001). Hidden Pattern Statistics. In: Orejas, F., Spirakis, P.G., van Leeuwen, J. (eds) Automata, Languages and Programming. ICALP 2001. Lecture Notes in Computer Science, vol 2076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48224-5_13

DOI: https://doi.org/10.1007/3-540-48224-5_13
Published: 04 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42287-7
Online ISBN: 978-3-540-48224-6
eBook Packages: Springer Book Archive
