Skip to main content
Log in

Pattern Matching with Flexible Wildcards

  • Survey
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Pattern matching with wildcards (PMW) has great theoretical and practical significance in bioinformatics, information retrieval, and pattern mining. Due to the uncertainty of wildcards, not only is the number of all matches exponential with respect to the maximal gap flexibility and the pattern length, but the matching positions in PMW are also hard to choose. The objective to count the maximal number of matches one by one is computationally infeasible. Therefore, rather than solving the generic PMW problem, many research efforts have further defined new problems within PMW according to different application backgrounds. To break through the limitations of either fixing the number or allowing an unbounded number of wildcards, pattern matching with flexible wildcards (PMFW) allows the users to control the ranges of wildcards. In this paper, we provide a survey on the state-of-the-art algorithms for PMFW, with detailed analyses and comparisons, and discuss challenges and opportunities in PMFW research and applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Cole J R, Chai B, Farris R J et al. The Ribosomal Database Project (RDP-II): Sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research, 2005, 33(Database Issue): 294–296.

    Article  Google Scholar 

  2. Mendivelso J, Pinzon Y, Lee I. Finding overlaps within regular expressions with variable-length gaps. In Proc. the 2013 Research in Adaptive and Convergent Systems, Oct. 2013, pp.16–21.

  3. Patnaik D, Laxman S, Chandramouli B, Ramakrishnan N. A general streaming algorithm for pattern discovery. Knowledge and Information Systems, 2013, 37(3): 585–610.

    Article  Google Scholar 

  4. Xie F, Wu X, Hu X et al. Sequential pattern mining with wildcards. In Proc. the 22nd IEEE International Conference on Tools with Artificial Intelligence, Oct. 2010, pp.241–247.

  5. Ding B, Lo D, Han J, Khoo S. E±cient mining of closed repet-itive gapped subsequences from a sequence database. In Proc. the 25th IEEE International Conference on Data Engineering, Mar. 29–April 2, 2009, pp.1024–1035.

  6. El-Ramly M, Stroulia E, Sorenson P. From run-time behavior to usage scenarios: An interaction-pattern mining approach. In Proc. the 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, July 2002, pp.315–324.

  7. Manber U, Baeza-Yates R. An algorithm for string matching with a sequence of don’t cares. Information Processing Letters, 1991, 37(3): 133–136.

    Article  MathSciNet  MATH  Google Scholar 

  8. de Pablo-Sánchez C, Segura-Bedmar I, Martínez P, Iglesias-Maqueda A. Lightly supervised acquisition of named entities and linguistic patterns for multilingual text mining. Knowledge and Information Systems, 2013, 35(1): 87–109.

    Article  Google Scholar 

  9. Barbieri N, Bonchi F, Manco G. Topic-aware social influence propagation models. Knowledge and Information Systems, 2013, 37(3): 555–584.

    Article  Google Scholar 

  10. Wei Y, Dominique F, Jean-Paul B. An automatic keyphrase extraction system for scientific documents. Knowledge and Information Systems, 2013, 34(3): 691–724.

    Article  Google Scholar 

  11. Fischer M J, Paterson M S. String matching and other products. Technical Report, Massachusetts Institute of Technology, 1974.

  12. Muthukrishnan S, Palem K. Non-standard stringology: Algorithms and complexity. In Proc. the 26th Annual ACM Symposium on Theory of Computing, May 1994, pp.770–779.

  13. Indyk P. Faster algorithms for string matching problems: Matching the convolution bound. In Proc. the 39th Symp. Foundations of Computer Science, Nov. 1998, pp.166–173.

  14. Clifford P, Clifford R. Simple deterministic wildcard matching. Information Processing Letters, 2007, 101(2): 53–54.

    Article  MathSciNet  MATH  Google Scholar 

  15. Cole R, Hariharan R. Verifying candidate matches in sparse and wildcard matching. In Proc. the 34th Annual ACM Symposium on Theory of Computing, May 2002, pp.592–601.

  16. Guo D, Hu X, Xie F, Wu X. Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Applied Intelligence, 2013, 39(1): 57–74.

    Article  Google Scholar 

  17. Navarro G, Raffinot M. Fast and simple character classes and bounded gaps pattern matching, with application to protein searching. In Proc. the 5th Annual International Conference on Computational Biology, April 2001, pp.231–240.

  18. Morgante M, Policriti A, Vitacolonna N, Zuccolo A. Structured motifs search. Journal of Computational Biology, 2005, 12(8): 1065–1082.

    Article  Google Scholar 

  19. Cole R, Gottlieb L, Lewenstein M. Dictionary matching and indexing with errors and don’t cares. In Proc. the 36th Annual ACM Symposium on the Theory of Computing, June 2004, pp.91–100.

  20. Kalai A. Efficient pattern-matching with don’t cares. In Proc. the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, January 2002, pp.655–656.

  21. Haapasalo T, Silvasti P, Sippu S et al. Online dictionary matching with variable-length gaps. In Proc. the 10th Int. Conf. Experimental Algorithms, May 2011, pp.76–87.

  22. Kucherov G, Rusinowitch M. Matching a set of strings with variable length don’t cares. Theoretical Computer Science, 1997, 178(1/2): 129–154.

    Article  MathSciNet  MATH  Google Scholar 

  23. Zhang M, Zhang Y, Hu L. A faster algorithm for matching a set of patterns with variable length don’t cares. Information Processing Letter, 2010, 110(6): 216–220.

    Article  MATH  Google Scholar 

  24. Wu X, Zhu X, He Y, Arslan A N. PMBC: Pattern mining from biological sequences with wildcard constraints. Computers in Biology and Medicine, 2013, 43(5): 481–492.

    Article  Google Scholar 

  25. Rahman M S, Iliopoulos C S, Lee I et al. Finding patterns with variable length gaps or don’t cares. In Proc. the 12th Annual International Computing and Combinatorics Conference, August 2006, pp.146–155.

  26. Bille P, Gørtz I L, Vildhøj H W, Wind D K. String matching with variable length gaps. Theoretical Computer Science, 2012, 443(20): 25–34.

    Article  MathSciNet  MATH  Google Scholar 

  27. Min F, Wu X, Lu Z. Pattern matching with independent wildcard gaps. In Proc. the 8th IEEE Int. Conf. Dependable, Autonomic and Secure Computing, December 2009, pp.194–199.

  28. Zhu X, Wu X. Mining complex patterns across sequences with gap requirements. In Proc. the 20th Int. Joint Conf. Artificial Intelligence, January 2007, pp.2934–2940.

  29. Chen G, Wu X, Zhu X, Arslan A, He Y. Efficient string matching with wildcards and length constraints. Knowledge and Information Systems, 2006, 10(4): 399–419.

    Article  Google Scholar 

  30. Guo D, Hong X, Hu X et al. A bit-parallel algorithm for sequential pattern matching with wildcards. Cybernetics and Systems, 2011, 42(6): 382–401.

    Article  Google Scholar 

  31. Lin P C, Li Z X, Lin Y D et al. Profiling and accelerating string matching algorithms in three network content security applications. IEEE Communications Surveys and Tutorials, 2006, 8(2): 24–37.

    Article  Google Scholar 

  32. Aho A V, Corasick M J. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 1975, 18(6): 333–340.

    Article  MathSciNet  MATH  Google Scholar 

  33. Tuck N, Sherwood T, Calder B, Varghese G. Deterministic memory-efficient string matching algorithms for intrusion detection. In Proc. the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, March 2004, pp.2628–2639.

  34. Norton M. Optimizing pattern matching for intrusion detection. http://pdf.aminer.org/000/309/890/optimizing pattern match.pdf, July 2014.

  35. Boyer R S, Moore J S. A fast string searching algorithm. Communications of the ACM, 1977, 20(10): 762–772.

    Article  MATH  Google Scholar 

  36. Wu S, Manber U. A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona, 1994.

  37. Muth R, Manber U. Approximate multiple string search. In Proc. the 7th Annual Symposium on Combinatorial Pattern Matching (CPM), June 1996, pp.75–86.

  38. Karp R M, Rabin M O. Efficient randomized pattern matching algorithms. IBM Journal of Research and Development, 1987, 31(2): 249–260.

    Article  MathSciNet  MATH  Google Scholar 

  39. Baeza-Yates R, Gonnet G H. A new approach to text searching. Communications of the ACM, 1992, 35(10): 74–82.

    Article  Google Scholar 

  40. Navarro G, Raffinot M. A bit-parallel approach to suffix automata: Fast extended string matching. In Proc. the 9th Annual Symp. Combinatorial Pattern Matching, July 1998, pp.14–33.

  41. Navarro G. A guided tour to approximate string matching. ACM Computing Surveys, 2001, 33(1): 31–88.

    Article  Google Scholar 

  42. Kim S, Kim Y. A fast multiple string pattern matching algorithm. In Proc. the 17th AoM/IAoM Conference on Computer Science, August 1999.

  43. Agrawal R, Srikant R. Mining sequential patterns. In Proc. the 11th Int. Conf. Data Engineering, March 1995, pp.3–14.

  44. Akutsu T. Approximate string matching with variable length don’t care characters. Information Processing Letters, 1995, 55(5): 235–239.

    Article  MathSciNet  Google Scholar 

  45. Lee I, Apostolico A, Iliopoulos C S, Park K. Finding approximate occurrences of a pattern that contains gaps. In Proc. the 14th Australasian Workshop on Combinatorial Algorithms, July 2003, pp.89–100.

  46. Zhang M, Kao B, Cheung D W, Yip K. Mining periodic patterns with gap requirement from sequences. In Proc. the ACM SIGMOD International Conference on Management of Data, June 2005, pp.623–633.

  47. Min F, Wu X. A comparative study of pattern matching algorithms on sequences. In Proc. the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Dec. 2009, pp.510–517.

  48. Wang H, Xie F, Hu X, Li P, Wu X. Pattern matching with exible wildcards and recurring characters. In Proc. the 2010 IEEE International Conference on Granular Computing, Aug. 2010, pp.782–786.

  49. Wu Y, Wu X, Jiang H, Min F. A heuristic algorithm for MP-MGOOC. Chinese Journal of Computers, 2011, 34(8): 1452–1462. (in Chinese)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xindong Wu.

Additional information

This paper is supported in part by the National Natural Science Foundation of China under Grant Nos. 61229301 and 60828005, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under Grant No. IRT13059, and the National Science Foundation (NSF) of USA under Grant No. 0514819.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 217 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, X., Qiang, JP. & Xie, F. Pattern Matching with Flexible Wildcards. J. Comput. Sci. Technol. 29, 740–750 (2014). https://doi.org/10.1007/s11390-014-1464-3

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-014-1464-3

Keywords

Navigation