p-Suffix Sorting as Arithmetic Coding

  • Richard Beal
  • Donald Adjeroh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7056)

Abstract

The challenge of direct parameterized suffix sorting (p-suffix sorting) for a parameterized string (p-string) is the dynamic nature of parameterized suffixes (p-suffixes). In this work, we propose transformative approaches to direct p-suffix sorting that generate numeric fingerprints and arithmetic codes corresponding to individual p-suffixes and sort them lexicographically. Our fingerprint-based algorithm is the first theoretical linear-time algorithm for p-suffix sorting over non-binary parameter alphabets, under the assumption that each code can be represented by a practical integer. We eliminate the key problems of fingerprints by introducing an algorithm that exploits the ordering of arithmetic codes to sort p-suffixes in linear time on average.
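To make the "dynamic nature" of p-suffixes concrete, the following is a minimal sketch of the standard prev encoding (each parameter symbol is replaced by the distance to its previous occurrence, or 0 if there is none) together with a naive, quadratic-time p-suffix sort. It is not the fingerprint or arithmetic-coding algorithm proposed in the paper. The function names, the example string, and the convention that parameter distances order before static symbols are assumptions made for illustration only.

```python
def prev_encode(s, params):
    """Prev-encode s: a parameter symbol becomes the distance to its
    previous occurrence (0 if it has not occurred yet); static symbols
    are kept unchanged."""
    last = {}
    out = []
    for i, ch in enumerate(s):
        if ch in params:
            out.append(i - last[ch] if ch in last else 0)
            last[ch] = i
        else:
            out.append(ch)
    return out


def sort_key(encoded):
    # Assumed ordering convention: integer distances sort before static
    # symbols; static symbols sort by code point.
    return [(0, x) if isinstance(x, int) else (1, ord(x)) for x in encoded]


def naive_p_suffix_array(s, params):
    """Sort suffix start positions by the prev encoding of each suffix.
    Every suffix is re-encoded from scratch, so this baseline is
    quadratic in the worst case, before the cost of comparisons."""
    return sorted(range(len(s)),
                  key=lambda i: sort_key(prev_encode(s[i:], params)))


if __name__ == "__main__":
    params = {"x", "y"}          # hypothetical parameter alphabet
    text = "xyxax"               # 'a' is a static symbol
    print(prev_encode(text, params))        # encoding of the whole string
    print(prev_encode(text[1:], params))    # encoding of the suffix at 1
    print(naive_p_suffix_array(text, params))
```

In this example the encoding of the suffix starting at position 1 is not a suffix of the encoding of the whole string: the distance recorded for the second `x` changes once the first `x` is cut off. This is why ordinary suffix sorting cannot be applied directly to the encoded string.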

Keywords

parameterized suffix array, parameterized suffix sorting, arithmetic coding, fingerprints, p-string, p-match

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Richard Beal (1)
  • Donald Adjeroh (1)
  1. Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, United States
