LATIN 2018: Theoretical Informatics pp 290-302

# Property Suffix Array with Applications

• Panagiotis Charalampopoulos
• Costas S. Iliopoulos
• Chang Liu
• Solon P. Pissis

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10807)

## Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property $$\varPi$$, i.e. a set of $$\varPi$$-valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space $$\mathcal {O}(n)$$. We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space $$\mathcal {O}(n)$$. We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold $$\frac{1}{z}$$, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions $$i,\ldots ,i+m-1$$ in X is at least $$\frac{1}{z}$$. Our algorithm for building the PSA can be directly applied to build an $$\mathcal {O}(nz)$$-sized suffix-array-like index over X in time and space $$\mathcal {O}(nz)$$.
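The matching rule for weighted sequences stated in the abstract can be illustrated with a brute-force check. The sketch below is not the paper's index; it only tests the definition directly. It assumes a weighted sequence is represented as a list of dictionaries mapping letters to probabilities, uses 0-based starting positions, and the names `matches_at` and `occurrences` are hypothetical.

```python
from fractions import Fraction


def matches_at(X, p, i, z):
    """Return True if pattern p matches weighted sequence X at 0-based
    start i, i.e. the product of the letter probabilities is >= 1/z.

    X is assumed to be a list of dicts {letter: probability}; this
    representation is an illustration, not taken from the paper.
    """
    if i < 0 or i + len(p) > len(X):
        return False
    prob = Fraction(1)
    for k, letter in enumerate(p):
        # A letter absent from the distribution has probability 0.
        prob *= X[i + k].get(letter, Fraction(0))
        # Probabilities are at most 1, so the product can only shrink:
        # once it drops below 1/z the position can be rejected early.
        if prob * z < 1:
            return False
    return True


def occurrences(X, p, z):
    """List every 0-based start where p matches X (naive scan of all starts)."""
    return [i for i in range(len(X) - len(p) + 1) if matches_at(X, p, i, z)]


# A small hand-made weighted sequence of length 4 over the alphabet {a, b}.
X = [
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    {"a": Fraction(1)},
    {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    {"b": Fraction(1)},
]

print(occurrences(X, "ab", 4))  # threshold 1/4
print(occurrences(X, "ab", 2))  # threshold 1/2
```

Exact fractions are used so that a product exactly equal to $$\frac{1}{z}$$ is not lost to floating-point rounding. The scan tests each starting position separately, which is the cost an index over X is meant to avoid.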

© Springer International Publishing AG, part of Springer Nature 2018

## Authors and Affiliations

• Panagiotis Charalampopoulos (1)
• Costas S. Iliopoulos (1)
• Chang Liu (1)
• Solon P. Pissis (1)

1. Department of Informatics, King’s College London, London, UK