LATIN 2018: Theoretical Informatics, pp 290-302

# Property Suffix Array with Applications

Conference paper

## Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property $$\varPi$$, i.e. a set of $$\varPi$$-valid intervals over x. A suffix-tree-like index over these valid prefixes of suffixes of x can be built in time and space $$\mathcal {O}(n)$$. We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space $$\mathcal {O}(n)$$. We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold $$\frac{1}{z}$$, we say that a string p of length m matches a weighted sequence X of length n at starting position i if the product of probabilities of the letters of p at positions $$i,\ldots ,i+m-1$$ in X is at least $$\frac{1}{z}$$. Our algorithm for building the PSA can be directly applied to build an $$\mathcal {O}(nz)$$-sized suffix-array-like index over X in time and space $$\mathcal {O}(nz)$$.
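The two definitions above can be illustrated with a small sketch. It is not the linear-time construction announced in the abstract: it sorts the truncated suffixes naively, which is only meant to show what a PSA contains. The interval encoding (inclusive, 0-based `(start, end)` pairs), the tie-breaking by position, the dictionary representation of a weighted sequence, and all function names are assumptions made for illustration.

```python
from math import prod


def valid_prefix_lengths(n, intervals):
    """Length of the longest valid prefix of each suffix x[i:].

    `intervals` holds inclusive 0-based (start, end) pairs; this encoding
    of the property is an assumption of the sketch.
    """
    reach = [-1] * n  # reach[s]: furthest end among intervals starting at s
    for s, e in intervals:
        reach[s] = max(reach[s], e)
    lengths, furthest = [], -1
    for i in range(n):
        # furthest end among all intervals that start at or before i
        furthest = max(furthest, reach[i])
        lengths.append(max(0, furthest - i + 1))
    return lengths


def naive_property_suffix_array(x, intervals):
    """Sort suffix start positions by their longest valid prefix.

    Naive comparison sort, NOT the O(n) algorithm of the paper.
    Ties are broken by position (an assumption).
    """
    lengths = valid_prefix_lengths(len(x), intervals)
    starts = [i for i in range(len(x)) if lengths[i] > 0]
    return sorted(starts, key=lambda i: (x[i:i + lengths[i]], i))


def matches_at(p, X, i, z):
    """True if pattern p matches weighted sequence X at position i,
    i.e. the product of the letter probabilities is at least 1/z.

    X is a list of {letter: probability} dicts (assumed representation).
    Floating-point products may need a tolerance for general inputs.
    """
    if i < 0 or i + len(p) > len(X):
        return False
    return prod(X[i + k].get(c, 0.0) for k, c in enumerate(p)) >= 1.0 / z


psa = naive_property_suffix_array("banana", [(0, 2), (3, 5)])
X = [{"a": 0.5, "b": 0.5}, {"a": 1.0}, {"a": 0.25, "b": 0.75}]
print(psa, matches_at("aab", X, 0, 4), matches_at("aaa", X, 0, 4))
```

For `"banana"` with valid intervals `(0, 2)` and `(3, 5)`, the truncated suffixes are `ban`, `an`, `n`, `ana`, `na`, `a`, so the sketch orders the positions as `[5, 1, 3, 0, 2, 4]`. With $$z = 4$$, `aab` has probability 0.375 at position 0 and matches, while `aaa` has probability 0.125 and does not.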
