
Suffix array

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used, among others, in full-text indices, data compression algorithms, and within the field of bibliometrics.

Suffix arrays were introduced by Manber & Myers (1990) as a simple, space-efficient alternative to suffix trees. They had been discovered independently by Gaston Gonnet in 1987 under the name PAT array (Gonnet, Baeza-Yates & Snider 1992).

Li, Li & Huo (2016) gave the first in-place \(\mathcal{O}(n)\)-time suffix array construction algorithm that is optimal in both time and space, where in-place means that the algorithm needs only \(\mathcal{O}(1)\) additional space beyond the input string and the output suffix array.

Enhanced suffix arrays (ESAs) are suffix arrays with additional tables that reproduce the full functionality of suffix trees while preserving the same time and memory complexity. The suffix array for a subset of all suffixes of a string is called a sparse suffix array. Multiple probabilistic algorithms have been developed to minimize the additional memory usage, including an algorithm that is optimal in both time and memory.

Let \(S = S[1]S[2]\ldots S[n]\) be a string and let \(S[i, j]\) denote the substring of \(S\) ranging from \(i\) to \(j\). The suffix array \(A\) of \(S\) is then defined to be an array of integers providing the starting positions of the suffixes of \(S\) in lexicographical order.
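A minimal sketch that follows the definition literally, written in Python (the language and the function name `suffix_array` are illustrative assumptions, not from the source). It keeps the text's 1-based positions, so the suffix \(S[i, n]\) is the Python slice `s[i - 1:]`, and it relies on Python comparing strings lexicographically by code point. The optional `positions` argument, also hypothetical, restricts sorting to a subset of starting positions and therefore yields a sparse suffix array. Because every comparison may scan a whole suffix and every slice copies one, this takes \(\mathcal{O}(n^2 \log n)\) time and \(\mathcal{O}(n^2)\) memory in the worst case; it is useful as a reference for checking faster constructions, not as one of the algorithms cited above.

```python
def suffix_array(s, positions=None):
    """Return 1-based starting positions of suffixes of s in lexicographic order.

    If positions (1-based) is given, only those suffixes are sorted,
    which gives a sparse suffix array.
    """
    if positions is None:
        positions = range(1, len(s) + 1)
    # s[i - 1:] is the suffix S[i, n] in the 1-based notation of the text;
    # Python compares str objects lexicographically by code point.
    return sorted(positions, key=lambda i: s[i - 1:])


if __name__ == "__main__":
    print(suffix_array("banana$"))                # [7, 6, 4, 2, 1, 5, 3]
    print(suffix_array("banana$", [1, 3, 5, 7]))  # [7, 1, 5, 3]
```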
In other words, the entry \(A[i]\) contains the starting position of the \(i\)-th smallest suffix in \(S\), and thus for all \(1 < i \le n\): \(S[A[i-1], n] < S[A[i], n]\). Each suffix of \(S\) appears in \(A\) exactly once. Note that suffixes are ordinary strings: they are sorted (as in a paper dictionary) before their starting positions (integer indices) are saved in \(A\).

Consider the text \(S\) = banana$ to be indexed. The text ends with the special sentinel letter $, which is unique and lexicographically smaller than any other character. The text has the following suffixes:

i  Suffix
1  banana$
2  anana$
3  nana$
4  ana$
5  na$
6  a$
7  $
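Comparing whole suffixes is what makes the literal definition slow. The prefix-doubling idea behind Manber & Myers (1990) instead ranks suffixes by their first 1, 2, 4, 8, … characters, where each round sorts pairs of ranks from the previous round. The sketch below is a simplified variant in Python (the function name `suffix_array_doubling` is hypothetical): it uses Python's comparison sort instead of the bucket/radix sorting of the original, so it runs in \(\mathcal{O}(n \log^2 n)\) time rather than \(\mathcal{O}(n \log n)\). Applied to banana$, it checks the ordering property stated above and that every position appears exactly once.

```python
def suffix_array_doubling(s):
    """Prefix-doubling construction (simplified sketch).

    Returns the 1-based starting positions of the suffixes of s
    in lexicographic order.
    """
    n = len(s)
    if n == 0:
        return []
    # Round 0: rank each 0-based position by its first character.
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        # Pair (rank of first k chars, rank of next k chars). -1 marks a
        # suffix with nothing beyond its first k chars; it sorts first.
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=lambda i: keys[i])
        # Re-rank: positions with equal pairs share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            new_rank[cur] = new_rank[prev] + (keys[cur] != keys[prev])
        rank = new_rank
        # Ranks now cover the first 2k characters; stop once all are distinct.
        if rank[sa[-1]] == n - 1:
            break
        k *= 2
    return [i + 1 for i in sa]


if __name__ == "__main__":
    s = "banana$"
    A = suffix_array_doubling(s)
    print(A)  # [7, 6, 4, 2, 1, 5, 3]
    # Consecutive entries satisfy S[A[i-1], n] < S[A[i], n].
    assert all(s[A[j - 1] - 1:] < s[A[j] - 1:] for j in range(1, len(A)))
    # Every starting position 1..n appears exactly once.
    assert sorted(A) == list(range(1, len(s) + 1))
```

For banana$ the loop stops after two rounds: once suffixes are ranked by their first four characters, all seven ranks are already distinct.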
