Jump to content

Generalized suffix tree

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Nils Grimsmo (talk | contribs) at 16:27, 12 November 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

A generalised suffix tree is a suffix tree for a set of strings. Given the set of strings of total length , it is a Patricia trie containing all suffixes of the strings. It is an index structure which gives you substring search. It is mostly used in bioinformatics.

It can be built in time and space, and you can use to find all occurrences of a string of length in time, which is assymptotically optimal. (This is under the assumption that the size of the alphabet is viewed as a constant. Otherwise, the running-times depend on the implementation.)

An alternative to building a generalised suffix tree, is to concatenate the strings, and build a regular suffix tree for the resulting string. When you evaluate the hits after a search, you map the global positions into documents and local positions with some algorithm and/or data structure, such as a binary search in the starting/ending positions of the documents.


Refereces

  • Paul Bieganski, John Riedl, John Carlis, and Ernest F. Retzel. Generalized Suffix Trees for Biological Sequence Data. System Sciences, 1994. Vol.V: Biotechnology Computing, Proceedings of the Twenty-Seventh Hawaii International Conference on.