|
NeoBio API | ||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES All Classes | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object | +--neobio.alignment.Trie
This class implements a trie, or a digital search tree. A trie is a multiway tree (each node can have multiple children) that represents a set of strings.
Each node contains data encapsulated in an object instance. Each edge spells out a character and each path from the root represents a string described by the characters labelling the traversed edges. Moreover, for each string represented, there is a unique path from the root.
The trie of the following example represents the strings 'a', 'd', 'b', 'ac', 'ba', 'be', 'bd', 'bad' and 'bae'.
[0]
--+--
/ | \
a/ d| \b
[1] [2] [4]
| --+--
| / | \
c| a/ e| d\
[3] [5] [6] [9]
--+--
/ \
d/ e\
[7] [8]
It is easy to see that strings with common prefixes will branch off from each other at the first distinguishing character. This feature makes the trie a good data structure to identify and represent phrases of a text such as the ones induced by the Lempel-Ziv familiy of compression algorithms. For instance, the LZ78 version parses the text into phrases, where each phrase is the longest matching phrase seen previously plus one character.
In this implementation, each node is actually an instance of this class. To build a trie, one must first create the root using the public constructor:
Trie root = new Trie (some_object);
Here some_object
contains any relevant information encapsulated in an
object instance. Typically, that's the only moment the public constructor is used. From
now on, all new nodes will be added as a new child of one existing node using the
add
method:
new_node = any_node.add (some_object, character);
Here character
is the character that will label the edge from
any_node
to new_node
. Note that this transition must not
already exist, otherwise an exception is raised.
To find the longest prefix of a given string, we follow a path from the root down
the tree, character by character, with the spellDown
method:
next_node = root;
while (next_node != null)
{
current_node = next_node;
char c = get next character from somewhere
next_node = current_node.spellDown (c);
}
spellDown
follows the edge out of current_node
labelled by
the character c
and returns the next node. If there is no such a path, it
returns null.
To retrieve the information stored at any node, simply use the getData
method.
In fact, there are many ways to implement a trie. To avoid wasting space with multiple pointers at each node, this implementation uses an approach with a linked list of siblings. Each node actually contains a pointer to one of its children and a pointer to one of its siblings only. Together with the pointers, each node also stores the character that labels the edge to the pointed node.
[0] | a| d b [1]---[2]---[4] | | c| a| e d [3] [5]---[6]---[9] | d| e [7]---[8]
In this way, a trie is similar to a binary tree. Although this implementation is
more efficient in terms of space, the search for a label with a given character leaving
a node n
is no more constant but proportional to the number of children of
n
. In the previous example, it is necessary to traverse three edges to
reach node 9 from node 4 with character d.
This class is used by the FactorSequence to build a linked list of factors of a sequence in a LZ78 fashion, i.e. where each factor is the longest factor previously seen plus one character.
FactorSequence
Field Summary | |
protected java.lang.Object |
data
This node's stored data. |
protected Trie |
sibling
A pointer to this node's next sibling. |
protected Trie |
son
A pointer to the first of this node's children. |
protected char |
to_sibling
The character that labels the edge from this node to the sibling pointer by sibling . |
protected char |
to_son
The character that labels the edge from this node to the child node pointer by son . |
Constructor Summary | |
Trie(java.lang.Object data)
Creates a new trie node with the specified data. |
Method Summary | |
Trie |
add(java.lang.Object data,
char c)
Adds a new child to this node. |
protected Trie |
addSibling(java.lang.Object data,
char c)
Adds a sibling to this node. |
java.lang.Object |
getData()
Returns the data associated with this node. |
Trie |
spellDown(char c)
Follows a path from this node to one of its children by spelling the character supplied as an argument. |
protected Trie |
spellRight(char c)
Follows a path from this node to one of its sibling by spelling the character supplied as an argument. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected Trie son
protected char to_son
son
.
protected Trie sibling
protected char to_sibling
sibling
.
protected java.lang.Object data
Constructor Detail |
public Trie(java.lang.Object data)
add
method.
data
- the data that will be associated with the new nodeMethod Detail |
public java.lang.Object getData()
public Trie add(java.lang.Object data, char c)
data
argument, and the edge from this node to the new node will be
labelled by the character argument. If this node already have an edge labelled with
this character, an exception is raised. Otherwise, the new node created and
returned.
If this node have no child, a new node is created straight away. Otherwise, the task is assigned to its first child that will add the new node as a sibling.
data
- the data that will be associated with the new nodec
- the character that will label the edge from this node to the new node
java.lang.IllegalStateException
- if this node already have an edge labelled by
c
protected Trie addSibling(java.lang.Object data, char c)
data
argument, and the edge from this node to the new node will be
labelled by the character argument. If this node already have a sibling with this
character, an exception is raised. Otherwise, the new node is created and returned.
If this node have no direct sibling, a new node is created straight away. Otherwise, the task is assigned to its next sibling.
data
- the data that will be associated with the new nodec
- the character that will label the edge from this node to the new node
java.lang.IllegalStateException
- if this node already have an edge labelled by
c
public Trie spellDown(char c)
null
is returned.
Otherwise, the reached child node is returned.
If this node's direct child is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c
- the character that labels the path to be followed to this node's child
c
protected Trie spellRight(char c)
null
is returned.
Otherwise, the reached sibling node is returned.
If this node's direct sibling is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c
- the character that labels the path to be followed to the sibling
c
|
|
||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES All Classes | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |