|
NeoBio API | ||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object | +--neobio.alignment.FactorSequence
This class builds a list of factors of a character sequence as induced by its Lempel-Ziv (LZ78) factorisation. Each factor is enconded as the longest factor previously seen plus one character.
The input can come from any source, provided it is encapsulated in a proper
Reader
instance. The stream is expected to be ready (i.e. the next
read
operation must return the first character of the sequence) and it is
not closed when its end is reached, so the client is allowed to reset it and maybe use
it for another purpose.
Sequences can contain letters only although lines started with the
COMMENT_CHAR
character ('>') are regarded as comments and are completely
skipped. White spaces (including tabs, line feeds and carriage returns) are also
ignored throughout.
This class uses a Trie to keep track of a list of factors. Each node of the trie contains a Factor of the text. As the sequence is read from the input, the trie is traversed as far as possible. When a leaf node is reached (which means that the longest prefix of the input has been found), two tasks are accomplished:
Factor
is created with the character at the current position of
the input and the leaf node's factor;
Each factor also receives a serial number according to the order they are found and a pointer to the next factor (in that order) for fast access. This pointer, together with the factor's ancestor pointer forms a doubly-linked list of factors. The original text can then be reconstructed simply by following the linked list and writing out its factors.
As an example, the sequence ACTAAACCGCATTAATAATAAAA
is parsed into the
following 12 factors:
0 ( , ) = empty
1 (0,A) = A
2 (0,C) = C
3 (0,T) = T
4 (1,A) = AA
5 (1,C) = AC
6 (2,G) = CG
7 (2,A) = CA
8 (3,T) = TT
9 (4,T) = AAT
10 (9,A) = AATA
11 (4,A) = AAA
serial # (prefix, new char) = factor text
This class is used by CrochemoreLandauZivUkelson algorithm to speed up the classic dynamic programming approach to sequence alignment.
Factor
,
Trie
,
CrochemoreLandauZivUkelson
Field Summary | |
protected static char |
COMMENT_CHAR
The character used to start a comment line in a sequence file. |
protected int |
num_chars
The numbers of character represented by this sequence. |
protected int |
num_factors
The numbers of factors generated by the LZ78 parsing of the sequence. |
protected Factor |
root_factor
A pointer to the root factor, the one that starts the list of factors. |
Constructor Summary | |
FactorSequence(java.io.Reader reader)
Creates a new instance of a FactorSequence , loading the sequence data
from the Reader input stream. |
Method Summary | |
Factor |
getRootFactor()
Returns the root factor, the one that starts the list of factors. |
int |
numChars()
Returns the number of characters of the original sequence. |
int |
numFactors()
Returns the number of factors produced by the LZ78 parsing of the text. |
java.lang.String |
printFactors()
Returns a string representation of the actual list of factors produced by the LZ78 parsing of the text. |
java.lang.String |
toString()
Reconstructs the sequence from the list of factors induced by the LZ78 parsing of the text. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
protected static final char COMMENT_CHAR
protected Factor root_factor
protected int num_chars
protected int num_factors
Constructor Detail |
public FactorSequence(java.io.Reader reader) throws java.io.IOException, InvalidSequenceException
FactorSequence
, loading the sequence data
from the Reader
input stream. A doubly-linked list of factors is built
according to its LZ78 factorisation.
reader
- source of characters for this sequence
java.io.IOException
- if an I/O exception occurs when reading the input
InvalidSequenceException
- if the input does not contain a valid sequenceMethod Detail |
public Factor getRootFactor()
public int numFactors()
public int numChars()
public java.lang.String toString()
toString
in class java.lang.Object
public java.lang.String printFactors()
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |