|
NeoBio API | ||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object | +--neobio.alignment.PairwiseAlignmentAlgorithm | +--neobio.alignment.SmithWaterman
This class implement the classic local alignment algorithm (with linear gap penalty function) due to T.F.Smith and M.S.Waterman (1981).
This algorithm is very similar to the NeedlemanWunsch algorithm for global alignment. The idea here also consists of building an (n+1 x m+1) matrix M given two sequences A and B of sizes n and m, respectively. However, unlike in the global alignment case, every position M[i,j] in the matrix contains the similarity score of suffixes of A[1..i] and B[1..j].
Starting from row 0, column 0, the computeMatrix
method
computes each position M[i,j] with the following recurrence:
M[0,0] = M[0,j] = M[i,0] = 0
M[i,j] = max { M[i,j-1] + scoreInsertion (B[j]),
M[i-1,j-1] + scoreSubstitution (A[i], B[j]),
M[i-1,j] + scoreDeletion(A[i]) }
Note that, here, all cells in the first row and column are set to zero. The best local alignment score is the highest value found anywhere in the matrix.
Just like in global alignment case, this algorithm has quadratic space complexity because it needs to keep an (n+1 x m+1) matrix in memory. And since the work of computing each cell is constant, it also has quadratic time complexity.
After the matrix has been computed, the alignment can be retrieved by tracing a path
back in the matrix from the position of the highest score until a cell of value zero is
reached. This step is performed by the buildOptimalAlignment
method, and its time complexity is linear on the size of the
alignment.
If the similarity value only is needed (and not the alignment itself), it is easy to
reduce the space requirement to O(n) by keeping just the last row or column in memory.
This is precisely what is done by the computeScore
method. Note
that it still requires O(n2) time.
For a more efficient approach to the local alignment problem, see the CrochemoreLandauZivUkelson algorithm. For global alignment, see the NeedlemanWunsch algorithm.
NeedlemanWunsch
,
CrochemoreLandauZivUkelson
,
CrochemoreLandauZivUkelsonLocalAlignment
,
CrochemoreLandauZivUkelsonGlobalAlignment
Field Summary | |
protected int[][] |
matrix
The dynamic programming matrix. |
protected int |
max_col
Indicate the column of where an optimal local alignment can be found in the matrix. |
protected int |
max_row
Indicate the row of where an optimal local alignment can be found in the matrix.. |
protected CharSequence |
seq1
The first sequence of an alignment. |
protected CharSequence |
seq2
The second sequence of an alignment. |
Fields inherited from class neobio.alignment.PairwiseAlignmentAlgorithm |
alignment, APPROXIMATE_MATCH_TAG, GAP_CHARACTER, GAP_TAG, MATCH_TAG, MISMATCH_TAG, score, score_computed, scoring, sequences_loaded, use_match_tag |
Constructor Summary | |
SmithWaterman()
|
Method Summary | |
protected PairwiseAlignment |
buildOptimalAlignment()
Builds an optimal local alignment between the loaded sequences. |
protected void |
computeMatrix()
Computes the dynamic programming matrix. |
protected PairwiseAlignment |
computePairwiseAlignment()
Builds an optimal local alignment between the loaded sequences after computing the dynamic programming matrix. |
protected int |
computeScore()
Computes the score of the best local alignment between the two sequences using the scoring scheme previously set. |
protected void |
loadSequencesInternal(java.io.Reader input1,
java.io.Reader input2)
Loads sequences into CharSequence instances. |
protected void |
unloadSequencesInternal()
Frees pointers to loaded sequences and the dynamic programming matrix so that their data can be garbage collected. |
Methods inherited from class neobio.alignment.PairwiseAlignmentAlgorithm |
getPairwiseAlignment, getScore, loadSequences, max, max, max, scoreDeletion, scoreInsertion, scoreSubstitution, setScoringScheme, unloadSequences, useMatchTag |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected CharSequence seq1
protected CharSequence seq2
protected int[][] matrix
seq1
and a suffix of
the first j characters of seq2
.
protected int max_row
protected int max_col
Constructor Detail |
public SmithWaterman()
Method Detail |
protected void loadSequencesInternal(java.io.Reader input1, java.io.Reader input2) throws java.io.IOException, InvalidSequenceException
CharSequence
(please check
the specification of that class for specific requirements).
loadSequencesInternal
in class PairwiseAlignmentAlgorithm
input1
- Input for first sequenceinput2
- Input for second sequence
java.io.IOException
- If an I/O error occurs when reading the sequences
InvalidSequenceException
- If the sequences are not validCharSequence
protected void unloadSequencesInternal()
unloadSequencesInternal
in class PairwiseAlignmentAlgorithm
PairwiseAlignmentAlgorithm.unloadSequences()
protected PairwiseAlignment computePairwiseAlignment() throws IncompatibleScoringSchemeException
buildOptimalAlignment
method
after the computeMatrix
method computes the dynamic programming
matrix.
computePairwiseAlignment
in class PairwiseAlignmentAlgorithm
IncompatibleScoringSchemeException
- If the scoring scheme is not compatible
with the loaded sequences.computeMatrix()
,
buildOptimalAlignment()
protected void computeMatrix() throws IncompatibleScoringSchemeException
IncompatibleScoringSchemeException
- If the scoring scheme is not compatible
with the loaded sequences.protected PairwiseAlignment buildOptimalAlignment() throws IncompatibleScoringSchemeException
computeMatrix
method.
IncompatibleScoringSchemeException
- If the scoring scheme is not compatible
with the loaded sequences.computeMatrix()
protected int computeScore() throws IncompatibleScoringSchemeException
computeScore
in class PairwiseAlignmentAlgorithm
IncompatibleScoringSchemeException
- If the scoring scheme is not compatible
with the loaded sequences.PairwiseAlignmentAlgorithm.getScore()
|
|
||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |