NeoBio API

neobio.alignment
Class Smawk

java.lang.Object
  |
  +--neobio.alignment.Smawk

public class Smawk
extends java.lang.Object

This class implement the SMAWK algorithm to compute column maxima on a totally monotone matrix as described.

This implementation derives from the paper of A.Aggarwal, M.Klawe, S.Moran, P.Shor, and R.Wilber, Geometric Applications of a Matrix Searching Algorithm, Algorithmica, 2, 195-208 (1987).

The matrix must be an object that implements the Matrix interface. It is also expected to be totally monotone, and the number of rows should be greater than or equals to the number of columns. If these conditions are not met, the the result is unpredictable and can lead to an ArrayIndexOutOfBoundsException.

computeColumnMaxima is the main public method of this class. It computes the column maxima of a given matrix, i.e. the rows that contain the maximum value of each column in O(n) (linear) time, where n is the number of rows. This method does not return the maximum values itself, but just the indexes of their rows.

Note that it is necessary to create an instance of this class to execute the computeColumnMaxima because it stores temporary data is that instance. To prevent problems with concurrent access, the computeColumnMaxima method is declared synchronized.

 // create an instance of Smawk
 Smawk smawk = new Smawk();

 // create an array to store the result
 int col_maxima = new int [some_matrix.numColumns()];

 // now compute column maxima
 smawk.computeColumnMaxima (some_matrix, col_maxima)
 

Note that the array of column maxima indexes (the computation result) must be created beforehand and its size must be equal to the number of columns of the matrix.

This implementation creates arrays of row and column indexes from the original array and simulates all operations (reducing, deletion of odd columns, etc.) by manipulating these arrays. The benefit is two-fold. First the matrix is not required to implement any of these this operations but only a simple method to retrieve a value at a given position. Moreover, it tends to be faster since it uses a manipulation of these small vectors and no row or column is actually deleted. The downside is, of course, the use of extra memory (in practice, however, this is negligible).

Note that this class does not contain a computeRowMaxima method, however, the computeColumnMaxima can easily be used to compute row maxima by using a transposed matrix interface, i.e. one that inverts the indexes of the valueAt method (returning [col,row] when [row,col] is requested) and swaps the number of rows by the number of columns, and vice-versa.

Another simpler method, naiveComputeColumnMaxima, does the same job without using the SMAWK algorithm. It takes advantage of the monotone property of the matrix only (SMAWK explores the stronger constraint of total monotonicity), and therefore has a worst case time complexity of O(n * m), where n is the number of rows and m is the number of columns. However, this method tends to be faster for small matrices because it avoids recursions and row and column manipulations. There is also a naiveComputeRowMaxima method to compute row maxima with the naive approach.

Author:
Sergio A. de Carvalho Jr.
See Also:
Matrix

Field Summary
protected  int[] col
          An array of column indexes reflecting the current state of the matrix.
protected  Matrix matrix
          A pointer to the matrix that is being manipulated.
protected  int numcols
          The matrix's current number of columns.
protected  int numrows
          The matrix's current number of rows.
protected  int[] row
          An array of row indexes reflecting the current state of the matrix.
protected  int[] row_position
          This array is used to store for each row of the original matrix, its index in the current state of the matrix, i.e. its index in the row array.
 
Constructor Summary
Smawk()
           
 
Method Summary
protected  void computeColumnMaxima(int[] col_maxima)
          This method implements the SMAWK algorithm to compute the column maxima of a given matrix.
 void computeColumnMaxima(Matrix matrix, int[] col_maxima)
          Computes the column maxima of a given matrix.
protected  void deleteOddColumns()
          This method simulates a deletion of odd rows by manipulating the col array of indexes.
protected  void deleteRow(int reduced_rows, int k)
          This method simulates a deletion of a row in the matrix during the reduce operation.
static void naiveComputeColumnMaxima(Matrix matrix, int[] col_maxima)
          This is a simpler method for calculating column maxima.
static void naiveComputeRowMaxima(Matrix matrix, int[] row_maxima)
          This is a simpler method for calculating row maxima.
protected  void printMatrix()
          Prints the current state of the matrix (reflecting deleted rows and columns) in the standard output.
static void printMatrix(Matrix matrix)
          Prints the contents of an object implementing the matrix interface in the standard output.
protected  void reduce()
          This method is the key component of the SMAWK algorithm.
protected  void restoreOddColumns(int original_numcols)
          Restores the col array of indexes to the state it was before the deleteOddColumns method was called.
protected  void restoreRows(int original_numrows)
          Restores the row array of indexes to the state it was before the reduce method was called.
protected  int valueAt(int r, int c)
          This is a helper method to simplify the call to the valueAt method of the matrix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

matrix

protected Matrix matrix
A pointer to the matrix that is being manipulated.


numrows

protected int numrows
The matrix's current number of rows. This reflects any deletion of rows already performed.


row

protected int[] row
An array of row indexes reflecting the current state of the matrix. When rows are deleted, the corresponding indexes are simply moved to the end of the vector.


row_position

protected int[] row_position
This array is used to store for each row of the original matrix, its index in the current state of the matrix, i.e. its index in the row array.


numcols

protected int numcols
The matrix's current number of columns. This reflects any deletion of columns already performed.


col

protected int[] col
An array of column indexes reflecting the current state of the matrix. When columns are deleted, the corresponding indexes are simply moved to the end of the vector.

Constructor Detail

Smawk

public Smawk()
Method Detail

computeColumnMaxima

public void computeColumnMaxima(Matrix matrix,
                                int[] col_maxima)
Computes the column maxima of a given matrix. It first sets up arrays of row and column indexes to simulate a copy of the matrix (where all operations will be performed). It then calls the recursive protected computeColumnMaxima method.

The matrix is required to be an object that implements the Matrix interface. It is also expected to be totally monotone, and the number of rows should be greater than or equals to the number of columns. If it is not, the the result is unpredictable and can lead to an ArrayIndexOutOfBoundsException.

This method does not return the maximum values itself, but just the indexes of their rows. Note that the array of column maxima (the computation result) must be created beforehand and its size must be equal to the number of columns of the matrix.

To prevent problems with concurrent access, this method is declared synchronized.

Parameters:
matrix - the matrix that will have its column maxima computed
col_maxima - the array of column maxima (indexes of the rows containing maximum values of each column); this is the computation result
See Also:
computeColumnMaxima(int[])

computeColumnMaxima

protected void computeColumnMaxima(int[] col_maxima)
This method implements the SMAWK algorithm to compute the column maxima of a given matrix. It uses the arrays of row and column indexes to performs all operations on this 'fake' copy of the original matrix.

The first step is to reduce the matrix to a quadratic size (if necessary). It then delete all odd columns and recursively computes column maxima for this matrix. Finally, using the information computed for the odd columns, it searches for column maxima of the even columns. The column maxima are progressively stored in the col_maxima array (each recursive call will compute a set of column maxima).

Parameters:
col_maxima - the array of column maxima (the computation result)

valueAt

protected final int valueAt(int r,
                            int c)
This is a helper method to simplify the call to the valueAt method of the matrix. It returns the value at row r, column c.

Parameters:
r - the row number of the value being retrieved
c - the column number of the value being retrieved
Returns:
the value at row r, column c
See Also:
Matrix.valueAt(int, int)

deleteOddColumns

protected void deleteOddColumns()
This method simulates a deletion of odd rows by manipulating the col array of indexes. In fact, nothing is deleted, but the indexes are moved to the end of the array in a way that they can be easily restored by the restoreOddColumns method using a reverse approach.

See Also:
restoreOddColumns(int)

restoreOddColumns

protected void restoreOddColumns(int original_numcols)
Restores the col array of indexes to the state it was before the deleteOddColumns method was called. It only needs to know how many columns there was originally. The indexes that were moved to the end of the array are restored to their original position.

Parameters:
original_numcols - the number of columns before the odd ones were deleted
See Also:
deleteOddColumns()

reduce

protected void reduce()
This method is the key component of the SMAWK algorithm. It reduces an n x m matrix (n rows and m columns), where n >= m, to an n x n matrix by deleting m - n rows that are guaranteed to have no maximum value for any column. The result is an squared submatrix matrix that contains, for each column c, the row that has the maximum value of c in the original matrix. The rows are deleted with the deleteRowmethod.

It uses the total monotonicity property of the matrix to identify which rows can safely be deleted.

See Also:
deleteRow(int, int)

deleteRow

protected void deleteRow(int reduced_rows,
                         int k)
This method simulates a deletion of a row in the matrix during the reduce operation. It just moves the index to the end of the array in a way that it can be restored afterwards by the restoreRows method (nothing is actually deleted). Deleted indexes are kept in ascending order.

Parameters:
reduced_rows - the current number of rows in the reducing matrix
k - the index of the row to be deleted
See Also:
restoreRows(int)

restoreRows

protected void restoreRows(int original_numrows)
Restores the row array of indexes to the state it was before the reduce method was called. It only needs to know how many rows there was originally. The indexes that were moved to the end of the array are restored to their original position.

Parameters:
original_numrows - the number of rows before the reduction was performed
See Also:
deleteRow(int, int), reduce()

naiveComputeColumnMaxima

public static void naiveComputeColumnMaxima(Matrix matrix,
                                            int[] col_maxima)
This is a simpler method for calculating column maxima. It does the same job as computeColumnMaxima, but without complexity of the SMAWK algorithm.

The matrix is required to be an object that implements the Matrix interface. It is also expected to be monotone. If it is not, the result is unpredictable but, unlike computeColumnMaxima, it cannot lead to an ArrayIndexOutOfBoundsException.

This method does not return the maximum values itself, but just the indexes of their rows. Note that the array of column maxima (the computation result) must be created beforehand and its size must be equal to the number of columns of the matrix.

It takes advantage of the monotone property of the matrix only (SMAWK explores the stronger constraint of total monotonicity), and therefore has a worst case time complexity of O(n * m), where n is the number of rows and m is the number of columns. However, this method tends to be faster for small matrices because it avoids recursions and row and column manipulations.

Parameters:
matrix - the matrix that will have its column maxima computed
col_maxima - the array of column maxima (indexes of the rows containing maximum values of each column); this is the computation result
See Also:
naiveComputeRowMaxima(neobio.alignment.Matrix, int[])

naiveComputeRowMaxima

public static void naiveComputeRowMaxima(Matrix matrix,
                                         int[] row_maxima)
This is a simpler method for calculating row maxima. It does not use the SMAWK algorithm.

The matrix is required to be an object that implements the Matrix interface. It is also expected to be monotone. If it is not, the result is unpredictable but, unlike computeColumnMaxima, it cannot lead to an ArrayIndexOutOfBoundsException.

This method does not return the maximum values itself, but just the indexes of their columns. Note that the array of row maxima (the computation result) must be created beforehand and its size must be equal to the number of columns of the matrix.

It takes advantage of the monotone property of the matrix only (SMAWK explores the stronger constraint of total monotonicity), and therefore has a worst case time complexity of O(n * m), where n is the number of rows and m is the number of columns. However, this method tends to be faster for small matrices because it avoids recursions and row and column manipulations.

Parameters:
matrix - the matrix that will have its row maxima computed
row_maxima - the array of row maxima (indexes of the columns containing maximum values of each row); this is the computation result
See Also:
naiveComputeColumnMaxima(neobio.alignment.Matrix, int[])

printMatrix

protected void printMatrix()
Prints the current state of the matrix (reflecting deleted rows and columns) in the standard output. It can be used internally for debugging.


printMatrix

public static void printMatrix(Matrix matrix)
Prints the contents of an object implementing the matrix interface in the standard output. It can be used for debugging.

Parameters:
matrix - a matrix

SourceForge.net

http://neobio.sourceforge.net
NeoBio is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. NeoBio is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with NeoBio; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.