Bioinformatics Algorithms in Java
What the heck is NeoBio? NeoBio is a library of bioinformatics algorithms implemented in Java.
What algorithms? The current version consists mainly of (pairwise) sequence alignment algorithms such as the classical dynamic programming methods of Needleman & Wunsch (global alignment) and Smith & Waterman (local alignment).
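As a rough illustration of the two classical methods named above, here is a minimal sketch that computes only the optimal alignment score (no traceback) under a simple match/mismatch scheme with a linear gap cost. The class and method names are hypothetical and chosen for this example; they are not NeoBio's API, and the library's own implementation may differ.

```java
// Illustrative sketch only: names are hypothetical, not part of NeoBio.
class PairwiseScoreSketch {

    // Score for aligning two characters under a simple scoring scheme.
    static int sub(char a, char b, int match, int mismatch) {
        return a == b ? match : mismatch;
    }

    // Needleman & Wunsch: best score over all global alignments of s and t.
    static int globalScore(String s, String t, int match, int mismatch, int gap) {
        int n = s.length(), m = t.length();
        int[][] d = new int[n + 1][m + 1];
        // Aligning a prefix with the empty string costs one gap per character.
        for (int i = 1; i <= n; i++) d[i][0] = i * gap;
        for (int j = 1; j <= m; j++) d[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = d[i - 1][j - 1] + sub(s.charAt(i - 1), t.charAt(j - 1), match, mismatch);
                int up = d[i - 1][j] + gap;   // gap in t
                int left = d[i][j - 1] + gap; // gap in s
                d[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return d[n][m];
    }

    // Smith & Waterman: best score over all local alignments of s and t.
    static int localScore(String s, String t, int match, int mismatch, int gap) {
        int n = s.length(), m = t.length();
        int[][] d = new int[n + 1][m + 1]; // first row and column stay 0
        int best = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = d[i - 1][j - 1] + sub(s.charAt(i - 1), t.charAt(j - 1), match, mismatch);
                int up = d[i - 1][j] + gap;
                int left = d[i][j - 1] + gap;
                // A local alignment may start anywhere, so scores never drop below 0.
                d[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, d[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(globalScore("ACGT", "AGT", 1, -1, -1));        // prints 2
        System.out.println(localScore("TTACGTTT", "GGACGGG", 1, -1, -1)); // prints 3
    }
}
```

Both methods fill a full (n+1) x (m+1) matrix, so they take quadratic time and space. The only differences are the boundary values and the floor at zero in the local version.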
Anything else? Yes, a more efficient approach, due to Crochemore, Landau and Ziv-Ukelson, is also available. It uses Lempel-Ziv compression to speed up the computation of the dynamic programming matrix. It also relies on the SMAWK algorithm, due to Aggarwal et al., which computes all column maxima of a totally monotone matrix in linear time.
Hum... And all sequence alignment algorithms support simple scoring schemes as well as substitution matrices such as the standard BLOSUM and PAM matrices. So far, however, they support only a constant penalty per gap character (a linear gap cost function). Future versions may contain related algorithms such as multiple sequence alignment, database search and protein structure prediction.
Wow... Last but not least, NeoBio also provides a simple GUI and command-line tools to run the sequence alignment algorithms on DNA and protein sequences.
The NeoBio project is hosted at SourceForge.net and is available in two distributions. The executable JAR file can be used to easily run the library's main utility as described in the next section.
The ZIP file can be used to compile the source code as described in the compiling section. It contains:
NeoBio is free software, so feel free to download and use it as you wish.
NeoBio was developed in Java and therefore requires a compliant Java VM (virtual machine). Sun's VM can be downloaded from here.
The easiest way to experiment with NeoBio is to download the JAR file and run the GUI tool. Simply go to the directory where the JAR file was saved and type the following command:
java -jar neobio.jar
NeoBio also has a simple command-line tool that works in a similar way. To run this utility, download and extract the ZIP file to your machine. Then go to the directory where the file was extracted and execute the following commands:
cd bin
java neobio.textui.NeoBio <algorithm> <S1> <S2> [M <matrix> | S <match> <mismatch> <gap>]
where <algorithm> is one of
NW for Needleman & Wunsch (global alignment),
SW for Smith & Waterman (local alignment),
CLZG for Crochemore, Landau & Ziv-Ukelson (global alignment) or
CLZL for Crochemore, Landau & Ziv-Ukelson (local alignment);
<S1> is the first sequence file;
<S2> is the second sequence file;
M <matrix> selects a scoring matrix file;
S <match> <mismatch> <gap> selects a simple scoring scheme, where
<match> is the match reward value,
<mismatch> is the mismatch penalty value and
<gap> is the cost of a gap (linear gap cost function).
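To make the simple scoring scheme concrete, the sketch below scores an alignment that has already been written out as two gapped rows, adding <match>, <mismatch> or <gap> for each column. It assumes the values are added exactly as given, so penalties are passed as negative numbers (as in `S 1 -1 -1`), and that '-' marks a gap. The class and method names are hypothetical and not part of NeoBio.

```java
// Illustrative sketch only: names are hypothetical, not part of NeoBio.
class SimpleSchemeSketch {

    // Sums the per-column scores of two equal-length alignment rows,
    // where '-' denotes a gap. Values are added as given (assumption).
    static int score(String row1, String row2, int match, int mismatch, int gap) {
        if (row1.length() != row2.length()) {
            throw new IllegalArgumentException("alignment rows must have equal length");
        }
        int total = 0;
        for (int i = 0; i < row1.length(); i++) {
            char a = row1.charAt(i);
            char b = row2.charAt(i);
            if (a == '-' || b == '-') {
                total += gap;      // each gap character costs the same (linear gap cost)
            } else if (a == b) {
                total += match;
            } else {
                total += mismatch;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Columns: A/A match, C/- gap, G/G match, T/T match, T/A mismatch.
        System.out.println(score("ACGTT", "A-GTA", 1, -1, -1)); // prints 1
    }
}
```

Because every gap character costs the same amount, a gap of length k costs k times <gap>, which is what a linear gap cost function means here.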
When the ZIP file is extracted, several DNA and protein sequences as well as common PAM and BLOSUM substitution matrices can be found in the data directory. They can be used to compute alignments as in the following examples:
cd bin
java neobio.textui.NeoBio NW ..\data\dna01a.txt ..\data\dna02a.txt S 1 -1 -1
java neobio.textui.NeoBio CLZG ..\data\dna01a.txt ..\data\dna02a.txt M ..\data\blosum62.txt
Note: NeoBio is being developed with Sun's Java 2 SDK Standard Edition v 1.4.0 on a Windows XP machine and has not been tested on any other environment, although it should run anywhere (if it does not, ask Sun Microsystems).
Compiling NeoBio should be rather straightforward. There are no special requirements, since it uses only core Java libraries. If you have a Java 2 compliant SDK (Software Development Kit) installed on your machine, just follow these steps:
1. Download and extract the ZIP file.
2. Go to the home directory of the source code (
3. Run your compiler with the following command on a Unix-like machine:
javac neobio/alignment/*.java neobio/textui/*.java neobio/gui/*.java
or, on a Windows machine:
javac neobio\alignment\*.java neobio\textui\*.java neobio\gui\*.java
Note: NeoBio is being developed with Sun's Java 2 SDK Standard Edition v 1.4.0 and has not been tested with any other Java SDK.
The NeoBio API documentation is available online. For a general introduction to the sequence alignment problem and a description of the algorithms implemented by NeoBio, please see my MSc dissertation.
This project is hosted at SourceForge.net; more information is available here. To check the project's usage statistics (page hits and downloads), click here.
I heard from Christoph Gille that NeoBio was incorporated into STRAP, his interactive alignment software. More recently, Will Gilbert has successfully added NeoBio to his Sequence Analysis package for Mac OS X.
And now, the creator of the Oscar-winning NeoBio proudly presents... neochip!
NeoBio is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
NeoBio is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with NeoBio; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
Click here to read the GNU General Public License.
Since NeoBio is free software, there is absolutely no obligation to pay any money to use or redistribute it. However, if you are one of the many... er... sorry, few... happy users thinking "wow! this is great software!" and feeling a little generous today, please do not hesitate to click here.
This project was initiated under the supervision of Professor Maxime Crochemore at the Department of Computer Science, King's College London, UK.
Sergio Anibal de Carvalho Junior
sergioanibaljr @ users.sourceforge.net