About
Sputnik is a generic pipeline for the annotation and analysis
of biological sequence data. It was implemented at MIPS (Institute
for Bioinformatics, GSF Research Centre, Neuherberg, Germany)
as part of the BMBF-funded GABI project fostering plant genomics
research in Germany. Large numbers of sequences (100,000s)
could be annotated and analysed in a heterogeneous and distributed
computational infrastructure. The core infrastructure was
implemented in Python and used a PostgreSQL database for the
storage and archival of primary data, annotations and
indexes. Sputnik adopted a client-server architecture and
TCP/IP sockets were used to transfer primary data and computed
results between a single master server and compute nodes along
with specifications on how to perform the analysis and any
transformations that should be applied to the primary data.
Sputnik could be run on Linux, Tru64 and SGI IRIX. Analysis
methods were specified in XML. Sputnik was used to compute
comparative annotations for all large plant EST collections,
a few species-specific proteomes,
and EST collections from a few butterfly and moth species,
mouse and hydra.
Through various international collaborations
the potential of Sputnik became clear - the original
code base was, however, "biologist hacked" and owned
by the GSF Research Centre. As a result Sputnik is not suitable
for wide scale deployment.
The Sputnik pipeline was published in the database issue
of Nucleic Acids Research in January 2003.
Sputnik manuscript - NAR 2003
The current Sputnik sequence collections can be viewed at
the MIPS plant group
pages. Please note that these sequence collections may
be a little outdated and deprecated.
In January 2004 openSputnik was born as an open
source project hosted at SourceForge with pages and methods
available from my servers at Zettai.
This reimplementation was precipitated by two main factors:
a relocation to the Centre
for Biotechnology in Turku and the desire to make the
computational infrastructure available to the widest possible
audience.
openSputnik (not a very good name - but who really
cares?) is the name for the core analytical engine. This has
been implemented in Java and can communicate with MySQL, PostgreSQL
and ODBC databases using JDBC. Communication between compute
nodes and a central server is achieved using threaded Java
sockets and things should be a little more robust than before.
This Java-fication of Sputnik means that true platform independence
may have been achieved - I have a production server running
on Linux with PostgreSQL and development on Windows XP with
MySQL (shudder). The only caveat to the platform choice is
the availability of binaries for the required
methods on the particular operating system. I think that
the immediate future looks a little more promising even
if the code has been written by a biologist!
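To give a feel for what "threaded Java sockets" between a central server and compute nodes can look like, here is a minimal sketch. It is not taken from the openSputnik code base: the class name MasterServerSketch, its methods and the one-line "RESULT ..." / "ACK ..." protocol are all made up for illustration. It assumes each compute node opens a connection, sends one line and waits for one reply, and that the server handles every connection in its own thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only: names and protocol are assumptions, not openSputnik's. */
class MasterServerSketch {

    /** Builds the reply for one request line of the made-up protocol. */
    static String handleLine(String line) {
        if (line != null && line.startsWith("RESULT ")) {
            return "ACK " + line.substring("RESULT ".length());
        }
        return "ERROR unknown request";
    }

    /** Serves one compute node: read one line, send one reply, close the socket. */
    static void serve(Socket node) {
        try (Socket s = node;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(handleLine(in.readLine()));
        } catch (IOException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }

    /** Accept loop: starts one thread per connected compute node. */
    static void acceptLoop(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                Socket node = server.accept();
                new Thread(() -> serve(node)).start();
            } catch (IOException e) {
                // accept() throws once the server socket has been closed.
                if (server.isClosed()) {
                    break;
                }
                System.err.println("accept failed: " + e.getMessage());
            }
        }
    }

    /** Compute-node side: send one line to the master and return its reply. */
    static String sendLine(String host, int port, String line) throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread acceptor = new Thread(() -> acceptLoop(server));
            acceptor.setDaemon(true);
            acceptor.start();
            // One pretend compute node reporting a result for a sequence.
            System.out.println(sendLine("127.0.0.1", server.getLocalPort(), "RESULT seq42"));
        }
    }
}
```

Running main starts the server on a free local port, sends a single line from a pretend compute node and prints the reply, "ACK seq42". One thread per connection is the simplest arrangement to read. A real deployment with many nodes would more likely use a bounded thread pool.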
As the project slowly develops I will be adding more information,
pictures and schemas to these pages - I am keen that you should
know everything about openSputnik as and when I develop it
or implement it.
Much of openSputnik is hosted at SourceForge

Stephen Rudd works at the
Centre for Biotechnology in Turku, Finland