openSputnik - a comparative genomics platform

About

Sputnik is a generic pipeline for the annotation and analysis of biological sequence data. It was implemented at MIPS (Institute for Bioinformatics, GSF Research Centre, Neuherberg, Germany) as part of the BMBF-funded GABI project fostering plant genomics research in Germany. Large numbers of sequences (100,000s) could be annotated and analysed in a heterogeneous and distributed computational infrastructure. The core infrastructure was implemented in Python and used a PostgreSQL database for the storage and archival of primary data, annotations and indexes. Sputnik adopted a client-server architecture: TCP/IP sockets were used to transfer primary data and computed results between a single master server and the compute nodes, along with specifications on how to perform the analysis and any transformations that should be applied to the primary data. Sputnik could be run on Linux, Tru64 and SGI IRIX. Analysis methods were specified in XML. Sputnik was used to compute comparative annotations for all of the large plant EST collections, a few species-specific proteomes, and EST collections from a few butterfly and moth species, mouse and hydra.

In the course of various international collaborations the potential of Sputnik became clear - the original code base was, however, "biologist hacked" and owned by the GSF Research Centre. As a result the original Sputnik is not suitable for wide-scale deployment.

The Sputnik pipeline was published in the database issue of Nucleic Acids Research in January 2003.

Sputnik manuscript - NAR 2003

The current Sputnik sequence collections can be viewed at the MIPS plant group pages. Please note that these sequence collections may be a little outdated and deprecated.

In January 2004 openSputnik was born as an open source project hosted at SourceForge, with pages and methods available from my servers at Zettai. This reimplementation was precipitated by two main factors: a relocation to the Centre for Biotechnology in Turku, and the desire to make the computational infrastructure available to the widest possible audience.

openSputnik (not a very good name - but who really cares?) is the name for the core analytical engine. This has been implemented in Java and can communicate with MySQL, PostgreSQL and ODBC databases using JDBC. Communication between compute nodes and a central server is achieved using threaded Java sockets, and things should be a little more robust than before. This Java-fication of Sputnik means that true platform independence may have been achieved - I have a production server running on Linux with PostgreSQL and development on Windows XP with MySQL (shudder). The only caveat to the choice of platform is the availability of binaries for the required methods on the particular operating system. I think that the immediate future looks a little more promising, even if the code has been written by a biologist!
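
To give a rough idea of what "threaded Java sockets" between a central server and compute nodes can look like, here is a minimal sketch. It is not taken from the openSputnik code base: the class name MasterWorkerSketch, the method names, the one-line-per-job protocol and the toy "analysis" (reporting the sequence length) are all assumptions made purely for illustration. The master listens on a free local port, hands each connecting node one sequence on its own handler thread, and reads back one result line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: names and wire protocol are hypothetical, not openSputnik's.
class MasterWorkerSketch {

    // Hypothetical stand-in for a real analysis method: report the sequence length.
    static String analyse(String sequence) {
        return "length=" + sequence.length();
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // Second argument enables auto-flush on println.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Compute node: connect to the master, read one job line, send back one result line.
    static void runNode(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = reader(socket);
             PrintWriter out = writer(socket)) {
            out.println(analyse(in.readLine()));
        }
    }

    // Master side: send one sequence to a connected node and wait for its result.
    static String dispatch(Socket node, String sequence) throws IOException {
        try (Socket socket = node;
             BufferedReader in = reader(socket);
             PrintWriter out = writer(socket)) {
            out.println(sequence);
            return in.readLine();
        }
    }

    // Runs a master plus one local node per job; each connection gets its own handler thread.
    static List<String> runLocally(List<String> jobs) throws IOException, InterruptedException {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        try (ServerSocket server = new ServerSocket(0)) { // port 0: let the OS pick a free port
            int port = server.getLocalPort();
            List<Thread> threads = new ArrayList<>();
            for (String job : jobs) {
                Thread nodeThread = new Thread(() -> {
                    try {
                        runNode("127.0.0.1", port);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                nodeThread.start();
                threads.add(nodeThread);

                Socket node = server.accept(); // blocks until a node connects
                Thread handler = new Thread(() -> {
                    try {
                        results.add(job + " -> " + dispatch(node, job));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                handler.start();
                threads.add(handler);
            }
            for (Thread t : threads) {
                t.join();
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        for (String line : runLocally(Arrays.asList("ACGT", "TTGACA"))) {
            System.out.println(line);
        }
    }
}
```

Because every node connection is served on its own thread, results may come back in any order. A real deployment would also need error handling, timeouts and a richer job description than a single line of text.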

As the project slowly develops I will be adding more information, pictures and schemas to these pages - I am keen that you should know everything about openSputnik as and when I develop or implement it.

Much of openSputnik is hosted at SourceForge

Stephen Rudd works at the Centre for Biotechnology in Turku, Finland