EST analysis

Sputnik was originally written to automate the annotation and analysis of ESTs for comparative genomics. openSputnik retains the concept behind Sputnik and is being developed further as a platform for processing large EST collections.

The core concept behind successful EST annotation is an object-relational infrastructure in which, for example, a unigene cluster inherits the attributes of its underlying EST sequences. Such annotation includes, for instance, the mouse strain from which the ESTs were derived and the developmental stage of the tissue from which the clones were isolated. The derivation and annotation of a peptide sequence represents the other extreme. A single EST that we have previously shown to stem from a candidate gene can be associated with annotation from peptide domains that the EST itself does not cover; this annotation comes instead from ESTs that assemble with it, either directly or indirectly.
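The inheritance idea can be illustrated with a minimal Python sketch. It is not openSputnik's actual schema: the class names `EST` and `UnigeneCluster`, the attribute names `strain` and `stage`, and the method `inherited` are all hypothetical. A cluster holds its member ESTs and derives its library annotation by tallying an attribute across those members.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EST:
    """A single EST with library annotation (hypothetical fields)."""
    accession: str
    strain: str  # e.g. mouse strain of the source library
    stage: str   # developmental stage of the source tissue


@dataclass
class UnigeneCluster:
    """A cluster that inherits annotation from its member ESTs."""
    cluster_id: str
    members: list[EST] = field(default_factory=list)

    def inherited(self, attribute: str) -> Counter:
        """Tally how often each value of `attribute` occurs among the members."""
        return Counter(getattr(est, attribute) for est in self.members)


cluster = UnigeneCluster(
    "hypothetical_cluster_1",
    [
        EST("est1", "C57BL/6", "adult"),
        EST("est2", "C57BL/6", "embryo"),
        EST("est3", "BALB/c", "adult"),
    ],
)
print(cluster.inherited("strain"))  # Counter({'C57BL/6': 2, 'BALB/c': 1})
```

Tallying, rather than storing a single value, keeps the information that a cluster can be supported by ESTs from several strains or stages.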

The focus of my research group is firmly embedded in comparative genomics. We have implemented a wide variety of methods in openSputnik for selecting lineage-specific transcripts, transcript families, lineage-specific domains and domain architectures. We have clustered and assembled every plant EST collection containing more than 5,000 ESTs and placed each one within the openSputnik comparative framework. The next, harder step is to make sense of these data and present them in a meaningful way.
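Lineage-specific selection can be pictured with a simple presence/absence criterion. The following Python sketch assumes that criterion; it is not necessarily how openSputnik does it. Each transcript is mapped to the set of species in which homologs were detected, for example by a similarity search above some threshold, which is also an assumption. A transcript counts as lineage-specific when every one of those species belongs to the chosen lineage. The function name `lineage_specific` and the example data are hypothetical.

```python
def lineage_specific(hits: dict[str, set[str]], lineage: set[str]) -> list[str]:
    """Return IDs of transcripts whose homologs occur only within `lineage`.

    `hits` maps a transcript ID to the species where homologs were found
    (hypothetical input; the detection method is assumed, not specified).
    Transcripts with no recorded species are skipped.
    """
    return sorted(
        transcript
        for transcript, species in hits.items()
        if species and species <= lineage  # subset test: no hits outside the lineage
    )


grasses = {"Oryza sativa", "Zea mays", "Hordeum vulgare"}
hits = {
    "t1": {"Oryza sativa", "Zea mays"},
    "t2": {"Oryza sativa", "Arabidopsis thaliana"},
    "t3": {"Hordeum vulgare"},
}
print(lineage_specific(hits, grasses))  # ['t1', 't3']
```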

In addition to plant EST analysis, we have ongoing collaborations with research groups working on molecular markers in pig, mouse, chickpea and barley. We are also examining EST collections from some of the more exotic genomes, including Hydra magnipapillata, Bombyx mori, Cycas rumphii and Ginkgo biloba.

In the context of ESTs and molecular markers, Sputnik and openSputnik have been mentioned in the following publications:

Brenner, E. D., Stevenson, D. W., McCombie, R. W., Katari, M. S., Rudd, S. A., Mayer, K. F., Palenchar, P. M., Runko, S. J., Twigg, R. W., Dai, G., et al. (2003). Expressed sequence tag analysis in Cycas, the most primitive living seed plant. Genome Biol 4, R78.
Kota, R., Rudd, S., Facius, A., Kolesov, G., Thiel, T., Zhang, H., Stein, N., Mayer, K., and Graner, A. (2003). Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.). Mol Genet Genomics.
Rudd, S. (2003). Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci 8, 321-329.
Rudd, S., Mewes, H. W., and Mayer, K. F. (2003). Sputnik: a database platform for comparative plant genomics. Nucleic Acids Res 31, 128-132.

Much of openSputnik is hosted at SourceForge.

Stephen Rudd works at the Centre for Biotechnology in Turku, Finland.