Evidence-based annotation of transcripts and proteins in the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough (2011)

M. N. Price, A. M. Deutschbauer, J. V. Kuehl, H. Liu, H. E. Witkowska, A. P. Arkin
J. Bacteriology 193:5716-27

Abstract

We used high-resolution tiling microarrays and 5' RNA sequencing to identify transcripts in Desulfovibrio vulgaris Hildenborough, a model sulfate-reducing bacterium. We identified the first nucleotide position for 1,124 transcripts, including 54 proteins with leaderless transcripts and another 72 genes for which a major transcript initiates within the upstream protein-coding gene, which confounds measurements of the upstream gene's expression. Sequence analysis of these promoters showed that D. vulgaris sigma70 prefers a different -10 box and -35 box than Escherichia coli sigma70 does. A total of 549 transcripts ended at intrinsic (rho-independent) terminators, but most other transcripts seemed to have variable ends. We found low-level antisense expression of most genes, and the 5' ends of these transcripts mapped to promoter-like sequences. Because antisense expression was reduced for highly expressed genes, we suspect that elongation of non-specific antisense transcripts is suppressed by transcription of the sense strand. Finally, we combined the transcripts with comparative analysis and proteomics data to make 505 revisions to the original annotation of 3,531 proteins: we removed 255 (7.5%) proteins, changed 123 (3.6%) start codons, and added 127 (3.7%) proteins that were missed. Tiling data had higher coverage than shotgun proteomics and hence led to most of the corrections, but many errors probably remain. Our data are available at http://genomics.lbl.gov/supplemental/DvHtranscripts2011/.

Freely available at JB (or see older version)

Viewing the data in Artemis

This zip file (46 megabytes, updated) includes:

Artemis is available here

Loading all the plots requires a computer with several gigabytes of memory, and you will probably need to change your Java or Artemis settings so that Artemis can use that much memory. We recommend smoothing the tiling data with a window size of 10-15 and log-transforming the 5' RNASeq data. By default, Artemis may smooth the 5' RNASeq data, which is not desirable; you can turn this off by lowering the window size to 1.
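For readers who want to apply the same two transformations outside Artemis, the following is a minimal Python sketch, not a description of how Artemis itself computes its plots. It assumes the tiling intensities and 5' RNASeq counts are already loaded as plain lists of per-position numbers. The function names `smooth` and `log_counts`, the centered moving average, and the pseudocount of 1 are all illustrative assumptions.

```python
import math


def smooth(values, window=11):
    """Centered moving average over a list of per-position values.

    Assumption: a simple unweighted mean; Artemis may smooth differently.
    The window extends window // 2 positions to each side, so an even
    window size is effectively widened by one. Windows shrink at the ends.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def log_counts(counts, pseudocount=1.0):
    """log2-transform read counts.

    Assumption: a pseudocount of 1 is added so that positions with zero
    reads stay defined; the choice of pseudocount is not from the source.
    """
    return [math.log2(c + pseudocount) for c in counts]
```

With `window=1` the `smooth` function returns the values unchanged, which mirrors the advice to lower the window size to 1 for the 5' RNASeq data, where individual start positions matter.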

Tables to download

All files are tab-delimited. *.gz files are compressed with gzip.
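As an illustration of the file format, here is a minimal Python sketch that reads one of these tables whether or not it is gzip-compressed. The helper name `read_table` is hypothetical, and the sketch assumes that the first line of each file is a header row, which this page does not state.

```python
import csv
import gzip


def read_table(path):
    """Read a tab-delimited table into a list of dicts, one per row.

    Files ending in .gz are decompressed on the fly.
    Assumption: the first line holds the column names.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # QUOTE_NONE keeps any quote characters in fields as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

All values come back as strings, so numeric columns such as coordinates need an explicit conversion (for example `int(row["begin"])`, where `begin` is a made-up column name).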

Analyzing the data in R

Sources

This work conducted by ENIGMA -- Ecosystems and Networks Integrated with Genes and Molecular Assemblies -- was supported by the Office of Science, Office of Biological and Environmental Research, of the U. S. Department of Energy under Contract No. DE-AC02-05CH11231.