Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison
Abstract
The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis for inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones both in their correlation with DDH and in their error ratios when emulating it. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
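As a rough illustration of how an intergenomic distance can be derived from high-scoring segment pairs (HSPs), the following minimal sketch computes a coverage-based distance between two genomes. It is not taken from the article: the formula (one minus the matched fraction of both genomes), the function names `covered_length` and `coverage_distance`, and the half-open interval representation of HSPs are assumptions made for illustration only. Overlapping HSPs are merged before counting, so that repetitive regions hit several times are not counted more than once.

```python
from typing import List, Tuple

# Hypothetical representation: one HSP projected onto a genome
# as a half-open interval [start, end) in base pairs.
Interval = Tuple[int, int]


def covered_length(intervals: List[Interval]) -> int:
    """Number of positions covered by at least one interval.

    Overlapping intervals are merged, so a region matched by
    several HSPs contributes only once.
    """
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from everything seen so far.
            total += end - start
            current_end = end
        elif end > current_end:
            # Partial overlap: count only the new part.
            total += end - current_end
            current_end = end
    return total


def coverage_distance(
    hits_a: List[Interval],
    hits_b: List[Interval],
    len_a: int,
    len_b: int,
) -> float:
    """Illustrative distance: 1 - (matched length / total length).

    hits_a and hits_b are the HSP intervals on genome A and genome B;
    len_a and len_b are the genome lengths. This formula is an
    assumption for this sketch, not the article's definition.
    """
    if len_a + len_b <= 0:
        raise ValueError("genome lengths must sum to a positive value")
    matched = covered_length(hits_a) + covered_length(hits_b)
    return 1.0 - matched / (len_a + len_b)


if __name__ == "__main__":
    # Two overlapping HSPs on genome A, one HSP on genome B.
    d = coverage_distance([(0, 100), (50, 150)], [(0, 150)], 1000, 500)
    print(round(d, 3))
```

A distance of 0 corresponds to two genomes fully covered by matches, and 1 to genomes with no matches at all. Normalizing by the sum of both genome lengths is only one possible choice; other normalizations may behave differently for heavily reduced genomes.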
DOI: 10.4056/sigs.531120
This work is licensed under a Creative Commons Attribution 3.0 License.
Acknowledgements
We would like to gratefully acknowledge the support of many members of the Genomic Standards Consortium, the broader genomic science community, and those who have indicated their willingness to serve as editors, reviewers and contributors.
Funding for SIGS is provided by a grant from the Office of the Vice President for Research and Graduate Studies at Michigan State University, the Michigan State University Foundation, and the US Department of Energy Biological and Environmental Research DE-FG02-08ER64707.