PacBio’s long-read sequencing technologies can be used to assemble complete bacterial genomes with recently developed non-hybrid assemblers, even in the absence of high-quality second-generation short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM genome sequencing results from previous studies, we generated further sequencing data, including data from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler produced nearly complete assemblies at a moderate coverage of ~75x, but that different versions of HGAP produced incompatible results that required post-processing. Data from the other two platforms further improved the PacBio assembly through scaffolding and a final error correction.
LASTZ alignment between two versions of HGAP assemblies. The upper panel shows a dot plot, and the lower panel shows alignment blocks. The major contig from the old version of HGAP is shown on the horizontal axis. The plots were generated using Geneious Pro R8.
MUMmer whole-genome alignments of two versions of sp. HS311 HGAP assemblies (left, old version; right, new version) with the complete genome sequence of CR1 (upper panel), and cumulative GC skew plots calculated as (G-C)/(G+C) with a window size of 5 kb (lower panel).
Ion Torrent PGM mate-pair reads were mapped onto HGAP contigs and visualized using Consed software; the results indicate that the four contigs are arranged in a single scaffold. The light green plot shows the read depth. Multiple copies of ribosomal RNA genes, marked by the thick arrows at the bottom, caused mate reads to align at a longer span (○). RNA genes at the ends of adjacent contigs, represented by filled-in arrows of the same color, were used to join them, resulting in two contigs.
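The cumulative GC skew mentioned in the caption, (G-C)/(G+C) summed over consecutive windows, can be sketched as follows. This is a minimal illustration only, not the tool used to draw the figure: the function names `gc_skew_windows` and `cumulative_gc_skew` are hypothetical, the windows are assumed to be non-overlapping, and a window containing no G or C is assumed to contribute a skew of 0.

```python
from itertools import accumulate


def gc_skew_windows(seq, window=5000):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Assumption: a window with no G or C contributes a skew of 0.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_gc_skew(seq, window=5000):
    """Return the running sum of the per-window GC skew values."""
    return list(accumulate(gc_skew_windows(seq, window)))


if __name__ == "__main__":
    # Toy sequence with a 4-base window, just to show the output shape.
    toy = "GGGCCCCGATAT"
    print(gc_skew_windows(toy, window=4))
    print(cumulative_gc_skew(toy, window=4))
```

In many circular bacterial chromosomes the cumulative curve has a single minimum and a single maximum, near the replication origin and terminus respectively, so additional abrupt turns in the curve can hint at a misjoined or inverted region in an assembly.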
This work was supported by the KRIBB Research Initiative Program, Ministry of Science, ICT, and Future Planning, and by the Next-Generation BioGreen 21 Program (SSAC Grant No. PJ009524) funded by the RDA (to C.M.R.), Republic of Korea.