The development of rapid and efficient genome sequencing methods has enabled us to study the evolutionary background of bacterial genetic information. Here, we present a comparative genomic analysis, using the pan-genome approach, of 17 Streptomyces species whose genomes have been completely sequenced. The analysis revealed that 34,592 ortholog clusters constituted the pan-genome of these Streptomyces species, including 2,018 in the core genome, 11,743 in the dispensable genome, and 20,831 in the unique genome. The core genome converged to a smaller number of genes than the 3,096 gene families reported previously. Functional enrichment analysis showed that genes involved in transcription were the most abundant in the Streptomyces pan-genome. Finally, we investigated core genes for the sigma factors, the mycothiol biosynthesis pathway, and secondary metabolism pathways; our data showed that many genes involved in stress response and morphological differentiation are conserved across Streptomyces species. Elucidation of the core genome offers a basis for understanding the functional evolution of Streptomyces species and provides insights into target selection for the construction of industrial strains.
Introduction
Streptomycetes are prolific producers of a wide range of secondary metabolites, including more than two-thirds of the natural antibiotics used in the pharmaceutical industry [1, 19]. Streptomyces is the largest genus of actinobacteria; its members are ubiquitous in soil and undergo complex differentiation from filamentous mycelia to aerial hyphae and spores [6, 9]. To elucidate, at genome scale, the genetic basis of secondary metabolite production and the rich repertoire of novel enzymes in this genus, extensive sequence analyses have been carried out for model Streptomyces strains such as Streptomyces coelicolor A3(2) [1], S. griseus [24], and S. avermitilis [14]. In addition to a high G+C content and a linear chromosome, the Streptomyces genome encodes a large number of sigma factors and transcription factors involved in a complex transcriptional regulatory network [2]. Many genes are involved in morphological differentiation, and each strain carries tens of gene clusters for the biosynthesis of secondary metabolites [22].
To date, over 30,000 bacterial genome sequences have been deposited in the NCBI genome database (http://www.ncbi.nlm.nih.gov/genome/browse). With this abundance of information, comparative genomic analyses of multiple genomes within individual species have revealed extensive inter- and intraspecies genomic diversity [3]. Among currently available comparative methods, pan-genome analysis describes the entire gene repertoire of a bacterial species by identifying the sum of its core and dispensable genomes [21]. A pan-genome is defined as the full set of non-orthologous genes present in a species; it comprises the core genome, the set of genes present in all strains, and the dispensable genome, the set of genes absent from one or more strains, including those unique to single strains. Pan-genome analysis also shows how many new genes can be expected from each newly sequenced genome. Several comparative genomic studies have cataloged the genomic components and evolutionary history of Streptomyces species [13, 15, 32]. However, even the most recent of these studies analyzed only five model Streptomyces species [32], so an analysis incorporating current sequence information is needed.
In this study, taking advantage of the rapidly increasing number of sequenced genomes, we performed a comprehensive analysis of all 17 Streptomyces species whose genomes had been completely sequenced to date in order to understand their genomic components. We estimated the pan-genome of Streptomyces and identified the core genome conserved in all of the analyzed strains. In addition, ortholog clusters within the pan-genome were classified according to their functions, and genes reflecting distinctive characteristics of Streptomyces were identified. This analysis provides up-to-date information on the genomic diversity and core conservation of Streptomyces genomes, facilitating a comprehensive understanding of this genus.
Materials and Methods
- Nucleotide Sequence Accession Numbers
All 17 complete Streptomyces genome sequences used in our analysis were retrieved from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria). The accession numbers are NC_021055 (Streptomyces sp. PAMC26508), NC_015953 (Streptomyces sp. SirexAA-E), NC_020990 (S. albus J1074), NC_003155 (S. avermitilis MA-4680), NC_016582 (S. bingchenggensis BCW-1), NC_016111 (S. cattleya NRRL 8057), NC_003888 (S. coelicolor A3(2)), NC_021985 (S. collinus Tü 365), NC_020504 (S. davawensis JCM 4913), NC_016114 (S. flavogriseus ATCC 33331), NC_021177 (S. fulvissimus DSM 40593), NC_010572 (S. griseus NBRC 13350), NC_017765 (S. hygroscopicus subsp. jinggangensis 5008), NC_022785 (S. rapamycinicus NRRL 5491), NC_013929 (S. scabiei 87.22), NC_018750 (S. venezuelae ATCC 10712), and NC_015957 (S. violaceusniger Tü 4113).
- Pan-Genome Calculation
Pan-genome computation for the Streptomyces species was performed with PGAP ver. 1.12 [31]. Ortholog clusters were built from the open reading frame (ORF) contents of each genome using the GF (Gene Family) method with default parameters (E-value: 1e-10; score: 40; identity: 50%; coverage: 50%). The pan-genome and core genome profiles were then constructed. Functional enrichment analysis of the ortholog clusters was performed with PGAP based on the clusters of orthologous groups (COG) classification, and subsequent classification was performed using an in-house script.
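To illustrate what the GF filtering thresholds do, the following is a minimal sketch, not PGAP's code. It assumes all-against-all BLASTP hits have already been computed and parsed into a hypothetical `Hit` record, treats identity and coverage as percentages, and measures coverage against the longer protein (an assumption). To our understanding, the GF method clusters the filtered hits with MCL; this sketch substitutes simple single-linkage grouping (connected components via union-find) for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise protein alignment, e.g. a row of BLASTP tabular output (hypothetical record)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    identity: float  # percent identity over the aligned region
    aln_len: int     # alignment length in residues
    qlen: int        # query protein length
    slen: int        # subject protein length


def passes_gf_filters(h, max_evalue=1e-10, min_score=40.0,
                      min_identity=50.0, min_coverage=50.0):
    """Apply the E-value, score, identity and coverage cut-offs listed in the Methods.
    Coverage is measured against the longer protein (an assumption of this sketch)."""
    coverage = 100.0 * h.aln_len / max(h.qlen, h.slen)
    return (h.evalue <= max_evalue and h.bitscore >= min_score
            and h.identity >= min_identity and coverage >= min_coverage)


def cluster_genes(genes, hits):
    """Group genes into families as connected components of the filtered hit graph.
    Single-linkage stand-in for the MCL clustering step, not PGAP's own code."""
    parent = {g: g for g in genes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for h in hits:
        if h.query != h.subject and passes_gf_filters(h):
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two families

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


# Usage with hypothetical genes from three genomes (A, B, C):
genes = ["A_1", "A_2", "B_1", "C_1"]
hits = [
    Hit("A_1", "B_1", 1e-50, 300.0, 80.0, 400, 410, 420),  # passes every filter
    Hit("B_1", "C_1", 1e-60, 350.0, 75.0, 100, 410, 400),  # coverage ~24%: rejected
    Hit("A_2", "C_1", 1e-5, 45.0, 60.0, 300, 310, 305),    # E-value too weak: rejected
]
print(cluster_genes(genes, hits))  # [['A_1', 'B_1'], ['A_2'], ['C_1']]
```

Single linkage can chain distantly related proteins into one family through intermediates, which is one reason graph-clustering methods such as MCL are usually preferred for real data.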
Results and Discussion
- The Pan-Genome of 17 Streptomyces Species
Seventeen completely sequenced Streptomyces genomes available from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria) were used in this study. The genomic characteristics of each species are summarized in Supplementary Table S1. All strains are reported to have linear chromosomes. Genome sizes range from 6.3 to 12.7 Mb, with G+C contents from 70.6% to 73.3%. The number of predicted coding sequences (CDSs; 5,832–10,022) was positively correlated with genome size.
Pan-genome analysis of the 17 Streptomyces chromosomes revealed 34,592 ortholog clusters, constituting the pan-genome, from a total of 1,129,413 genes. The size of the Streptomyces pan-genome increased with each added genome, so it can be considered an open pan-genome (Fig. 1A) [21]. This trend suggests that Streptomyces has a flexible gene content, reflecting the pronounced diversity of secondary metabolism and morphological differentiation in this genus. The core genome consisted of 2,018 ortholog clusters (Fig. 1B and Table S2), fewer than the 3,096 gene families described in a previous report based on five Streptomyces strains [32]. The proportion of core genes in each species ranged from 24% to 38% and was negatively correlated with the number of ORFs. Although the core genome may shrink further as more genomes are added, its size is expected to converge to a constant value, as judged from the fitted exponential decay curve. The number of dispensable gene families, present in at least two species but not in all, was 11,743, and the number of ortholog clusters of unique genes, present in only one strain, was 20,831. We refer to these two groups together as the accessory genome; it is thought to contribute to species diversity and generally provides functions that are not essential for viability. However, these genes may confer a selective advantage on Streptomycetes in their specific environmental niches.
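The partition into core, dispensable, and unique families, and the accumulation of pan- and core-genome profiles such as those in Fig. 1A, can be sketched from a presence/absence table. This is a minimal illustration under stated assumptions: each family is represented by the set of (hypothetical) genome names in which it has at least one member, and profiles are averaged over random genome orders, a common approach that is not necessarily PGAP's exact procedure. By construction the three classes are disjoint and sum to the pan-genome, as the counts above do (2,018 + 11,743 + 20,831 = 34,592).

```python
import random


def partition(families, n_genomes):
    """Split gene families by the number of genomes they occur in:
    core = all genomes, unique = exactly one genome, dispensable = everything in between."""
    core, dispensable, unique = [], [], []
    for fam_id, genomes in families.items():
        k = len(genomes)
        if k == n_genomes:
            core.append(fam_id)
        elif k == 1:
            unique.append(fam_id)
        else:
            dispensable.append(fam_id)
    return core, dispensable, unique


def accumulation_curves(families, genome_names, n_perm=100, seed=0):
    """Mean pan- and core-genome sizes after adding 1..N genomes,
    averaged over n_perm random genome orders."""
    rng = random.Random(seed)
    n = len(genome_names)
    pan_sum, core_sum = [0] * n, [0] * n
    fam_sets = list(families.values())
    for _ in range(n_perm):
        order = list(genome_names)
        rng.shuffle(order)
        for i in range(n):
            added = set(order[: i + 1])  # genomes included so far
            pan_sum[i] += sum(1 for fam in fam_sets if fam & added)     # seen in any
            core_sum[i] += sum(1 for fam in fam_sets if added <= fam)  # seen in all
    return [p / n_perm for p in pan_sum], [c / n_perm for c in core_sum]


# Hypothetical presence/absence data: family ID -> genomes containing it
fams = {
    "F1": {"g1", "g2", "g3"},  # present everywhere
    "F2": {"g1", "g2"},        # present in some genomes
    "F3": {"g3"},              # present in one genome
}
print(partition(fams, 3))  # (['F1'], ['F2'], ['F3'])
pan, core = accumulation_curves(fams, ["g1", "g2", "g3"])
print(pan[-1], core[-1])   # 3.0 1.0
```

More permutations give smoother mean curves at linear extra cost; the final point of each curve does not depend on the order, since it always includes every genome.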
Fig. 1. Pan-genome analysis of Streptomyces. (A) Pan-genome and core genome profiles. The numbers of gene families in the Streptomyces pan-genome and core genome are plotted against the number of genomes added; the deduced mathematical functions are also shown. (B) Venn diagram showing the number of species-specific gene families in the genome of each species. The number of core gene families is shown in the center.
- Functional Distribution of Ortholog Clusters
Next, we examined the functional classification of the ortholog clusters using the COG database (Table S3). Excluding poorly characterized or uncharacterized categories, the most abundant COG category in the pan-genome was transcription (K), with 1,945 gene families, followed by carbohydrate transport and metabolism (G; 1,242) and amino acid transport and metabolism (E; 1,046). In the core genome, transcription was still the largest category (211 gene families), followed by amino acid (192) and carbohydrate (130) transport and metabolism. The abundance of transcriptional regulators, including sigma factors, is a hallmark of Streptomycetes, consistent with their complex transcriptional regulatory networks that support morphological and physiological differentiation [2].
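Ranking COG categories by the number of gene families is a simple tally once each family has a category assignment. In this minimal sketch, excluding the poorly characterized classes R (general function prediction only) and S (function unknown) and crediting a family with a multi-letter assignment to each of its letters are assumptions; the family IDs and assignments are hypothetical.

```python
from collections import Counter

POORLY_CHARACTERIZED = {"R", "S"}  # general function prediction only; function unknown


def rank_cog_categories(family_cogs, family_ids=None, top=3):
    """Rank COG functional categories by the number of gene families assigned to them.

    family_cogs: family ID -> COG category letters, e.g. "K" or "EG"
    family_ids:  optional subset to tally (e.g. only core families)
    A family with several letters is counted once under each letter."""
    ids = family_cogs.keys() if family_ids is None else family_ids
    counts = Counter()
    for fam in ids:
        for letter in sorted(set(family_cogs.get(fam, ""))):
            if letter not in POORLY_CHARACTERIZED:
                counts[letter] += 1
    return counts.most_common(top)


# Hypothetical assignments
cogs = {"F1": "K", "F2": "K", "F3": "G", "F4": "EG", "F5": "S"}
print(rank_cog_categories(cogs))                            # [('K', 2), ('G', 2), ('E', 1)]
print(rank_cog_categories(cogs, family_ids=["F1", "F4"]))   # [('K', 1), ('E', 1), ('G', 1)]
```

Passing the core family IDs as `family_ids` gives the core-genome ranking with the same function.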
We then determined, for each COG category, the proportions of gene families belonging to the core, dispensable, and unique genomes (Fig. 2). The proportion of core gene families was high for the COG categories of translation (J) and nucleotide transport and metabolism (F), suggesting that protein and nucleic acid synthesis are conserved core functions and that gene diversity within these categories is relatively low. In contrast, core gene families were less frequent in the categories of secondary metabolism (Q), defense mechanisms (V), carbohydrate transport and metabolism (G), and transcription (K). Although the absolute number of gene families in these categories is large, most of them reside in the accessory genome, suggesting that they provide functions that increase the diversity and uniqueness of Streptomycetes.
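The per-category proportions plotted in Fig. 2 can be computed by cross-tabulating each family's COG category with its conservation class and sorting by the core fraction. A minimal sketch, assuming a single COG letter per family and a descending sort; the family IDs and classes are hypothetical.

```python
from collections import defaultdict

CLASSES = ("core", "dispensable", "unique")


def category_proportions(family_cog, family_class):
    """Fractions of core, dispensable and unique families within each COG category,
    sorted by decreasing core fraction.

    family_cog:   family ID -> single COG letter (one letter per family assumed)
    family_class: family ID -> "core", "dispensable" or "unique"
    """
    table = defaultdict(lambda: dict.fromkeys(CLASSES, 0))
    for fam, cog in family_cog.items():
        table[cog][family_class[fam]] += 1
    rows = []
    for cog, counts in table.items():
        total = sum(counts.values())
        rows.append((cog, *(counts[c] / total for c in CLASSES)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical families: translation (J) mostly core, secondary metabolism (Q) mostly accessory
family_cog = {"F1": "J", "F2": "J", "F3": "Q", "F4": "Q", "F5": "Q", "F6": "Q"}
family_class = {"F1": "core", "F2": "core", "F3": "core",
                "F4": "dispensable", "F5": "unique", "F6": "unique"}
for row in category_proportions(family_cog, family_class):
    print(row)
# ('J', 1.0, 0.0, 0.0)
# ('Q', 0.25, 0.25, 0.5)
```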
Fig. 2. Distribution of orthologous gene families by COG category. The bars are sorted by the proportion of core gene families in each functional category.
- The Core Genome of Streptomycetes
We further investigated the core genome to understand the conserved basic biology of Streptomycetes. In general, Streptomyces species contain a linear chromosome with a "core region" that houses relatively conserved housekeeping genes and two "arms" that contain more divergent and horizontally transferred genes [7]. The terminal regions of the chromosome are highly unstable, and unequal crossing-over between the two chromosome arms, or between one arm and a linear plasmid, occurs frequently, giving rise to gross chromosomal rearrangements [7]. The dynamic nature of the arms is consistent with their high genetic diversity. Because essential genes rarely occur in these terminal regions, a large part of them could be deleted when a genome-minimized host for heterologous expression was constructed [18]. We confirmed that core genes were concentrated in the central region of the chromosome in most species, consistent with prior knowledge (Fig. 3).
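The positional analysis behind Fig. 3 can be sketched as follows, assuming each gene is given as `(start, end, is_core)` on a chromosome of known length. Each chromosome is scaled to the same number of equal sections; for each section the sketch records the fraction of its length covered by all genes and by core genes, and then reports the mean and standard deviation across genomes. Splitting boundary-spanning genes by overlap length and using the population standard deviation are assumptions of this sketch, and the coordinates are hypothetical.

```python
import statistics


def section_coverage(genes, chrom_len, n_sections=100):
    """Fraction of each equal-length chromosome section covered by all genes and by core genes.

    genes: iterable of (start, end, is_core) in 0-based, half-open coordinates.
    Genes spanning a section boundary contribute to each section by overlap length;
    overlapping genes can push a section's coverage above 1.0."""
    size = chrom_len / n_sections  # normalizes every chromosome to n_sections bins
    total = [0.0] * n_sections
    core = [0.0] * n_sections
    for start, end, is_core in genes:
        first = int(start // size)
        last = min(int((end - 1) // size), n_sections - 1)
        for s in range(first, last + 1):
            overlap = min(end, (s + 1) * size) - max(start, s * size)
            if overlap > 0:
                total[s] += overlap / size
                if is_core:
                    core[s] += overlap / size
    return total, core


def mean_and_sd(profiles):
    """Per-section mean and (population) standard deviation across genomes."""
    columns = list(zip(*profiles))
    return ([statistics.mean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])


# Hypothetical 1,000-bp chromosome split into 10 sections of 100 bp
total, core = section_coverage([(0, 100, True), (150, 250, False)], 1000, n_sections=10)
print(total[:3], core[:3])  # [1.0, 0.5, 0.5] [1.0, 0.0, 0.0]
```

Applying `section_coverage` to every genome and passing the resulting core (or total) profiles to `mean_and_sd` gives one averaged curve with error bars per section.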
Fig. 3. Proportion of the core genome according to location on the linear chromosome. All genomes were normalized to the same size and divided into 100 sections. The plot shows, for each section, the average fraction of its length occupied by all genes and by core genes. Error bars indicate the standard deviation of these fractions in each section.
Next, we examined several groups of core genes, using S. coelicolor as the reference strain. First, among the transcription-related genes, which account for 12% of the ORFs in the S. coelicolor genome, we examined the genes for sigma factors, which diversify gene expression patterns by altering the promoter specificity of RNA polymerase. The genome of S. coelicolor A3(2) is known to encode more than 60 different sigma factors [1, 11]. The sigma factors in the Streptomyces core genome were assigned to 15 ortholog clusters, which together included 25 of the 65 sigma factors encoded in the S. coelicolor genome (Table 1). Genes for the major housekeeping sigma factor HrdB and its paralogs HrdA, HrdC, and HrdD were grouped into a single cluster (cluster ID 24). This cluster comprised three or four genes in each analyzed species, indicating that the multiplicity of these HrdB-like sigma factors is conserved in Streptomyces. Ortholog cluster 4 contained the largest number of sigma factor genes (sigB, sigI, sigN, sigF, sigH, and sigL), many of which have been reported to function in differentiation and in responses to osmotic and oxidative stresses [30]. Apart from these HrdB-like and SigB-like sigma factors and WhiG (cluster ID 1910), the other 14 sigma factors are classified as group 4 or ECF-family sigma factors [12]. The ECF sigma factors conserved in all Streptomyces spp. include SigU, SigE, SigR, SigR1, SigT, BldN, and SigQ. Among these, some are known to be involved in differentiation and secondary metabolism (SigU, BldN, SigR, and SigT) [8, 10, 20, 27], cell wall function (SigE) [25], and the oxidative stress response (SigR and SigR1) [16, 17, 26]. Further investigation of the other conserved sigma factors is needed to unravel the core functions governed by these alternative sigma factors.
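Locating reference-strain genes in the core genome, as done above for the S. coelicolor sigma factors, amounts to looking up each gene's ortholog cluster and checking whether that cluster is core. A minimal sketch; the locus tags and cluster IDs below are hypothetical placeholders, not values from Table 1.

```python
def genes_in_core_clusters(query_genes, gene_to_cluster, core_clusters):
    """Group reference-strain genes by the core ortholog cluster that contains them.

    gene_to_cluster: locus tag -> ortholog cluster ID (from the clustering output)
    core_clusters:   set of cluster IDs present in every genome
    Returns (core cluster ID -> genes, list of genes not in any core cluster)."""
    in_core, not_core = {}, []
    for gene in query_genes:
        cluster = gene_to_cluster.get(gene)
        if cluster is not None and cluster in core_clusters:
            in_core.setdefault(cluster, []).append(gene)
        else:
            not_core.append(gene)
    return in_core, not_core


# Hypothetical locus tags and cluster IDs, for illustration only
sigma_genes = ["sig_a", "sig_b", "sig_c", "sig_d", "sig_e"]
gene_to_cluster = {"sig_a": 101, "sig_b": 101, "sig_c": 102, "sig_d": 900}
core = {101, 102}
in_core, not_core = genes_in_core_clusters(sigma_genes, gene_to_cluster, core)
n_hit = sum(len(g) for g in in_core.values())
print(f"{n_hit} of {len(sigma_genes)} genes fall in {len(in_core)} core clusters")
# 3 of 5 genes fall in 2 core clusters
```

The same lookup applies to any gene set of interest, such as the msh genes of the mycothiol pathway.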
Table 1. Conserved genes in 17 Streptomyces species.
In addition, we investigated the conservation of genes involved in the biosynthesis of mycothiol, the principal thiol compound in many actinomycetes [23]. Mycothiol maintains a highly reducing intracellular environment and protects against disulfide stress. Four putative genes in this pathway, mshA (cluster ID 705), mshB (cluster ID 1092), mshC (cluster ID 953), and mshD (cluster ID 1601), were conserved in all the species that we analyzed, despite their scattered locations on the chromosome (Table 1). This suggests that mycothiol serves as a common reducing agent in Streptomyces species.
We further examined the conserved core genes annotated as being involved in secondary metabolism. More than 95% of the genes involved in secondary metabolism resided in the accessory (dispensable and unique) genome of Streptomyces (Fig. 2), reflecting the diversity of secondary metabolism in Streptomyces spp.; fewer than 5% were in the conserved core genome. Table 2 lists 27 genes for secondary metabolism that are conserved among the 17 Streptomyces spp. They belong to seven of the 30 secondary metabolite gene clusters predicted [22]. Most or all of the genes in the 5-hydroxyectoine (4/4), siderophore (2/3), geosmin (1/1), and hopene (11/13) clusters were conserved throughout Streptomyces spp. 5-Hydroxyectoine is known to play an important role as a compatible solute in the response to salt and heat stresses in the genus Streptomyces [4]. Geosmin, which is responsible for the earthy odor of soil, is also likely to be produced by all of the strains examined in this work [5]. Hopene, a pentacyclic triterpene, can stabilize bacterial membranes at high temperatures and under conditions of extreme acidity [28]. This study reveals that, among secondary metabolites, the biosynthetic genes for only a handful of compounds, such as hydroxyectoine, geosmin, and hopanoids, are universally conserved among Streptomyces. More intensive investigation of the functions of these metabolites, whether characterized or uncharacterized, is needed to understand their roles in the biology of Streptomycetes.
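The per-cluster counts quoted above (for example, 4/4 or 11/13) express how many genes of a biosynthetic gene cluster fall into core ortholog clusters. A minimal sketch of that calculation; the cluster and gene names are hypothetical, and the "mostly conserved" threshold is an illustrative choice, not a criterion defined in the study.

```python
def bgc_conservation(bgc_genes, core_genes, min_fraction=0.5):
    """Count how many genes of each biosynthetic gene cluster lie in the core genome.

    bgc_genes:  cluster name -> reference-strain genes in that cluster
    core_genes: set of reference-strain genes assigned to core ortholog clusters
    Returns cluster name -> (n_core, n_total, mostly_conserved), where
    mostly_conserved means more than min_fraction of the genes are core
    (an illustrative threshold, not one defined in the study)."""
    result = {}
    for name, genes in bgc_genes.items():
        n_core = sum(1 for g in genes if g in core_genes)
        n_total = len(genes)
        result[name] = (n_core, n_total, n_total > 0 and n_core / n_total > min_fraction)
    return result


# Hypothetical clusters and gene names
clusters = {"cluster_A": ["a1", "a2", "a3"], "cluster_B": ["b1", "b2"]}
core_genes = {"a1", "a2", "b1"}
for name, (n_core, n_total, mostly) in bgc_conservation(clusters, core_genes).items():
    print(f"{name}: {n_core}/{n_total} core, mostly conserved: {mostly}")
# cluster_A: 2/3 core, mostly conserved: True
# cluster_B: 1/2 core, mostly conserved: False
```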
Table 2. Conserved genes involved in secondary metabolism in 17 Streptomyces species.
The amount of bacterial genomic information has been increasing rapidly with the development of high-throughput DNA sequencing technologies. In particular, acquiring and understanding Streptomyces genome sequences is important for drug discovery, because these organisms are an abundant source of secondary metabolites [29]. In this study, pan-genome analysis of 17 completely sequenced Streptomyces species identified 2,018 gene families (ortholog clusters) in the core genome and 32,574 in the accessory genome. Functional classification of the ortholog clusters showed how the proportions of core and accessory gene families vary among functional categories. Furthermore, we investigated the functions of conserved gene groups, including the sigma factors, the mycothiol biosynthesis pathway, and secondary metabolic pathways. This analysis showed that Streptomyces species share many genes involved in stress response and morphological differentiation. Compared with previous reports [13, 15, 32], the use of more complete genomes reduced the number of core genes. Despite this smaller core genome, many genes and secondary metabolite clusters that respond to stress and external stimuli were still conserved. We therefore conclude that adaptation to and survival in diverse environments is one of the distinguishing characteristics of the genus Streptomyces.
Elucidation of the core genome will provide insights into target selection for genome minimization during the construction of industrial strains and for metabolic engineering. Moreover, this analysis offers a basis for understanding the processes through which genetic information is transferred between strains. Integrating genomic information with other -omics data, such as transcriptomics, proteomics, and metabolomics, will further our understanding of the functional evolution of Streptomyces species.
Acknowledgements
This work was supported by the Intelligent Synthetic Biology Center of the Global Frontier Project (2011-0031957) and the Basic Core Technology Development Program for the Oceans and the Polar Regions (2011-0021053) through the National Research Foundation of Korea (NRF), which is funded by the Ministry of Science, ICT, and Future Planning.
References
Bentley SD, Chater KF, Cerdeno-Tarraga AM, Challis GL, Thomson NR, James KD. 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417: 141-147. DOI: 10.1038/417141a
Binnewies TT, Motro Y, Hallin PF, Lund O, Dunn D, La T. 2006. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries. Funct. Integr. Genomics 6: 165-185. DOI: 10.1007/s10142-006-0027-2
Bursy J, Kuhlmann AU, Pittelkow M, Hartmann H, Jebbar M, Pierik AJ, Bremer E. 2008. Synthesis and uptake of the compatible solutes ectoine and 5-hydroxyectoine by Streptomyces coelicolor A3(2) in response to salt and heat stresses. Appl. Environ. Microbiol. 74: 7286-7296. DOI: 10.1128/AEM.00768-08
Cane DE, Watt RM. 2003. Expression and mechanistic analysis of a germacradienol synthase from Streptomyces coelicolor implicated in geosmin biosynthesis. Proc. Natl. Acad. Sci. USA 100: 1547-1551. DOI: 10.1073/pnas.0337625100
Chandra G, Chater KF. 2014. Developmental biology of Streptomyces from the perspective of 100 actinobacterial genome sequences. FEMS Microbiol. Rev. 38: 345-379. DOI: 10.1111/1574-6976.12047
Dyson P. 2011. Streptomyces: Molecular Biology and Biotechnology. Horizon Scientific Press, Norfolk, UK.
Feng WH, Mao XM, Liu ZH, Li YQ. 2011. The ECF sigma factor SigT regulates actinorhodin production in response to nitrogen stress in Streptomyces coelicolor. Appl. Microbiol. Biotechnol. 92: 1009-1021. DOI: 10.1007/s00253-011-3619-2
Flärdh K, Buttner MJ. 2009. Streptomyces morphogenetics: dissecting differentiation in a filamentous bacterium. Nat. Rev. Microbiol. 7: 36-49. DOI: 10.1038/nrmicro1968
Gehring AM, Yoo NJ, Losick R. 2001. RNA polymerase sigma factor that blocks morphological differentiation by Streptomyces coelicolor. J. Bacteriol. 183: 5991-5996. DOI: 10.1128/JB.183.20.5991-5996.2001
Hahn MY, Bae JB, Park JH, Roe JH. 2003. Isolation and characterization of Streptomyces coelicolor RNA polymerase, its sigma, and antisigma factors. Methods Enzymol. 370: 73-82.
Helmann JD. 2002. The extracytoplasmic function (ECF) sigma factors. Adv. Microb. Physiol. 46: 47-110.
Hsiao NH, Kirby R. 2007. Comparative genomics of Streptomyces avermitilis, Streptomyces cattleya, Streptomyces maritimus and Kitasatospora aureofaciens using a Streptomyces coelicolor microarray system. Antonie Van Leeuwenhoek 93: 1-25. DOI: 10.1007/s10482-007-9175-1
Ikeda H, Ishikawa J, Hanamoto A, Shinose M, Kikuchi H, Shiba T. 2003. Complete genome sequence and comparative analysis of the industrial microorganism Streptomyces avermitilis. Nat. Biotechnol. 21: 526-531. DOI: 10.1038/nbt820
Jayapal KP, Lian W, Glod F, Sherman DH, Hu W-S. 2007. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans. BMC Genomics 8: 229. DOI: 10.1186/1471-2164-8-229
Kallifidas D, Thomas D, Doughty P, Paget MS. 2010. The sigmaR regulon of Streptomyces coelicolor A3(2) reveals a key role in protein quality control during disulphide stress. Microbiology 156: 1661-1672. DOI: 10.1099/mic.0.037804-0
Kim M-S, Dufour YS, Yoo JS, Cho Y-B, Park J-H, Nam G-B. 2012. Conservation of thiol-oxidative stress responses regulated by SigR orthologues in actinomycetes. Mol. Microbiol. 85: 326-344. DOI: 10.1111/j.1365-2958.2012.08115.x
Komatsu M, Uchiyama T, Omura S, Cane DE, Ikeda H. 2010. Genome-minimized Streptomyces host for the heterologous expression of secondary metabolism. Proc. Natl. Acad. Sci. USA 107: 2646-2651. DOI: 10.1073/pnas.0914833107
Liu G, Chater KF, Chandra G, Niu G, Tan H. 2013. Molecular regulation of antibiotic biosynthesis in Streptomyces. Microbiol. Mol. Biol. Rev. 77: 112-143. DOI: 10.1128/MMBR.00054-12
Mao XM, Zhou Z, Cheng LY, Hou XP, Guan WJ, Li YQ. 2009. Involvement of SigT and RstA in the differentiation of Streptomyces coelicolor. FEBS Lett. 583: 3145-3150. DOI: 10.1016/j.febslet.2009.09.025
Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. 2005. The microbial pan-genome. Curr. Opin. Genet. Dev. 15: 589-594. DOI: 10.1016/j.gde.2005.09.006
Nett M, Ikeda H, Moore BS. 2009. Genomic basis for natural product biosynthetic diversity in the actinomycetes. Nat. Prod. Rep. 26: 1362-1384. DOI: 10.1039/b817069j
Newton GL, Buchmeier N, Fahey RC. 2008. Biosynthesis and functions of mycothiol, the unique protective thiol of Actinobacteria. Microbiol. Mol. Biol. Rev. 72: 471-494. DOI: 10.1128/MMBR.00008-08
Ohnishi Y, Ishikawa J, Hara H, Suzuki H, Ikenoya M, Ikeda H. 2008. Genome sequence of the streptomycin-producing microorganism Streptomyces griseus IFO 13350. J. Bacteriol. 190: 4050-4060. DOI: 10.1128/JB.00204-08
Paget MS, Chamberlin L, Atrih A, Foster SJ, Buttner MJ. 1999. Evidence that the extracytoplasmic function sigma factor sigmaE is required for normal cell wall structure in Streptomyces coelicolor A3(2). J. Bacteriol. 181: 204-211.
Paget MS, Kang JG, Roe JH, Buttner MJ. 1998. sigmaR, an RNA polymerase sigma factor that modulates expression of the thioredoxin system in response to oxidative stress in Streptomyces coelicolor A3(2). EMBO J. 17: 5776-5782. DOI: 10.1093/emboj/17.19.5776
Paget MS, Molle V, Cohen G, Aharonowitz Y, Buttner MJ. 2001. Defining the disulphide stress response in Streptomyces coelicolor A3(2): identification of the sigmaR regulon. Mol. Microbiol. 42: 1007-1020. DOI: 10.1046/j.1365-2958.2001.02675.x
Poralla K, Muth G, Hartner T. 2000. Hopanoids are formed during transition from substrate to aerial hyphae in Streptomyces coelicolor A3(2). FEMS Microbiol. Lett. 189: 93-95. DOI: 10.1111/j.1574-6968.2000.tb09212.x
Rebets Y, Brotz E, Tokovenko B, Luzhetskyy A. 2014. Actinomycetes biosynthetic potential: how to bridge in silico and in vivo? J. Ind. Microbiol. Biotechnol. 41: 387-402. DOI: 10.1007/s10295-013-1352-9
Viollier PH, Kelemen GH, Dale GE, Nguyen KT, Buttner MJ, Thompson CJ. 2003. Specialized osmotic stress response systems involve multiple SigB-like sigma factors in Streptomyces coelicolor. Mol. Microbiol. 47: 699-714. DOI: 10.1046/j.1365-2958.2003.03302.x
Zhou Z, Gu J, Li Y-Q, Wang Y. 2012. Genome plasticity and systems evolution in Streptomyces. BMC Bioinformatics 13 (Suppl. 10): S8. DOI: 10.1186/1471-2105-13-S10-S8