Identification of Novel Cupredoxin Homologs Using Overlapped Conserved Residues Based Approach
Journal of Microbiology and Biotechnology. 2015. Jan, 25(1): 127-136
Copyright © 2015, The Korean Society For Microbiology And Biotechnology
  • Received : September 11, 2014
  • Accepted : September 24, 2014
  • Published : January 28, 2015
About the Authors
Amit Goyal
Bharat Madan
Kyu-Suk Hwang
kshwang@pusan.ac.kr
Sun-Gu Lee
kshwang@pusan.ac.kr

Abstract
Cupredoxin-like proteins are mainly copper-binding proteins that conserve a typical rigid Greek-key arrangement consisting of an eight-stranded β-sandwich, even though they share as little as 10-15% sequence similarity. The electron transport function of the Cupredoxins is critical for respiration and photosynthesis, and the proteins have therapeutic potential. Despite their crucial biological functions, the identification of the distant Cupredoxin homologs has been a difficult task due to their low sequence identity. In this study, the overlapped conserved residue (OCR) fingerprint for the Cupredoxin superfamily, which consists of conserved residues in three aspects (i.e., the sequence, structure, and intramolecular interaction), was used to detect the novel Cupredoxin homologs in the NCBI non-redundant protein sequence database. The OCR fingerprint could identify 54 potential Cupredoxin sequences, which were validated by scanning them against the conserved Cupredoxin motif near the Cu-binding site. This study also attempted to model the 3D structures and to predict the functions of the identified potential Cupredoxins. This study suggests that the OCR-based approach can be used efficiently to detect novel homologous proteins with low sequence identity, such as Cupredoxins.
Introduction
Proteins in the Cupredoxin superfamily are either single-domain Cupredoxins, such as amicyanin, plastocyanin, azurin, nitrosocyanin, and the periplasmic domain of cytochrome c oxidase subunit II, or multi-domain Cupredoxins, such as nitrite reductase, laccase, ceruloplasmin, and coagulation factor V [1, 21, 25, 26, 28]. The Cupredoxin domain consists of a β-sandwich with eight strands in two β-sheets in a Greek-key β-barrel arrangement. The domains in the superfamily generally contain a copper-binding site, which functions as an electron transfer (ET) agent, but some domains lack the metal-binding site.
The proteins in the Cupredoxin superfamily are redox and electron transfer proteins that are critical for respiration, photosynthesis, and proper metabolism [4, 8, 11]. In addition, the Cupredoxin family proteins have been shown to be a potential therapeutic tool because of their ability to block or delay the invasion of diverse intracellular pathogens, such as Plasmodium falciparum and Toxoplasma [6, 23]. Moreover, the prospects for the use of azurin in the treatment of cancer have also been shown [24]. Owing to these important biological roles and the therapeutic potential of Cupredoxin-like proteins, it is important to explore and discover novel homologs in the superfamily. On the other hand, the protein sequences in the Cupredoxin superfamily are quite diverse, with sequence identities as low as 10-15% among members of the superfamily, which has made identification of the Cupredoxin homologs difficult [8].
Recently, we developed a method called the “Overlapped Conserved Residue (OCR)-based approach” to identify a fingerprint for a protein superfamily [10]. The OCR fingerprint was identified by selecting the commonly conserved residues in three aspects; that is, sequence, structure, and intramolecular interaction. The OCR-based sequence patterns were demonstrated to be quite efficient in identifying the structural homologs for various folds, including the β-strands, regardless of the sequence similarity. The OCR-based approach was expected to be able to detect distant homologs, which might allow us to identify novel homologous proteins that might have specific functions in various organisms.
This study focused on the identification of novel Cupredoxin protein homologs using the OCR-based approach. In the first phase of this study, the fold detection was performed against the NCBI non-redundant (nr) protein sequence database using the OCR-based fingerprint for the Cupredoxin superfamily. Second, the sequence hits identified were analyzed, and the sequences without structural/functional annotation, which were regarded as potential Cupredoxin homologs, were isolated. Finally, the potential Cupredoxin sequences were validated by the sequence motif near the Cu-binding site of Cupredoxin to determine if the identified Cupredoxin homologous proteins actually belong to this group. Structural modelling of the identified sequences was also attempted to identify the potential function of the novel Cupredoxin sequences. Fig. 1 shows the scheme of this study to detect and characterize the distant Cupredoxin homologs.
Fig. 1. Research scheme to detect distant Cupredoxin homologs. The figure depicts the strategy to detect the distant Cupredoxin homologs using the OCR-based approach. In the first step, the OCR-based pattern for Cupredoxin-like protein was scanned against the NCBI nr-protein sequence database to detect the homologous protein sequences, and the novel potential Cupredoxin-like sequences were identified. In the second step, 54 identified potential Cupredoxin sequences were validated using the conserved Cupredoxin sequence motif near the active site. Finally, structure and function predictions of the identified potential Cupredoxins were attempted for further verification.
Methods
- Generation of OCR-Based Sequence Pattern and Fold Detection
The OCR-based fingerprint for Cupredoxins was identified by selecting the commonly conserved positions among the three individual sequence alignment methods; that is, multiple sequence alignment (MSA), structure-based alignment (SBA), and supersecondary structure (SSS)-based alignment [12, 14, 16]. Individual alignments were performed using the 10 representative protein structures/sequences for the Cupredoxin-like proteins. Details of the procedures to extract the OCR-based fingerprint and the selection criterion for the representative protein sequences are described in our previous research paper [10]. Fold detection was performed using the OCR-based fingerprint as an input to the ScanProsite server against the NCBI nr-protein sequence database [32].
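The selection of commonly conserved positions can be pictured as a set intersection. The following Python sketch is illustrative only: the position sets are made-up numbers, the function name `overlapped_conserved_positions` is hypothetical, and it assumes that the conserved positions from the three alignments have already been mapped onto one common reference numbering. The actual extraction procedure is the one described in [10] and may differ in detail.

```python
def overlapped_conserved_positions(msa, sba, sss):
    """Return the positions conserved in all three alignments, sorted."""
    return sorted(set(msa) & set(sba) & set(sss))


# Hypothetical conserved positions (common reference numbering),
# one set per alignment method. These are NOT values from the paper.
msa_conserved = {12, 18, 31, 45, 53, 92, 95}
sba_conserved = {12, 31, 40, 53, 92, 95, 101}
sss_conserved = {12, 31, 53, 60, 92, 95}

ocr_positions = overlapped_conserved_positions(
    msa_conserved, sba_conserved, sss_conserved
)
print(ocr_positions)
```

Only positions present in every set survive, which is why the resulting fingerprint is much shorter than any single alignment's list of conserved positions.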
- Extraction of the Sequence Motif near the Cu-Binding Site of Cupredoxin
In the Structural Classification of Proteins (SCOP) database [22], the Cupredoxin superfamily includes structurally similar proteins with or without the copper-binding site, which are classified into seven different families. A large number of proteins in the Cupredoxin superfamily contain the Cu-binding site. Structurally similar Cupredoxin-like proteins lacking the Cu-binding site were excluded from the dataset. A dataset of 62 protein structures belonging to the Cupredoxin superfamily was generated and is shown in Table S3. The dataset contained one representative protein for each species in a number of protein families. The protein sequences were downloaded from the Protein Data Bank (PDB) and were passed to the Clustal Omega multiple sequence alignment program [29] to generate the alignment. Local conserved positions in the multiple sequence alignment (i.e., consensus positions near the Cu-binding site) were obtained, which were then used to generate the conserved sequence motif near the Cu-binding site of Cupredoxin in the PROSITE format [30, 31].
- Cupredoxin Sequence Motif Scan against the Novel Potential Cupredoxins
The fold detection efficiencies of the newly developed Cupredoxin motif and the previously reported Cupredoxin motifs were determined by scanning the motifs against the PDB database using the ScanProsite server, as reported previously [10].
The ScanProsite server was used to scan the 54 identified potential Cupredoxin sequences against the newly developed Cupredoxin sequence motif near the Cu-binding site. Option 3, “Submit PROTEIN sequences and PATTERNS to scan them against each other,” was used for this purpose, and a graphical view was selected as the output format [7].
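To make the pattern scan concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and searches toy sequences with it. This is a local approximation for illustration and not the ScanProsite implementation: the helper names `prosite_to_regex` and `scan` are hypothetical, only the most common pattern elements are handled, and the two sequences are invented examples rather than entries from Table 1.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Handles x, x(n), x(n,m), [ABC] (allowed), {ABC} (forbidden) and
    the < and > anchors. It is not a complete PROSITE parser.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue part from an optional repeat such as (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                         # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"  # forbidden residues
        else:
            token = residue                     # one letter or an [ABC] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + token
                     + ("$" if anchor_end else ""))
    return "".join(parts)


def scan(pattern, sequences):
    """Return {name: (1-based start, matched text)} for matching sequences."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, sequence in sequences.items():
        found = regex.search(sequence)
        if found:
            hits[name] = (found.start() + 1, found.group())
    return hits


motif = "G-x(1)-[FY]-x(1)-[VLFY]-x-C-x(2,4)-H"
regex_text = prosite_to_regex(motif)

# Invented toy sequences; the second differs by a single Y -> W change.
toy_sequences = {
    "toy_hit": "MKTAGTYDFVCTPHAGMK",
    "toy_miss": "MKTAGTWDFVCTPHAGMK",
}
hits = scan(motif, toy_sequences)
print(regex_text)
print(hits)
```

The regular expression reports only the first match per sequence, which is sufficient for a yes/no validation of each candidate.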
- Prediction of the Tertiary Structures and Biochemical Functions of the Identified Potential Cupredoxins
The potential Cupredoxin sequences were passed to Phyre2 (Protein Homology/analogY Recognition Engine ver. 2.0) to predict the protein structure [13]. Structure modelling was performed using the batch processing option in expert mode, where the text file containing the identified sequences in FASTA format was used as the input. The predicted tertiary structure models were minimized further using the YASARA energy minimization server [15]. The refined models were validated using the structural analysis and verification server (SAVES), which utilizes a range of tools, including PROCHECK [17], ERRAT [5], and VERIFY_3D [9]. The validated protein models were submitted to the Protein Model Data Base (PMDB; http://mi.caspur.it/PMDB/), and the corresponding PMDB identifiers were assigned [3].
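The batch input mentioned above is a plain text file holding several sequences in FASTA format. A minimal sketch of how such text could be assembled is given below; the function name `to_fasta`, the identifiers, and the sequences are hypothetical, and the line width is an arbitrary choice made for the example.

```python
def to_fasta(records, width=60):
    """Format {identifier: sequence} as multi-FASTA text with wrapped lines."""
    lines = []
    for identifier, sequence in records.items():
        lines.append(f">{identifier}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical identifiers and toy sequences, not taken from Table 1.
records = {
    "query_1": "MKTAGTYDFVCTPHAGMK",
    "query_2": "MSLAGEFKYECEPHK",
}
fasta_text = to_fasta(records, width=10)
print(fasta_text, end="")
```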
The likely biochemical function of each protein model was predicted using the ProFunc server [18, 19] and the COFACTOR server [27], both of which used the predicted model as an input. Furthermore, the predicted tertiary-structure models were checked manually to determine if the conserved Cu-binding site was preserved in the models.
Results
- OCR-Based Fingerprint for the Cupredoxin Superfamily
In our previous study, the fingerprint for the Cupredoxin superfamily was deduced by using the OCR-based approach and its fold detection efficiency was evaluated [10]. The overlapped conserved residues, which were conserved in three aspects (i.e., sequence, structure, and intramolecular interaction), were obtained by selecting the positions that were conserved simultaneously in three independent sequence alignments (i.e., MSA, SBA, and SSS). The three different alignments were performed using the 10 representative structures/sequences of Cupredoxin-like proteins (shown in Table S1). The fingerprint for the Cupredoxin superfamily was composed of just 10 overlapped conserved residues, as shown in Fig. S1, and was highly efficient at detecting the structural homologs when the fold detection was performed against the PDB. The fingerprint identified more than 97% of the homologous structures out of the 120 Cupredoxin-like structures deposited in the PDB with 100% specificity.
- Detection of Novel Cupredoxin Homologs Using the OCR Fingerprint
In this study, the detection of Cupredoxin homologs was performed against the NCBI nr-protein sequence database to check whether novel Cupredoxin homologous proteins can be identified using the OCR fingerprint for the Cupredoxin superfamily. The NCBI nr-protein sequence database contains non-identical sequences from the GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. The database contains both annotated and unannotated protein sequences, and we expected to identify Cupredoxin-like protein candidates that might not be annotated owing to the lack of any structure or sequence homology.
The fold detection against the NCBI nr-protein sequence database identified a total of 1,806 sequence hits, which were analyzed further to distinguish the sequences annotated as Cupredoxin-like proteins from the unannotated sequences. The functional and/or structural annotation was provided by the NCBI database, which allowed the identified sequences to be divided into the following groups (Fig. 1):
1. True Positive or Confirmed Cupredoxin-like Homologous Proteins: A sequence hit is defined as a “True Positive (TP)” hit if the sequence is annotated as a Cupredoxin-like protein in the NCBI database. A total of 1,613 TP sequences were retrieved from the database scan, which was approximately 89% of the total sequence hits. The result confirmed that the OCR-based pattern is highly sensitive (~90%) in detecting a homologous protein even when the fold detection was performed against the protein sequence database, such as the NCBI nr-protein sequence database.
2. False-Positive or Confirmed non-Cupredoxin-like proteins: A sequence hit is defined as “False Positive (FP)” if the sequence is annotated as a non-Cupredoxin protein. A total of 139 FP sequences were identified. This was only approximately 8% of the total sequence hits (139 of 1,806), which also suggests that the OCR-based pattern is highly specific.
3. Potential Cupredoxin-like Homologous Proteins: 54 protein sequences out of 1,806 sequence hits were retrieved as sequences that lack any functional or structural annotation. The identified 54 sequences, which are listed in Table 1, showed no significant sequence identity with the known Cupredoxin proteins. For example, the sequences showed only 6-20% sequence identity with the representative Cupredoxin protein, Amicyanin (PDB-ID 1AAJ) (Table S2). When a BLAST search was performed against the NCBI nr-protein sequence database and PDB, no sequence or structural homologs were observed for the 54 identified protein sequences. These sequences could not be annotated using general homology search methods, presumably because of the low sequence identity with the known Cupredoxin-like proteins.
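The three group sizes reported above add up to the total number of hits, and the share of each group follows directly from the counts. The short Python check below uses only the counts given in the text; the variable names and group labels are illustrative.

```python
total_hits = 1806
groups = {
    "true positive (annotated Cupredoxin-like)": 1613,
    "false positive (annotated non-Cupredoxin)": 139,
    "potential (no annotation)": 54,
}

# The three groups should partition the full set of hits.
assert sum(groups.values()) == total_hits

# Percentage of all hits in each group, rounded to one decimal place.
shares = {name: round(100 * count / total_hits, 1)
          for name, count in groups.items()}
for name, percent in shares.items():
    print(f"{name}: {percent}%")
```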
Table 1. NCBI Accession ID of the 54 potential Cupredoxins identified using OCR-based pattern.
Table lists the NCBI Accession ID of the 54 potential Cupredoxin-like protein candidates identified using the OCR-based pattern during the fold detection against the NCBI nr-protein sequence database. (a) NCBI Accession IDs of the 44 sequence hits, which could be obtained by the scan of the newly developed Cupredoxin sequence motif against the 54 identified Cupredoxin-like protein candidates. (b) NCBI Accession IDs of the 15 potential Cupredoxins that could be modelled using the Phyre2 server. (c) NCBI Accession IDs of 14 recently annotated Cupredoxin homologs in the NCBI database.
As mentioned above, the OCR fingerprint for the Cupredoxin superfamily was highly specific in detecting the Cupredoxin homologs when the fold detection was performed against the PDB database. The OCR fingerprint also exhibited very high sensitivity and specificity for the detection of Cupredoxin homologs against the NCBI nr-protein sequence database. The high specificity of the OCR fingerprint suggests that the 54 identified sequences might belong to the Cupredoxin superfamily. In addition, the sequence lengths of the 54 identified protein sequences were approximately 105-158 amino acids, which is consistent with the Cupredoxin domain length. The identification of such new potential Cupredoxin homologs motivated us to examine the identified sequences further at the sequence, structural, and functional levels.
- Validation of the Identified Potential Cupredoxins Using Cupredoxin Cu-Binding Motif
Protein active site motifs are characteristic motifs (e.g., a substrate-binding site or ion-binding site) that distinguish proteins with special functions. Such active site motifs are generally specific to a particular family, but can also define a large protein superfamily. Active site motifs are generally conserved in related proteins and have been used consistently to predict homologous proteins. Cupredoxin-like proteins also possess specific motifs around the Cu-binding site, which have been used to detect and evaluate the Cupredoxin homologs. For example, sequence motifs such as H(x)nC(x)mH for the blue copper protein, H(x)nCxxxCxxxHxxM for the purple copper protein, and E(x)nCxxHxxxxH for the red copper protein [2, 20], as well as the PROSITE pattern “[GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-{SEDT}-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ]” for the Type-1 copper (blue) proteins [30, 31], have been reported and applied to the detection of the Cupredoxin homologs.
A test was performed to determine if such conserved Cu-binding site motifs can be used to validate the 54 identified potential Cupredoxins. For this, the fold detection efficiencies of the reported motifs were evaluated by performing the fold detection against the PDB database, as listed in Table 2. The Cu-binding site motif H-x(n)-C-x(m)-H showed good sensitivity but was too degenerate (low specificity). The motif E-x(n)-C-x(2)-H-x(4)-H exhibited poor fold detection efficiency, both in specificity and sensitivity. In the case of the motifs H-x(n)-C-x(3)-C-x(3)-H-x(2)-M and [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-{SEDT}-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ], the fold detection specificity and sensitivity were better than those of the other two motifs, but their efficiencies were limited to approximately 55%. Overall, the efficiencies of the reported Cupredoxin motifs were too low to be used to evaluate the 54 identified potential Cupredoxins. Therefore, there was a need to find a more efficient Cupredoxin sequence motif with high fold detection specificity and sensitivity.
Table 2. Evaluation of the fold detection efficiencies of various Cupredoxin sequence motifs.
Table lists the fold detection efficiencies of the various Cupredoxin sequence motifs near the Cu-binding site. For each conserved motif, the fold detection specificity, sensitivity, and efficiency were calculated as described in our previous manuscript and are listed.
To improve the efficiency of such sequence motifs near the Cu-binding site of Cupredoxin, a dataset of 62 Cupredoxin-like proteins (Table S3) containing the Cu-binding site was developed, as described in the Methods section. The Cupredoxins that did not include the Cu-binding site were excluded. Consensus positions near the Cu-binding site were identified by the multiple sequence alignment of the 62 Cupredoxin-like proteins, as shown in Fig. 2. A range of Cupredoxin sequence motifs near the Cu-binding site were obtained based on the alignment and were tested against the PDB database for their fold detection efficiencies. The Cupredoxin motif “G-x(1)-[FY]-x(1)-[VLFY]-x-C-x(2,4)-H” exhibited the highest fold detection efficiency with ~90% specificity and sensitivity (Table 2). This motif was used to evaluate the 54 identified potential Cupredoxins.
Fig. 2. Alignment of the Cupredoxin-like protein sequences in the dataset of 62 proteins. The figure shows the multiple sequence alignment of the amino acid residues near the Cu-binding site, generated using the dataset of 62 Cupredoxin-like proteins. Conserved positions are shown in a rectangle, and a consensus residue, with >70% occupancy, is shown below the sequences. Multiple sequence alignment was generated using the Clustal Omega alignment server while the ESPript 3.0 tool was used to convert the alignment into PostScript output.
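The idea of a consensus residue with more than 70% occupancy can be sketched as a column-wise count over an alignment. The example below is an illustration under stated assumptions, not the procedure used to draw the figure: the function name `consensus_positions` is hypothetical, occupancy is taken here as the fraction of all sequences (gapped ones included in the denominator) that carry the most common residue, and the five aligned fragments are invented.

```python
from collections import Counter


def consensus_positions(aligned, threshold=0.70):
    """Return {column index: residue} for columns in which one residue
    occurs in more than `threshold` of the sequences. Gaps ('-') are
    never reported as a consensus."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = {}
    for column in range(len(aligned[0])):
        counts = Counter(seq[column] for seq in aligned if seq[column] != "-")
        if not counts:
            continue  # column made of gaps only
        residue, count = counts.most_common(1)[0]
        if count / len(aligned) > threshold:
            result[column] = residue
    return result


# Invented aligned fragments (0-based columns), not data from Fig. 2.
toy_alignment = [
    "GAYVFCH",
    "GSFLYCH",
    "GTYAFCH",
    "G-YVLCH",
    "ATYIFCH",
]
consensus = consensus_positions(toy_alignment)
print(consensus)
```

Columns that fall below the threshold, such as the fifth column of the toy alignment where the most common residue reaches only 60%, would be candidates for a residue class such as [VLFY] or a wildcard x in a PROSITE-style motif.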
The Cupredoxin sequence motif “G-x(1)-[FY]-x(1)-[VLFY]-x-C-x(2,4)-H” was scanned against the 54 potential Cupredoxin-like protein candidates identified. The ScanProsite server revealed that 44 of the 54 sequences contained the conserved Cupredoxin motif, which suggested that these sequences might belong to the Cupredoxin superfamily. The NCBI Accession IDs of the 44 sequences are listed in Table 1 and are marked using the superscript “a”.
- Tertiary-Structure Prediction of the Identified Potential Cupredoxins
Three-dimensional structural features, such as catalytic sites, metal ion-binding sites, and conserved domains, determine the biochemical functions of proteins and are more conserved than the protein sequences during evolution. Therefore, this study attempted to predict the tertiary structures of the novel Cupredoxin sequences to infer their potential functions. The protein fold recognition server Phyre2 was used to predict the 3D structures of the 54 identified potential Cupredoxin homologs. Phyre2 is a widely used tool for building homology models of distant protein homologs. Phyre2 generates an evolutionary fingerprint (Hidden Markov Model, HMM) for each input query sequence from similar sequences found in a PSI-BLAST scan and performs HMM-HMM matching against the previously generated evolutionary fingerprints of the known proteins in the PDB to select a template structure. The selected templates are used to construct homology models for the input sequence, and for each predicted model, “template information,” “Confidence,” and “% Identity” are provided.
The Phyre2 server could provide full-length 3D structural models for 15 of the 54 potential Cupredoxins. Table 1 lists the NCBI Accession IDs of these 15 potential Cupredoxins, which are marked using the superscript “b”. Reliable tertiary structures could not be modeled for the remaining 39 sequences using the server. Table 3 lists the PDB-ID of the template, confidence score, and query-template sequence identity of the 15 successful 3D models. The confidence scores for the protein models were above 92%, which suggests that the selected structure templates are suitable for the tertiary structure predictions. Table S4 lists the structure validation results, such as the Ramachandran plot, ERRAT score, and Verify3D analysis score, for the energy-minimized models. The results showed that the predicted models had good quality factors and were reliable.
Table 3. Results of three-dimensional protein structure prediction using the Phyre2 server.
Table lists the PDB-ID and the description of the protein structure used as a template to generate the protein model using the Phyre2 fold recognition server. The last two columns list the “Confidence score” and “%ID,” which represent the probability (from 0 to 100) that the match with the input sequence is a true homology and the template-query percentage sequence identity, respectively.
- Function Prediction of the Identified 3D Structure Models
Biochemical functions of the three-dimensional protein models were predicted using the ProFunc and COFACTOR servers. ProFunc uses sequence- and structure-based homology methods to assign functions, while COFACTOR assigns the function using local and global structure matches with the known proteins in the BioLiP protein function database. The servers identified the matching Cupredoxin-like folds/analogs as 3D functional templates, and the functions of the matching folds were assigned to the protein models. Table 4 presents the function prediction results using the servers. For 7 of the 15 predicted protein models, namely Query 2, Query 4-6, Query 9-10, and Query 51, the two servers identified the same subfamily. For the remaining eight protein models, the servers predicted different subfamilies but the same family; hence, these protein models are also likely to have a similar function. Overall, all 15 protein models are suggested to contain the Cupredoxin-like domain; that is, all the protein models may belong to the Cupredoxin superfamily.
Table 4. Protein function assignment based on similar fold detected by ProFunc and COFACTOR servers.
Table lists the protein function prediction results obtained using the ProFunc and COFACTOR servers for the 15 predicted models. ProFunc results are shown using Q-score, PDB-ID, Family, and Subfamily, which represent the quality function of an alignment, the PDB-ID of the best matching fold, and the family and subfamily names, respectively. COFACTOR server results are shown using TM-score, PDB-ID, Family, and Subfamily, which represent the quantitative assessment score, the PDB-ID of the best matching fold, and the family and subfamily names, respectively.
The Cu-ion-binding sites for the predicted models were also checked manually (Fig. 3), and Table 4 lists the corresponding amino acid residues. In 12 of the 15 protein models, the Cu-ion-binding residues H-x(m)-C-x(n)-H-x(o)-M were conserved and located near each other in three-dimensional space. In two cases, Queries 3 and 7, Met was replaced with Ile, and in Query 11, the first His was replaced with Lys. Overall, the results also suggest that the predicted models belong to the Cupredoxin superfamily and preserve the Cu-binding site in the structure.
Fig. 3. Three-dimensional protein structure models predicted by the Phyre2 server. The structures of the 15 predicted models were prepared using the PyMOL software. The overall β-barrel fold of the protein is shown in a different color for each model, labeled as A-K for Query sequences 1-11, L-N for Queries 49-51, and O for Query 53. For each protein model, amino acid residues forming the conserved Cu-binding site are shown as a stick model and encircled in red.
Discussion
In the present study, the OCR-based pattern for the Cupredoxin-like protein was used to detect novel potential Cupredoxin-like protein sequences, and the identified sequences were characterized further to determine whether they possessed the Cu-ion-binding sequence pattern, the Cupredoxin-like domain, and the Cu-binding site. The significance of this study is twofold. First, novel potential Cupredoxin-like proteins were discovered. Despite the crucial biological functions and great potential of Cupredoxin-like proteins, it has not been easy to discover their homologs because of their low sequence identity. Second, this study demonstrated that the OCR-based approach, which had been tested only against the PDB database in our previous study [10], can be used to detect novel homologs. It is expected that the OCR-based approach may contribute to the discovery of novel homologs for various valuable proteins.
In this study, the OCR-based pattern for Cupredoxins identified 54 novel potential Cupredoxin-like protein candidates. The scan with the sequence motif of the Cupredoxin Cu-binding site showed that 44 (i.e., approximately 81%) of the identified sequences possess the conserved motif. Most of the 44 sequences were presumed to be Cupredoxin-like proteins, considering the high fold detection efficiency of the motif against the PDB database for Cupredoxins. The other 10 sequences could not be detected by the motif, but the possibility that some Cupredoxin-like proteins are also included among these 10 sequences cannot be excluded. The active site motif for the Cupredoxin protein was designed by considering only the Cupredoxins containing the metal-binding sites. Therefore, Cupredoxins without Cu-binding residues might not be detected by the motif. Experimental characterization of their activities and structures should be conducted to confirm exactly how many of the identified sequences are Cupredoxin-like proteins.
When this study was initiated, the 54 identified sequences were classified as unannotated sequences in the NCBI nr-protein sequence database. While preparing this manuscript, 14 of the 54 sequences were annotated as Cupredoxin-like proteins in the NCBI sequence database (Table 1, denoted using “c” as superscript). This latest annotation also shows that the sequences detected by the OCR approach indeed include Cupredoxin-like proteins. Interestingly, the recently annotated 14 sequences coincide with the sequences that could be modeled in this study.
The Phyre2 server was used to model the structures of the 54 identified potential Cupredoxins, but it successfully predicted three-dimensional structures for only 15 of the identified sequences. An attempt was also made to model the structures of the 54 sequences using a range of other structural modeling tools, but the set of sequences that could be modeled did not differ much from that obtained with Phyre2. Tertiary-structure prediction of distant protein homologs, which do not share any significant sequence identity with any known protein structure, is still a major challenge. The 54 identified sequences showed low sequence identity with the known Cupredoxin proteins, which might limit modeling of their protein structures. Although many computational modeling tools are being developed, X-ray crystallography or NMR methods might be the most efficient approach to determine the structures of such remote homologs at this moment.
Acknowledgements
This research was supported by the Basic Science Program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIP) (NRF-2012R1A2A2A01045306).
References
1. Adman ET. 1991. Copper protein structures. Adv. Protein Chem. 42: 145-197.
2. Arciero DM, Pierce BS, Hendrich MP, Hooper AB. 2002. Nitrosocyanin, a red cupredoxin-like protein from Nitrosomonas europaea. Biochemistry 41: 1703-1709. DOI: 10.1021/bi015908w
3. Castrignanò T, De Meo PD, Cozzetto D, Talamo IG, Tramontano A. 2006. The PMDB Protein Model Database. Nucleic Acids Res. 34: D306-D309. DOI: 10.1093/nar/gkj105
4. Choi M, Davidson VL. 2011. Cupredoxins - a study of how proteins may evolve to use metals for bioenergetic processes. Metallomics 3: 140-151. DOI: 10.1039/c0mt00061b
5. Colovos C, Yeates TO. 1993. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 2: 1511-1519. DOI: 10.1002/pro.5560020916
6. Cruz-Gallardo I, Díaz-Moreno I, Díaz-Quintana A, Donaire A, Velázquez-Campoy A, Curd RD. 2013. Antimalarial activity of cupredoxins: the interaction of Plasmodium merozoite surface protein 119 (MSP119) and rusticyanin. J. Biol. Chem. 288: 20896-20907. DOI: 10.1074/jbc.M113.460162
7. de Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E. 2006. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 34: W362-W365. DOI: 10.1093/nar/gkl124
8. Dennison C. 2005. Investigating the structure and function of cupredoxins. Coord. Chem. Rev. 249: 3025-3054. DOI: 10.1016/j.ccr.2005.04.021
9. Eisenberg D, Lüthy R, Bowie JU. 1997. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 277: 396-404.
10. Goyal A, Sokalingam S, Hwang KS, Lee SG. 2014. Identification of an ideal-like fingerprint for a protein fold using overlapped conserved residues based approach. Sci. Rep. 4: 5643.
11. Gray HB, Malmström BG, Williams RJ. 2000. Copper coordination in blue proteins. J. Biol. Inorg. Chem. 5: 551-559. DOI: 10.1007/s007750000146
12. Holm L, Rosenstrom P. 2010. Dali server: conservation mapping in 3D. Nucleic Acids Res. 38: W545-W549. DOI: 10.1093/nar/gkq366
13. Kelley LA, Sternberg MJE. 2009. Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4: 363-371. DOI: 10.1038/nprot.2009.2
14. Kister AE, Gelfand I. 2009. Finding of residues crucial for supersecondary structure formation. Proc. Natl. Acad. Sci. USA 106: 18996-19000. DOI: 10.1073/pnas.0909714106
15. Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J. 2009. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8. Proteins 77: 114-122. DOI: 10.1002/prot.22570
16. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948. DOI: 10.1093/bioinformatics/btm404
17. Laskowski RA, Macarthur MW, Moss DS, Thornton JM. 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26: 283-291. DOI: 10.1107/S0021889892009944
18. Laskowski RA, Watson JD, Thornton JM. 2005. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 33: W89-W93. DOI: 10.1093/nar/gki414
19. Laskowski RA, Watson JD, Thornton JM. 2005. Protein function prediction using local 3D templates. J. Mol. Biol. 351: 614-626. DOI: 10.1016/j.jmb.2005.05.067
20. Lieberman RL, Arciero DM, Hooper AB, Rosenzweig AC. 2001. Crystal structure of a novel red copper protein from Nitrosomonas europaea. Biochemistry 40: 5674-5681. DOI: 10.1021/bi0102611
21. Murphy ME, Lindley PF, Adman ET. 1997. Structural comparison of cupredoxin domains: domain recycling to construct proteins with novel functions. Protein Sci. 6: 761-770. DOI: 10.1002/pro.5560060402
22. Murzin AG, Brenner SE, Hubbard T, Chothia C. 1995. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
23. Naguleswaran A, Fialho AM, Chaudhari A, Hong CS, Chakrabarty AM, Sullivan WJ. 2008. Azurin-like protein blocks invasion of Toxoplasma gondii through potential interactions with parasite surface antigen SAG1. Antimicrob. Agents Chemother. 52: 402-408. DOI: 10.1128/AAC.01005-07
24. Punj V, Bhattacharyya S, Saint-Dic D, Vasu C, Cunningham EA, Graves J. 2004. Bacterial cupredoxin azurin as an inducer of apoptosis and regression in human breast cancer. Oncogene 23: 2367-2378. DOI: 10.1038/sj.onc.1207376
25. Roberts SA, Weichsel A, Grass G, Thakali K, Hazzard JT, Tollin G. 2002. Crystal structure and electron transfer kinetics of CueO, a multicopper oxidase required for copper homeostasis in Escherichia coli. Proc. Natl. Acad. Sci. USA 99: 2766-2771. DOI: 10.1073/pnas.052710499
26. Roger M, Biaso F, Castelle CJ, Bauzan M, Chaspoul F, Lojou E. 2014. Spectroscopic characterization of a green copper site in a single-domain cupredoxin. PLoS One 9: e98941. DOI: 10.1371/journal.pone.0098941
27. Roy A, Yang J, Zhang Y. 2012. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 40: W471-W477. DOI: 10.1093/nar/gks372
28. Savelieff MG. 2008. Experimental evidence for a link among cupredoxins: red, blue, and purple copper transformations in nitrous oxide reductase. Proc. Natl. Acad. Sci. USA 105: 7919-7924. DOI: 10.1073/pnas.0711316105
29. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7: 539. DOI: 10.1038/msb.2011.75
30. Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M. 2002. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief. Bioinform. 3: 265-274. DOI: 10.1093/bib/3.3.265
31. Sigrist CJ, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A. 2013. New and continuing developments at PROSITE. Nucleic Acids Res. 41: D344-D347. DOI: 10.1093/nar/gks1067
32. Tatusova T, Ciufo S, Fedorov B, O’Neill K, Tolstoy I. 2014. RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Res. 42: D553-D559. DOI: 10.1093/nar/gkt1274