Reduction of Ambiguity in Phosphorylation-site Localization in Large-scale Phosphopeptide Profiling by Data Filter using Unique Mass Class Information
Bulletin of the Korean Chemical Society. 2014. Mar, 35(3): 845-850
Copyright © 2014, Korea Chemical Society
  • Received : September 21, 2013
  • Accepted : October 01, 2013
  • Published : March 20, 2014
About the Authors
Inamul Hasan Madar
Seunghoon Back
Dong-Gi Mun
Hokeun Kim
Jae Hun Jung
Department of Applied Chemistry, College of Applied Science, Kyung Hee University, Yongin, Gyeonggi 446-701, Korea
Kwang Pyo Kim
Department of Applied Chemistry, College of Applied Science, Kyung Hee University, Yongin, Gyeonggi 446-701, Korea
Sang-Won Lee

The rapid development of shotgun proteomics is paving the way for extensive proteome profiling, while providing extensive information on the various post-translational modifications (PTMs) that occur in a proteome of interest. For example, current phosphoproteomic methods can identify more than 10,000 phosphopeptides from a proteome sample. Despite these developments, pinpointing the true phosphorylation sites remains challenging, especially when a peptide contains multiple possible phosphorylation sites. We developed the Phospho-UMC filter, a simple method that uses unique mass class (UMC) information to differentiate phosphopeptides with different phosphorylation sites and to increase the confidence of phosphorylation-site localization. The method was applied to large-scale phosphopeptide profiling data and was shown to be effective in reducing the ambiguity associated with the tandem mass spectrometric data analysis of phosphopeptides.
Protein phosphorylation plays an important role in the cell signaling cascades of many signal transduction pathways in eukaryotes. 1 A third of cellular proteins are estimated to be phosphorylated at any given time during their cellular lifetime. 2 Reversible protein phosphorylation has been shown to control a wide variety of biological functions and activities of cellular proteins. Deregulated phosphorylation of a protein may be implicated in a variety of human diseases and disorders. 3
Substantial efforts have been dedicated to accurate and extensive phosphoproteome profiling, including the development of specific phosphoproteome enrichment methods and of computational methods for analyzing phosphoproteomic data. Because phosphorylation occurs at relatively low levels, a variety of enrichment strategies coupled with mass spectrometry (MS)-based approaches have been reported. IMAC (immobilized metal affinity chromatography) is by far the most commonly applied technique for phosphopeptide enrichment and involves immobilizing trivalent metal ions (i.e., Fe³⁺, Al³⁺, Ga³⁺) for specific interaction with the phosphate groups of the phosphopeptides. Recently, a titanium dioxide (TiO₂)-based enrichment method has shown promising results for phosphopeptide analysis. 4 Chromatographic methods, such as strong cation exchange (SCX) chromatography, have also often been applied to separate phosphopeptides from nonphosphopeptides at low pH. 5
Because it is difficult to pinpoint the exact location of a phosphorylation site, several computational methods have been developed for confident phosphorylation-site prediction. 6-13 For example, AScore is a probability-based approach that determines the phosphorylation site from the presence and intensity of site-specific ions in tandem mass spectrometric (MS/MS) spectra. 7 The PhosphoRS algorithm validates the peptide identified by a database search engine and calculates the probability of each potential phosphorylation site of a phosphopeptide using a cumulative binomial distribution. 12
We introduce a simple method for phosphorylation site localization that utilizes unique mass class (UMC) information, which is a collection of peptide MS features based on precursor masses and liquid chromatography (LC) elution times. A peptide elutes from the separation column over a period of LC elution time and is measured multiple times during its elution. Similar monoisotopic masses (typically within 10 ppm) that are measured sequentially in successive mass spectra originate from one peptide and can be grouped as a unique mass class (UMC). 13 UMC filtering is shown here to be a simple way of localizing the phosphorylation site of a peptide based on its monoisotopic mass and elution pattern. Peptides of the same precursor mass that are split into different UMCs are considered to be phosphopeptides with different phosphorylation sites.
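The 10 ppm criterion mentioned above can be illustrated with a minimal sketch. The function names `ppm_difference` and `same_mass_class` are hypothetical and are not taken from PE-MMR; the sketch only shows how a relative mass tolerance in ppm might be evaluated for two monoisotopic masses.

```python
def ppm_difference(mass_a, mass_b):
    """Relative difference between two masses in parts per million."""
    return abs(mass_a - mass_b) / mass_b * 1e6


def same_mass_class(mass_a, mass_b, tol_ppm=10.0):
    """True if two monoisotopic masses agree within the ppm tolerance.

    tol_ppm defaults to the 10 ppm value quoted in the text; the
    function itself is an illustrative assumption, not PE-MMR code.
    """
    return ppm_difference(mass_a, mass_b) <= tol_ppm


# 0.010 Da at 1500 Da is about 6.7 ppm; 0.030 Da is about 20 ppm.
print(same_mass_class(1500.010, 1500.000))
print(same_mass_class(1500.030, 1500.000))
```

Because the tolerance is relative, the same absolute mass error is tolerated more readily for heavier peptides than for lighter ones.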
Chemicals. Acetonitrile and methanol were purchased from J. T. Baker (Phillipsburg, NJ). Iron chloride (FeCl₃, anhydrous), tris(hydroxymethyl)aminomethane (Tris), urea, hydrochloric acid, ethylenediaminetetraacetic acid (EDTA), and trifluoroacetic acid (TFA) were obtained from Sigma-Aldrich (St. Louis, MO). Sequence-grade modified porcine trypsin was obtained from Promega (Madison, WI). Sodium dodecyl sulfate (SDS) was obtained from USB Corporation (Cleveland, OH), and dithiothreitol (DTT) was purchased from GE Healthcare Life Sciences (Uppsala, Sweden). A BCA assay kit was obtained from Thermo Fisher Scientific (Rockford, IL). All of the chemicals were analytical grade or HPLC grade.
Sample Preparation. Frozen gastric cancer tissues (normal and tumor) were pulverized, and the tissue powder was dissolved in lysis buffer (4% SDS and 0.1 M Tris-HCl, pH 7.6) using a focused sonication apparatus (S220, Covaris, USA). The lysate was further lysed using a probe sonicator (CL-188, QSonica Sonicator, USA). Debris was removed by centrifugation of the lysate at 16,000 × g for 5 min at 20 ℃. The supernatant was transferred to a new tube and the protein concentration was determined using the BCA protein assay. Peptide samples were prepared by a modified filter-aided sample preparation (FASP) method. 14 Proteins (500 μg each) were reduced in SDT buffer (4% SDS and 0.1 M DTT in 0.1 M Tris-HCl, pH 7.6) for 45 min at 37 ℃ and then boiled for 10 min. After sonication for 10 min, the sample was centrifuged for 5 min at 16,000 × g. The protein solution was transferred to a membrane filter (Microcon devices, YM-30, Millipore, MA). After the membrane filter was centrifuged at 14,000 × g at 20 ℃ for 60 min, the concentrate was dissolved in 200 μL of 8 M urea and the device was centrifuged to remove the remaining SDS (× 2). Subsequently, 100 μL of 50 mM iodoacetamide in 8 M urea was added to the concentrate for alkylation for 25 min at 25 ℃ in the dark. After centrifugation at 14,000 × g for 30 min, the resulting product was diluted with 200 μL of 8 M urea and concentrated again (× 4). The concentrate was washed with 100 μL of 50 mM NH₄HCO₃ at 14,000 × g for 40 min (× 2). The protein concentrate was subjected to proteolytic digestion using trypsin (1:50 enzyme-to-protein ratio), with mixing for 1 min at 600 rpm in a thermomixer (Eppendorf) before overnight digestion without shaking at 37 ℃. After the first digestion, a second digestion was performed using trypsin (1:100 enzyme-to-protein ratio) for 6 h.
The digested tryptic peptides were eluted from the filter by centrifugation at 14,000 × g for 30 min. The filter device was then rinsed with 60 μL of 50 mM NH₄HCO₃ and centrifuged at 14,000 × g for 20 min, and the flow-through was mixed with the first eluent. The eluent was dried completely using a SpeedVac concentrator (Thermo, San Jose, CA). The dried peptides were stored at −80 ℃.
iTRAQ Labeling of Peptides. Peptides were labeled with 4-plex iTRAQ™ reagent according to the manufacturer’s instructions (AB Sciex, Foster City, CA). For 3.2 mg of peptides (two 800 μg normal peptide samples and two 800 μg tumor peptide samples), 8 units of 4-plex iTRAQ reagents were added to each tube as follows: the 114 and 116 labels to the normal tissue samples and the 115 and 117 labels to the tumor samples. Peptides were dissolved in dissolution buffer (500 mM TEAB, pH 8.5) and labeled with iTRAQ reagent. After 1 h of incubation at room temperature, the unreacted reagent was hydrolyzed by adding 300 μL of 0.05% TFA, followed by a further 30 min of incubation at room temperature. The contents of all labeled samples were pooled into one tube. The labeled sample was concentrated to 900 μL and used for subsequent basic RP fractionation.
Basic pH Reversed-phase Fractionation. Basic pH reversed-phase liquid chromatography was used for peptide fractionation as previously described. 15 iTRAQ-labeled tryptic peptides were loaded onto an analytical column (4.6 mm × 250 mm, Xbridge, C18, 5 μm) with a guard column (4.6 mm × 20 mm, Xbridge, C18, 5 μm). Solvents A and B were 10 mM TEAB in water (pH 7.5) and 10 mM TEAB in 90% ACN (pH 7.5), respectively. Samples were separated with the following gradient: held at 100% solvent A for 10 min, from 0% to 5% solvent B in 10 min, from 5% to 35% solvent B in 60 min, from 35% to 70% solvent B in 15 min, and held at 70% solvent B for 10 min. Fractions were collected every 1 min during the separation, and the resulting 96 fractions were concatenated into 12 fractions, as previously described (e.g., fractions #1, #13, #25, #37, #49, #61, #73, and #85 were combined, and so on). 15
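The concatenation pattern above (fractions #1, #13, #25, ... combined) amounts to pooling every twelfth fraction. A minimal sketch, assuming fractions are numbered 1 to 96 in collection order; the function name `concatenate_fractions` is hypothetical.

```python
def concatenate_fractions(n_fractions=96, n_pools=12):
    """Assign fraction numbers (1-based) to concatenated pools.

    Fraction f goes to pool ((f - 1) % n_pools) + 1, so pool 1
    receives fractions 1, 13, 25, ... as in the example in the text.
    """
    pools = {p: [] for p in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools + 1].append(fraction)
    return pools


pools = concatenate_fractions()
print(pools[1])   # fractions combined into the first pool
print(pools[12])  # fractions combined into the last pool
```

Pooling early, middle, and late fractions together spreads peptides of differing hydrophobicity across each pooled sample.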
Phosphopeptide Enrichment. IMAC experiments were performed in batch mode to enrich phosphopeptides from the 12 sample fractions. 16 IMAC beads were prepared from Ni-NTA magnetic agarose beads (N#3611, Qiagen GmbH, Hilden, Germany). A 1500 μL aliquot of Ni-NTA beads was washed (× 3) with 1200 μL of deionized water. The beads were treated with 1200 μL of 100 mM EDTA (pH 8.0) for 30 min with end-over-end rotation (Stuart Rotator SB2, Bibby Scientific Limited, UK) to remove Ni²⁺ ions. The EDTA solution was then removed and the beads were washed (× 3) with 1200 μL of deionized water. The NTA beads were incubated in 1200 μL of 10 mM aqueous FeCl₃ solution for 30 min with end-over-end rotation. The Fe-NTA beads were washed (× 3) with 1200 μL of deionized water and resuspended in 1200 μL of 1:1:1 ACN/MeOH/0.01% acetic acid. The IMAC beads were aliquoted into 12 microcentrifuge tubes, and each aliquot was washed with 400 μL of 80% ACN/0.1% TFA. Each peptide fraction was resuspended in 500 μL of 80% ACN/0.1% TFA and mixed with one aliquot of IMAC beads. The 12 fraction samples were incubated for 30 min with end-over-end rotation, and each bead aliquot was then washed (× 4) with 500 μL of 80% ACN/0.1% TFA. Phosphopeptides were eluted using 125 μL of 1:1 ACN/2.5% ammonia in 2 mM phosphate buffer (pH 10) after incubating for 1.5 min. All samples were acidified immediately with 10% TFA. 17,18
LC-MS/MS Experiments. Samples were dissolved in 0.1% FA and separated on a reversed-phase capillary column installed on a modified nanoACQUITY UPLC (Waters, Milford, MA) system. 19 The analytical column and the solid phase extraction (SPE) column were prepared by packing C18 materials (3-μm diameter, 300 Å pore size, Jupiter, Phenomenex) into 100-cm-long (150-μm i.d. × 360-μm o.d.) and 3-cm-long (150-μm i.d. × 360-μm o.d.) fused silica capillaries, respectively, by acetonitrile slurry packing. The temperature of the analytical column was set to 60 ℃ using a semi-rigid gasline heater. Mobile phases A (0.1% formic acid in H₂O) and B (0.1% formic acid in acetonitrile) were used to generate a linear gradient: 220 min, 1-50% of solvent B.
A Q-Exactive mass spectrometer (Thermo Fisher Scientific) was used to acquire tandem mass spectrometric data. The peptides eluted from the LC were ionized through a home-built nano-electrospray source at an electric potential of 2.4 kV. MS survey scans (400-2000 Th) were acquired at a resolution of 70,000 (at m/z 400) with an automated gain control (AGC) target value of 1.0 × 10⁶ and a maximum ion injection time of 20 ms. Up to the 10 most abundant ions with charges > 2 in the survey scan were dynamically selected with an isolation width of 1.6 Th and fragmented by higher energy collisional dissociation (HCD) with a normalized collision energy (NCE) of 30. The MS/MS scans were acquired at a resolution of 17,500 (at m/z 400) with a fixed first m/z of 100 Th and a maximum ion injection time of 60 ms.
Data Analysis. LC-MS/MS data were preprocessed by PE-MMR, which was previously demonstrated to assign accurate precursor masses to mass spectrometry data prior to a database search. 20 The resultant MS/MS data were searched by the MS-GF+ search engine (v9387) against a composite protein database consisting of the Uniprot-human-reference database (released May 2013; 90,219 entries), common contaminants (180 entries), and the reverse complements. 21 Mass tolerance was set to 10 ppm. MS/MS searches for the proteome data sets were performed with the following parameters: semi-tryptic, maximum miscleavage number of 3, static modification of carbamidomethylation of cysteine (C, +57.0214 Da), iTRAQ labeling of peptide N-terminus (N-term iTRAQ, +144.102063 Da) and lysine (K, +144.102063 Da), oxidation of methionine (M, +15.994915 Da), and phosphorylation on serine, threonine and tyrosine (STY, +79.9663 Da) as dynamic modifications. Peptides within FDR 1% were used for further analyses.
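The text states only that identifications within 1% FDR were kept; how the FDR was computed is not described. One common approach with a target-decoy database is to rank identifications by score and estimate the FDR as the ratio of decoy to target hits. The sketch below illustrates that general idea under the assumption that a higher score is better; `filter_by_fdr` is a hypothetical helper and does not reproduce the MS-GF+ calculation.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target identifications within an estimated FDR threshold.

    psms: list of (score, is_decoy) tuples; higher score = better
    (an assumption made for this sketch).
    FDR is estimated as n_decoy / n_target over the ranked list, and
    the longest prefix whose estimate stays within max_fdr is kept.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = 0
    n_decoy = 0
    cutoff = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target > 0 and n_decoy / n_target <= max_fdr:
            cutoff = i
    # Report only target hits above the cutoff.
    return [p for p in ranked[:cutoff] if not p[1]]


example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(filter_by_fdr(example, max_fdr=0.01))
```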
Results and Discussion
Phosphorylation is a common chemical modification of cellular proteins. The resultant phosphopeptides exhibit physicochemical properties that differ from those of non-phosphorylated peptides. 22 Phosphorylation changes the hydrophobicity of a peptide, so the elution time of a phosphopeptide differs from that of the corresponding unphosphorylated peptide. The site of phosphorylation is important in determining the hydrophobicity (and thereby the LC elution time), especially when multiple serine/threonine/tyrosine residues or their combinations are present on a peptide. While the precursor mass is the same for a singly phosphorylated peptide regardless of which of two possible sites carries the phosphate group, the two phosphopeptides with different phosphorylation sites can differ in hydrophobicity and elution time.
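That two positional isomers share one precursor mass can be checked with a short calculation. The sketch below sums standard monoisotopic residue masses plus water, and adds the +79.9663 Da phosphorylation shift used in the database search for each residue preceded by "p". It uses the sequence SSLSGDEEDELFK as an example and ignores the iTRAQ labels for simplicity; the function name `monoisotopic_mass` is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PHOSPHO = 79.9663  # shift used for STY phosphorylation in the search


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass; 'p' marks the next residue as phosphorylated."""
    mass = WATER
    for symbol in peptide:
        mass += PHOSPHO if symbol == "p" else RESIDUE_MASS[symbol]
    return mass


isomer_1 = monoisotopic_mass("SpSLSGDEEDELFK")
isomer_2 = monoisotopic_mass("SSLpSGDEEDELFK")
print(abs(isomer_1 - isomer_2) < 1e-6)  # same elemental composition
```

Since mass alone cannot separate such isomers, any discrimination between them has to come from the elution time or from site-specific fragment ions.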
Figure 1. (a) Overall schematic representation of the Phospho-UMC filtering strategy, (b) a virtual 2D display of LC/MS data, (c) an expanded view of the LC/MS data showing two UMCs separated by 10 holes, and (d) and (e) annotated peptides from the two different UMCs.
Figure 1(a) shows the overall workflow of the phospho-UMC filter. It utilizes PE-MMR to perform UMC clustering and to link MS/MS spectra to the UMCs corresponding to their precursor ions. 20 PE-MMR uses the RAPID deisotoping algorithm to obtain the monoisotopic masses of all precursor ions in the MS spectra. 23 It then identifies MS features that have similar precursor masses (within 10 ppm) and emerge over a period of time during LC elution, and groups them into a UMC. A UMC contains the monoisotopic masses of all detected peptide ions of different charge states along with their MS intensities. The PE-MMR software also calculates the UMC mass, which is the intensity-weighted average of all the detected precursor masses. Ideally, one UMC corresponds to one peptide. Subsequently, PE-MMR matches the precursor mass of each MS/MS spectrum to a UMC mass, creates links between the MS/MS data and the UMCs, and replaces the original precursor mass of the MS/MS data with the matched UMC mass. The resultant MS/MS data were subjected to a database search by the MS-GF+ engine, and the peptide identifications within FDR 1% were linked back to the corresponding UMCs.
While phosphopeptides can be effectively fragmented and identified by LC-MS/MS experiments, it is difficult to pinpoint the exact position of the phosphorylation site from the MS/MS spectra. The sequence information from the search result alone may not be sufficient to designate the phosphorylation site, and manual inspection of each annotated MS/MS spectrum for site-specific fragments is often required. Because multiple MS/MS events are possible for a peptide during its elution, each UMC is often linked to multiple MS/MS spectra. When phosphopeptide-enriched samples are analyzed, the MS/MS spectra linked to a UMC may yield two or more phosphopeptides with the same peptide sequence but different phosphorylation sites, because their fragmentation patterns are similar.
Figure 1(c) shows an expanded view of a virtual 2D display (Figure 1(b)) of LC/MS data in which peptide masses were plotted against their scan numbers (i.e., elution time). The two red boxes show two UMCs with the same precursor mass but different elution times. The two UMCs are separated by 10 holes, which are MS events (MS spectra) with no matched MS features. The database search assigned SpSLSGDEEDELFK to the earlier UMC, while the later UMC was identified as SSLpSGDEEDELFK (Figures 1(d) and 1(e)). The two peptides have the same sequence but different sites of phosphorylation. Although their precursor masses are the same, the peptides eluted at different times and were therefore separated into two UMCs. The difference in phosphorylation site between SpSLSGDEEDELFK and SSLpSGDEEDELFK changed their hydrophobicity and hence their elution times, so they were grouped into two different UMCs. In this case, the phospho-UMC filter reports both phosphopeptides as confident identifications.
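The two calculations described above, an intensity-weighted UMC mass and the linking of an MS/MS precursor to a UMC, can be sketched as follows. This is an illustration under stated assumptions, not PE-MMR itself: the dictionary layout, the use of a scan range to represent the elution window, and the names `umc_mass` and `link_to_umc` are all hypothetical.

```python
def umc_mass(features):
    """Intensity-weighted average mass of (mass, intensity) features."""
    total_intensity = sum(intensity for _, intensity in features)
    return sum(mass * intensity for mass, intensity in features) / total_intensity


def link_to_umc(precursor_mass, scan, umcs, tol_ppm=10.0):
    """Return the id of the best-matching UMC, or None if none matches.

    umcs: list of dicts with 'id', 'mass', 'first_scan', 'last_scan'.
    A UMC matches if the MS/MS scan falls inside its scan range and
    the precursor mass agrees with the UMC mass within tol_ppm.
    """
    best_id = None
    best_ppm = tol_ppm
    for umc in umcs:
        if not umc["first_scan"] <= scan <= umc["last_scan"]:
            continue
        ppm = abs(precursor_mass - umc["mass"]) / umc["mass"] * 1e6
        if ppm <= best_ppm:
            best_id, best_ppm = umc["id"], ppm
    return best_id


umcs = [
    {"id": 1, "mass": 1500.000, "first_scan": 100, "last_scan": 110},
    {"id": 2, "mass": 1500.000, "first_scan": 121, "last_scan": 130},
]
print(link_to_umc(1500.004, 125, umcs))  # same mass, later elution window
```

Using the elution window in addition to the mass is what lets two UMCs of identical mass receive different MS/MS spectra in this sketch.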
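The role of holes in separating two UMCs of the same mass can be sketched as below. The text reports that the two UMCs were separated by 10 holes but does not state the hole tolerance used by PE-MMR, so `max_holes` is a hypothetical parameter, and the scan indices are assumed to count consecutive MS survey scans.

```python
def split_on_holes(scans, max_holes):
    """Split MS scan indices of one mass into runs separated by holes.

    A hole is an MS scan in which the mass was not observed. A new run
    starts whenever more than max_holes consecutive holes occur.
    """
    ordered = sorted(scans)
    if not ordered:
        return []
    runs = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        holes = current - previous - 1
        if holes > max_holes:
            runs.append([current])
        else:
            runs[-1].append(current)
    return runs


# Mass observed in scans 100-102 and again in 113-114: 10 holes between.
observed = [100, 101, 102, 113, 114]
print(split_on_holes(observed, max_holes=5))
print(split_on_holes(observed, max_holes=10))
```

A smaller tolerance splits closely eluting isomers more readily but also risks fragmenting the elution profile of a single peptide whose signal drops out for a few scans.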
Figure 2. The virtual 2D display of LC/MS data with UMC #126 shown in the red box. Points A and B denote identifications with the same amino acid sequence but different phosphorylation sites that appear in a single UMC, shown with the corresponding annotated spectra.
Figure 2 displays an expanded view of the virtual 2D display of the LC/MS data from fraction no. 6. Two phosphopeptides, RPSGpTGTGPEDGRPSLGSPYGQPPR and RPpSGTGTGPEDGRPSLGSPYGQPPR, were identified by the database search and linked to the same UMC (UMC #126). RPSGpTGTGPEDGRPSLGSPYGQPPR was identified from an MS/MS spectrum acquired at an early elution time of the peptide (point A). This MS/MS spectrum was low in intensity and lacked many sequence-specific fragments. On the other hand, RPpSGTGTGPEDGRPSLGSPYGQPPR was identified from an MS/MS spectrum acquired when the MS intensity of the peptide was relatively high (point B), so the MS/MS spectrum was high in intensity and contained many fragments. From the comparison of the two annotated MS/MS spectra and their peptide scores (19.35 vs. 25.15), RPpSGTGTGPEDGRPSLGSPYGQPPR is a significantly better identification, and the phospho-UMC filter reports RPpSGTGTGPEDGRPSLGSPYGQPPR as the phosphopeptide corresponding to the UMC, excluding RPSGpTGTGPEDGRPSLGSPYGQPPR. The phospho-UMC filter thus reduced the ambiguity in phosphorylation site localization caused by a lack of sequence-specific fragments in the MS/MS spectra, because the multiple annotated MS/MS spectra associated with the UMC are compared to find the best identification.
A total of 12,607 non-redundant phosphopeptides were identified from the 12 fractions of IMAC-enriched phosphopeptide samples. The phosphorylation site analysis by the phospho-UMC filter resulted in 9,850 non-redundant phosphorylation sites identified from the phosphopeptides. The number of non-redundant phosphorylation sites without the phospho-UMC filter was 10,855, which is 1,005 more than the number obtained with the filter. These 1,005 phosphorylation sites were removed by the phospho-UMC filter because they were assigned to UMCs in which another phosphopeptide with a different phosphorylation site was also present with a higher peptide score, as shown in Figure 2.
Phosphorylation changes the physicochemical properties of a peptide, including its hydrophobicity. The hydrophobicity of a phosphopeptide was observed to depend not only on the number of phosphorylation sites but also on the position of phosphorylation on the peptide. We have shown that the reversed-phase LC elution time of a phosphopeptide changes as the phosphorylation site changes (Figure 1). Therefore, two phosphopeptides with the same sequence but different phosphorylation sites can be grouped into two different UMCs. This observation led to the intriguing possibility of using UMC information to reduce the ambiguity associated with the localization of phosphorylation sites. The Phospho-UMC filter simply reports the phosphopeptide ID with the highest peptide score when multiple phosphopeptide IDs are linked to one UMC. These phosphopeptide IDs were often observed to have the same peptide sequence but different phosphorylation sites. The inaccurate localization of phosphorylation sites in the database search was due to a lack of sequence-specific fragments, especially when the MS/MS spectrum was acquired at an early or late elution time, when the peptide MS intensity was weak.
When the phospho-UMC filter was applied to the analysis of phosphopeptides from LC-MS/MS experiments on the 12 IMAC-enriched phosphopeptide samples, it significantly reduced the number of reported phosphorylation sites (ca. 10% fewer phosphorylation sites) and increased the accuracy of phosphopeptide analysis of high-throughput LC-MS/MS data.
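The selection rule described above (report the highest-scoring phosphopeptide ID per UMC) can be sketched in a few lines. The tuple layout and the name `phospho_umc_filter` are assumptions made for illustration; the example uses the two identifications and scores reported for UMC #126.

```python
def phospho_umc_filter(identifications):
    """Keep the highest-scoring phosphopeptide ID for each UMC.

    identifications: iterable of (umc_id, peptide, score) tuples,
    where a higher score means a better identification.
    Returns a dict mapping umc_id -> (peptide, score).
    """
    best = {}
    for umc_id, peptide, score in identifications:
        if umc_id not in best or score > best[umc_id][1]:
            best[umc_id] = (peptide, score)
    return best


ids = [
    (126, "RPSGpTGTGPEDGRPSLGSPYGQPPR", 19.35),  # point A, weak spectrum
    (126, "RPpSGTGTGPEDGRPSLGSPYGQPPR", 25.15),  # point B, strong spectrum
]
print(phospho_umc_filter(ids)[126][0])
```

Identifications that fall into different UMCs, such as the two isomers in Figure 1, are untouched by this rule, since each UMC is filtered independently.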
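As a quick check of the reported reduction, the site counts given above (10,855 without the filter and 9,850 with it) yield the stated difference and a relative reduction of roughly 9%, consistent with the approximate 10% figure:

```latex
\[
10{,}855 - 9{,}850 = 1{,}005,
\qquad
\frac{1{,}005}{10{,}855} \approx 0.093 \;(\approx 9.3\%)
\]
```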
This paper is dedicated to Professor Myung Soo Kim on the occasion of his honourable retirement. This work was financially supported by the Proteogenomic Research Program through the National Research Foundation of Korea funded by the Korean Ministry of Science, ICT & Future Planning, the Converging Research Center Program through the Ministry of Science, ICT and Future Planning, Korea (2013K000443) and a grant of the National Project for Personalized Genomic Medicine, Ministry for Health & Welfare, Republic of Korea (A111218-CP02). We also acknowledge the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF20100020209).
1. Villen J., Gygi S. P. Nat. Protoc. 2008, 3, 1630-1638.
2. Mann M., Ong S. E. Trends Biotechnol. 2002, 20, 261-268.
3. Macek B., Mann M., Olsen J. V. Annu. Rev. Pharmacol. Toxicol. 2009, 49, 199-221.
4. Ikeguchi Y., Nakamura H. Anal. Sci. 1997, 13, 479-483.
5. Beausoleil S. A., Jedrychowski M., Schwartz D., Elias J. E., Villen J., Li J., Cohn M. A., Cantley L. C., Gygi S. P. Proc. Natl. Acad. Sci. USA 2004, 101, 12130-12135.
6. Baker P. R., Trinidad J. C., Chalkley R. J. Mol. Cell Proteomics 2011, 10.
7. Beausoleil S. A., Villen J., Gerber S. A., Rush J., Gygi S. P. Nat. Biotechnol. 2006, 24, 1285-1292.
8. Frese C. K., Zhou H. J., Taus T., Altelaar A. F. M., Mechtler K., Heck A. J. R., Mohammed S. J. Proteome Res. 2013, 12, 1520-1525.
9. Phanstiel D. H., Brumbaugh J., Wenger C. D., Tian S., Probasco M. D., Bailey D. J., Swaney D. L., Tervo M. A., Bolin J. M., Ruotti V., Stewart R., Thomson J. A., Coon J. J. Nat. Methods 2011, 8, 821-827.
10. Ruttenberg B. E., Pisitkun T., Knepper M. A., Hoffert J. D. J. Proteome Res. 2008, 7, 3054-3059.
11. Savitski M. M., Lemeer S., Boesche M., Lang M., Mathieson T., Bantscheff M., Kuster B. Mol. Cell Proteomics 2011, 10.
12. Taus T., Kocher T., Pichler P., Paschke C., Schmidt A., Henrich C., Mechtler K. J. Proteome Res. 2011, 10, 5354-5362.
13. Zimmer J. S., Monroe M. E., Qian W. J., Smith R. D. Mass Spectrom. Rev. 2006, 25, 450-482.
14. Wisniewski J. R., Zougman A., Nagaraj N., Mann M. Nat. Methods 2009, 6, 359-362.
15. Wang Y., Yang F., Gritsenko M. A., Wang Y., Clauss T., Liu T., Shen Y., Monroe M. E., Lopez-Ferrer D., Reno T., Moore R. J., Klemke R. L., Camp D. G., Smith R. D. Proteomics 2011, 11, 2019-2026.
16. Ficarro S. B., Adelmant G., Tomar M. N., Zhang Y., Cheng V. J., Marto J. A. Anal. Chem. 2009, 81, 4566-4575.
17. Ficarro S. B., Zhang Y., Carrasco-Alfonso M. J., Garg B., Adelmant G., Webber J. T., Luckey C. J., Marto J. A. Mol. Cell Proteomics 2011, 10.
18. Tsai C. F., Wang Y. T., Chen Y. R., Lai C. Y., Lin P. Y., Pan K. T., Chen J. Y., Khoo K. H., Chen Y. J. J. Proteome Res. 2008, 7, 4058-4069.
19. Lee J. H., Hyung S. W., Mun D. G., Jung H. J., Kim H., Lee H., Kim S. J., Park K. S., Moore R. J., Smith R. D., Lee S. W. J. Proteome Res. 2012, 11, 4373-4381.
20. Shin B., Jung H. J., Hyung S. W., Kim H., Lee D., Lee C., Yu M. H., Lee S. W. Mol. Cell Proteomics 2008, 7, 1124-1134.
21. Kim S., Mischerikow N., Bandeira N., Navarro J. D., Wich L., Mohammed S., Heck A. J., Pevzner P. A. Mol. Cell Proteomics 2010, 9, 2840-2852.
22. Mann M., Hendrickson R. C., Pandey A. Annu. Rev. Biochem. 2001, 70, 437-473.
23. Park K., Yoon J. Y., Lee S., Paek E., Park H., Jung H. J., Lee S. W. Anal. Chem. 2008, 80, 7294-7303.