Peptide sequence tag

A peptide sequence tag is a piece of information about a peptide obtained by tandem mass spectrometry that can be used to identify this peptide in a protein database.[1][2][3]

Mass spectrometry

In general, peptides can be identified by fragmenting them in a mass spectrometer. For example, during collision-induced dissociation, peptides collide with a gas within the mass spectrometer and break apart at their peptide bonds. The resulting fragment ions (called b-ions and y-ions) form series in which neighbouring ions differ in mass by the residue mass of one amino acid. Thus, a tandem mass spectrum contains partial information about the amino acid sequence of the peptide. The peptide sequence tag approach, developed by Matthias Wilm and Matthias Mann at the EMBL,[4] uses this information to identify the peptide in a database. Briefly, a few fragment masses are extracted from the spectrum; the differences between them spell out a short stretch of sequence, which together with the masses flanking it forms the peptide sequence tag. This tag is a highly specific identifier of a peptide and can be used to find it in a database containing all possible peptide sequences.
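The idea of reading residues from mass differences can be illustrated with a minimal Python sketch. It assumes the input is an ascending list of m/z values for consecutive, singly charged ions from a single series, and it uses standard monoisotopic residue masses. The function names, the tolerance, and the returned triple (mass before the tag, sequence, mass after the tag) are illustrative assumptions, not the published algorithm of Mann and Wilm.

```python
# Monoisotopic residue masses in daltons (standard values). Leucine and
# isoleucine have identical mass, so both are reported as "L" here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta, tol=0.01):
    """Return the residue whose mass is closest to delta, or None if
    even the closest one is more than tol daltons away."""
    best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - delta))
    return best if abs(RESIDUE_MASS[best] - delta) <= tol else None


def sequence_tag(ladder, tol=0.01):
    """Read a tag from an ascending ladder of fragment m/z values.

    Assumes every pair of neighbouring peaks differs by exactly one
    residue (hypothetical, idealised input).
    Returns (first mass, residue string, last mass).
    """
    residues = []
    for low, high in zip(ladder, ladder[1:]):
        aa = match_residue(high - low, tol)
        if aa is None:
            raise ValueError(f"no residue matches mass difference {high - low:.4f}")
        residues.append(aa)
    return ladder[0], "".join(residues), ladder[-1]


# Hypothetical ladder: steps of G, A and S starting at m/z 300.0
ladder = [300.0, 357.02146, 428.05857, 515.0906]
start_mass, tag, end_mass = sequence_tag(ladder)
print(tag)
```

For a b-ion ladder the residues come out in N-terminal to C-terminal order; for a y-ion ladder the order is reversed. Real spectra contain noise, missing peaks, and mixed charge states, none of which this sketch handles.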

Peptide fragment notation

Figure: Peptide fragmentation notation using the scheme of Roepstorff and Fohlman (1984).

A notation has been developed for indicating the peptide fragments observed in a tandem mass spectrum.[5] Peptide fragment ions are indicated by a, b, or c if the charge is retained on the N-terminal fragment and by x, y, or z if the charge is retained on the C-terminal fragment. The subscript indicates the number of amino acid residues in the fragment. Prime symbols indicate the number of protons or hydrogens added to the fragment to form the observed ion. For example, y'' denotes the singly charged ion analogous to a protonated peptide, and (y''')2+ denotes a doubly charged ion analogous to a doubly protonated peptide.[6]
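As a rough illustration of the notation, the following Python sketch computes m/z values for the b- and y-ion series of a peptide. It assumes the conventions in common use today: a b-ion is the sum of its residue masses plus the charge-carrying protons, and a y-ion (y'' in the prime notation described above) additionally carries one water. The function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide, charge=1):
    """Return m/z lists for the b and y series of a peptide.

    b[i-1] holds b_i (first i residues); y[i-1] holds y_i (last i
    residues). Only fragments of 1..n-1 residues are listed.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b, y = [], []
    for i in range(1, n):
        b_neutral = sum(masses[:i])                # N-terminal fragment
        y_neutral = sum(masses[n - i:]) + WATER    # C-terminal fragment
        b.append((b_neutral + charge * PROTON) / charge)
        y.append((y_neutral + charge * PROTON) / charge)
    return {"b": b, "y": y}


ions = fragment_ions("GAS")
for i, (b_mz, y_mz) in enumerate(zip(ions["b"], ions["y"]), start=1):
    print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A useful sanity check is that complementary singly charged fragments, b_i and y_(n-i), sum to the neutral peptide mass plus two protons.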

Related Research Articles

Mass spectrometry (MS) is an analytical technique that is used to measure the mass-to-charge ratio of ions. The results are typically presented as a mass spectrum, a plot of intensity as a function of the mass-to-charge ratio. Mass spectrometry is used in many different fields and is applied to pure samples as well as complex mixtures.

Tandem mass spectrometry

Tandem mass spectrometry, also known as MS/MS or MS2, is a technique in instrumental analysis where two or more mass analyzers are coupled together using an additional reaction step to increase their abilities to analyse chemical samples. A common use of tandem MS is the analysis of biomolecules, such as proteins and peptides.

Protein sequencing

Protein sequencing is the practical process of determining the amino acid sequence of all or part of a protein or peptide. This may serve to identify the protein or characterize its post-translational modifications. Typically, partial sequencing of a protein provides sufficient information to identify it with reference to databases of protein sequences derived from the conceptual translation of genes.

Peptide mass fingerprinting

Peptide mass fingerprinting (PMF) is an analytical technique for protein identification in which the unknown protein of interest is first cleaved into smaller peptides, whose absolute masses can be accurately measured with a mass spectrometer such as MALDI-TOF or ESI-TOF. The method was developed in 1993 by several groups independently. The peptide masses are compared to either a database containing known protein sequences or even the genome. This is achieved by using computer programs that translate the known genome of the organism into proteins, then theoretically cut the proteins into peptides, and calculate the absolute masses of the peptides from each protein. They then compare the masses of the peptides of the unknown protein to the theoretical peptide masses of each protein encoded in the genome. The results are statistically analyzed to find the best match.
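The in silico steps described above (cut each database protein into peptides, compute the peptide masses, compare them with the measured masses) can be sketched in Python as follows. The sketch assumes trypsin as the protease, using the common rule of cleavage after K or R except before P; the passage itself does not name an enzyme. It also replaces the statistical analysis with a plain count of matching masses. All function names and the toy database are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O


def digest(protein):
    """Cut after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, protein, tol=0.05):
    """Count observed masses within tol daltons of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.05):
    """Return the name of the database entry with the most matching masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Toy database and "measured" masses simulated from protein_A
database = {"protein_A": "GASKPVRTLK", "protein_B": "MEEKWWR"}
observed = [peptide_mass(p) for p in digest(database["protein_A"])]
print(best_match(observed, database))
```

Counting matches is the simplest possible score. Practical search engines weight matches statistically, allow for missed cleavages and modifications, and work with m/z values of charged ions rather than neutral masses.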

Matrix-assisted laser desorption/ionization

In mass spectrometry, matrix-assisted laser desorption/ionization (MALDI) is an ionization technique that uses a laser energy absorbing matrix to create ions from large molecules with minimal fragmentation. It has been applied to the analysis of biomolecules and various organic molecules, which tend to be fragile and fragment when ionized by more conventional ionization methods. It is similar in character to electrospray ionization (ESI) in that both techniques are relatively soft ways of obtaining ions of large molecules in the gas phase, though MALDI typically produces far fewer multi-charged ions.

Electron-capture dissociation

Electron-capture dissociation (ECD) is a method of fragmenting gas-phase ions for structure elucidation of peptides and proteins in tandem mass spectrometry. It is one of the most widely used techniques for the activation and dissociation of mass-selected precursor ions in MS/MS. It involves the direct introduction of low-energy electrons to trapped gas-phase ions.

SEQUEST is a tandem mass spectrometry data analysis program used for protein identification. SEQUEST matches collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences.

Hydrogen–deuterium exchange is a chemical reaction in which a covalently bonded hydrogen atom is replaced by a deuterium atom, or vice versa. It can be applied most easily to exchangeable protons and deuterons, where such a transformation occurs in the presence of a suitable deuterium source, without any catalyst. The use of acid, base or metal catalysts, coupled with conditions of increased temperature and pressure, can facilitate the exchange of non-exchangeable hydrogen atoms, so long as the substrate is robust to the conditions and reagents employed. This often results in perdeuteration: hydrogen-deuterium exchange of all non-exchangeable hydrogen atoms in a molecule.

Genome-based peptide fingerprint scanning (GFS) is a system in bioinformatics analysis that attempts to identify the genomic origin of sample proteins by scanning their peptide-mass fingerprint against the theoretical translation and proteolytic digest of an entire genome. This method improves on previous methods because it compares the peptide fingerprints to an entire genome rather than only to the regions that have already been annotated. It therefore has the potential to improve genome annotation and to identify proteins with incorrect or missing annotations.

Mascot is a software search engine that uses mass spectrometry data to identify proteins from peptide sequence databases. Mascot is widely used by research facilities around the world. Mascot uses a probabilistic scoring algorithm for protein identification that was adapted from the MOWSE algorithm. Mascot is freely available to use on the website of Matrix Science. A license is required for in-house use where more features can be incorporated.

Electron-transfer dissociation

Electron-transfer dissociation (ETD) is a method of fragmenting multiply-charged gaseous macromolecules in a mass spectrometer between the stages of tandem mass spectrometry (MS/MS). Similar to electron-capture dissociation, ETD induces fragmentation of large, multiply-charged cations by transferring electrons to them. ETD is used extensively with polymers and biological molecules such as proteins and peptides for sequence analysis. Transferring an electron causes peptide backbone cleavage into c- and z-ions while leaving labile post translational modifications (PTM) intact. The technique only works well for higher charge state peptide or polymer ions (z>2). However, relative to collision-induced dissociation (CID), ETD is advantageous for the fragmentation of longer peptides or even entire proteins. This makes the technique important for top-down proteomics. The method was developed by Hunt and coworkers at the University of Virginia.

Protein mass spectrometry

Protein mass spectrometry refers to the application of mass spectrometry to the study of proteins. Mass spectrometry is an important method for the accurate mass determination and characterization of proteins, and a variety of methods and instrumentations have been developed for its many uses. Its applications include the identification of proteins and their post-translational modifications, the elucidation of protein complexes, their subunits and functional interactions, as well as the global measurement of proteins in proteomics. It can also be used to localize proteins to the various organelles, and determine the interactions between different proteins as well as with membrane lipids.

Shotgun proteomics refers to the use of bottom-up proteomics techniques to identify proteins in complex mixtures using high-performance liquid chromatography combined with mass spectrometry. The name is derived from shotgun sequencing of DNA, which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. In the most common method of shotgun proteomics, the proteins in the mixture are digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.

Top-down proteomics

Top-down proteomics is a method of protein identification that either uses an ion trapping mass spectrometer to store an isolated protein ion for mass measurement and tandem mass spectrometry (MS/MS) analysis or other protein purification methods such as two-dimensional gel electrophoresis in conjunction with MS/MS. Top-down proteomics is capable of identifying and quantitating unique proteoforms through the analysis of intact proteins. The name is derived from the similar approach to DNA sequencing. During mass spectrometry intact proteins are typically ionized by electrospray ionization and trapped in a Fourier transform ion cyclotron resonance, quadrupole ion trap or Orbitrap mass spectrometer. Fragmentation for tandem mass spectrometry is accomplished by electron-capture dissociation or electron-transfer dissociation. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry (MS). Top-down MS (non-gel) proteomics interrogates protein structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.

Bottom-up proteomics

Bottom-up proteomics is a common method to identify proteins and characterize their amino acid sequences and post-translational modifications by proteolytic digestion of proteins prior to analysis by mass spectrometry. The major alternative workflow used in proteomics is called top-down proteomics where intact proteins are purified prior to digestion and/or fragmentation either within the mass spectrometer or by 2D electrophoresis. Essentially, bottom-up proteomics is a relatively simple and reliable means of determining the protein make-up of a given sample of cells, tissues, etc.

Quantitative proteomics

Quantitative proteomics is an analytical chemistry technique for determining the amount of proteins in a sample. The methods for protein identification are identical to those used in general proteomics, but include quantification as an additional dimension. Rather than just providing lists of proteins identified in a certain sample, quantitative proteomics yields information about the physiological differences between two biological samples. For example, this approach can be used to compare samples from healthy and diseased patients. Quantitative proteomics is mainly performed by two-dimensional gel electrophoresis (2-DE) or mass spectrometry (MS). However, a recently developed method, quantitative dot blot (QDB) analysis, can measure both the absolute and relative quantity of an individual protein in a sample in a high-throughput format, opening a new direction for proteomic research. In contrast to 2-DE, which requires MS for the downstream protein identification, MS technology can both identify and quantify the changes.

Isobaric labeling

Isobaric labeling is a mass spectrometry strategy used in quantitative proteomics. Peptides or proteins are labeled with chemical groups that have identical mass (isobaric) but differ in the distribution of heavy isotopes within their structure. These tags, commonly referred to as tandem mass tags, are designed so that the mass tag is cleaved at a specific linker region upon high-energy CID (HCD) during tandem mass spectrometry, yielding reporter ions of different masses. The most common isobaric tags are amine-reactive tags. However, tags that react with cysteine residues and carbonyl groups have also been described. These amine-reactive groups go through N-hydroxysuccinimide (NHS) reactions, which are based around three types of functional groups. Isobaric labeling methods include tandem mass tags (TMT), isobaric tags for relative and absolute quantification (iTRAQ), mass differential tags for absolute and relative quantification, and dimethyl labeling. TMT and iTRAQ are the most common and most developed of these methods. Tandem mass tags have a mass reporter region, a cleavable linker region, a mass normalization region, and a protein reactive group, and all have the same total mass.

Collision-induced dissociation

Collision-induced dissociation (CID), also known as collisionally activated dissociation (CAD), is a mass spectrometry technique to induce fragmentation of selected ions in the gas phase. The selected ions are usually accelerated by applying an electrical potential to increase the ion kinetic energy and then allowed to collide with neutral molecules. In the collision some of the kinetic energy is converted into internal energy which results in bond breakage and the fragmentation of the molecular ion into smaller fragments. These fragment ions can then be analyzed by tandem mass spectrometry.

In bioinformatics, a peptide-mass fingerprint or peptide-mass map is a mass spectrum of a mixture of peptides that comes from a digested protein being analyzed. The mass spectrum serves as a fingerprint in the sense that it is a pattern that can serve to identify the protein. The method for forming a peptide-mass fingerprint, developed in 1993, consists of isolating a protein, breaking it down into individual peptides, and determining the masses of the peptides through some form of mass spectrometry. Once formed, a peptide-mass fingerprint can be used to search in databases for related protein or even genomic sequences, making it a powerful tool for annotation of protein-coding genes.

In mass spectrometry, de novo peptide sequencing is the method in which a peptide's amino acid sequence is determined directly from tandem mass spectrometry data, without reference to a sequence database.

References

  1. Hardouin J (2007). "Protein sequence information by matrix-assisted laser desorption/ionization in-source decay mass spectrometry". Mass Spectrometry Reviews. 26 (5): 672–82. Bibcode:2007MSRv...26..672H. doi:10.1002/mas.20142. PMID 17492750.
  2. Shadforth I, Crowther D, Bessant C (2005). "Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines". Proteomics. 5 (16): 4082–95. doi:10.1002/pmic.200402091. PMID 16196103. S2CID 38068737.
  3. Mørtz E, O'Connor PB, Roepstorff P, Kelleher NL, Wood TD, McLafferty FW, Mann M (1996). "Sequence tag identification of intact proteins by matching tanden mass spectral data against sequence data bases". Proc. Natl. Acad. Sci. U.S.A. 93 (16): 8264–7. Bibcode:1996PNAS...93.8264M. doi:10.1073/pnas.93.16.8264. PMC 38658. PMID 8710858.
  4. Mann M, Wilm M (1994). "Error-tolerant identification of peptides in sequence databases by peptide sequence tags". Anal. Chem. 66 (24): 4390–9. doi:10.1021/ac00096a002. PMID 7847635.
  5. Roepstorff P, Fohlman J (1984). "Proposal for a common nomenclature for sequence ions in mass spectra of peptides". Biomed. Mass Spectrom. 11 (11): 601. doi:10.1002/bms.1200111109. PMID 6525415.
  6. Tang XJ, Thibault P, Boyd RK (October 1993). "Fragmentation reactions of multiply protonated peptides and implications for sequencing by tandem mass spectrometry with low-energy collision-induced dissociation". Anal. Chem. 65 (20): 2824–34. doi:10.1021/ac00068a020. PMID 7504416.