Shotgun proteomics

Shotgun proteomics refers to the use of bottom-up proteomics techniques to identify proteins in complex mixtures by combining high-performance liquid chromatography with mass spectrometry. [1] [2] [3] [4] [5] [6] The name is derived from shotgun sequencing of DNA, which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. In the most common form of shotgun proteomics, the proteins in the mixture are digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.

Targeted proteomics using selected reaction monitoring (SRM) and data-independent acquisition methods are often considered alternatives to shotgun proteomics in the field of bottom-up proteomics. While shotgun proteomics uses data-dependent selection of precursor ions to generate fragment ion scans, these alternatives acquire fragment ion scans according to a deterministic scheme.
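
The contrast between the two acquisition styles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any instrument's control software: the function names, the (m/z, intensity) peak format and the window width are all hypothetical.

```python
def dda_targets(survey_scan, top_n):
    """Data-dependent: choose the top_n most intense precursors seen in THIS scan.

    survey_scan is a list of (mz, intensity) pairs (hypothetical format).
    """
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def dia_windows(mz_start, mz_end, width):
    """Data-independent: fixed isolation windows, the same in every cycle."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


if __name__ == "__main__":
    scan = [(400.2, 1e5), (512.3, 5e6), (640.8, 2e4), (733.4, 9e5)]
    print(dda_targets(scan, top_n=2))   # depends on what the scan contains
    print(dia_windows(400, 500, 25))    # independent of the scan contents
```

In the data-dependent case the list of fragmented precursors changes with the signal in each survey scan, whereas the data-independent windows are known before the run starts.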

History

Shotgun proteomics arose from the difficulties of using previous technologies to separate complex mixtures. In 1975, O’Farrell and Klose described two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), which was able to resolve complex protein mixtures. [7] [8] The development of matrix-assisted laser desorption ionization (MALDI), electrospray ionization (ESI), and database searching continued to grow the field of proteomics. However, these methods still had difficulty identifying and separating low-abundance proteins, aberrant proteins, and membrane proteins. Shotgun proteomics emerged as a method that could resolve even these proteins. [5]

Advantages

Shotgun proteomics allows global protein identification as well as systematic profiling of dynamic proteomes. [9] It also avoids the modest separation efficiency and poor mass spectral sensitivity associated with intact protein analysis. [1]

Disadvantages

The dynamic exclusion filtering that is often used in shotgun proteomics maximizes the number of identified proteins at the expense of random sampling. [10] This problem may be exacerbated by the undersampling inherent in shotgun proteomics. [11]
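
The effect of dynamic exclusion can be illustrated with a toy simulation. The sketch below is a simplification under stated assumptions: precursors are identified by hypothetical labels rather than by m/z with a tolerance, and the exclusion period is counted in survey scans rather than seconds.

```python
def run_dda(survey_scans, top_n, exclusion_scans):
    """Pick top_n precursors per survey scan, skipping recently picked ones.

    survey_scans: list of {precursor_label: intensity} dicts (hypothetical).
    exclusion_scans: number of following scans in which a picked precursor
    may not be picked again (0 disables dynamic exclusion).
    """
    excluded_until = {}  # label -> first scan index at which it is eligible again
    picked = []
    for index, scan in enumerate(survey_scans):
        eligible = [
            (label, intensity)
            for label, intensity in scan.items()
            if excluded_until.get(label, 0) <= index
        ]
        eligible.sort(key=lambda item: item[1], reverse=True)
        chosen = [label for label, _ in eligible[:top_n]]
        for label in chosen:
            excluded_until[label] = index + 1 + exclusion_scans
        picked.append(chosen)
    return picked


if __name__ == "__main__":
    # The same four co-eluting precursors appear in four consecutive scans.
    scans = [{"A": 100, "B": 50, "C": 10, "D": 5}] * 4
    print(run_dda(scans, top_n=1, exclusion_scans=0))
    print(run_dda(scans, top_n=1, exclusion_scans=2))
```

Without exclusion the most intense precursor is fragmented in every scan. With exclusion, lower-intensity precursors are reached, but the number of times each precursor is sampled no longer tracks its abundance.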

Figure: Agilent 1200 HPLC
Figure: Quadrupole time-of-flight tandem mass spectrometer (Q-TOF)

Workflow

Cells containing the desired protein complement are grown. Proteins are then extracted from the cells and digested with a protease to produce a peptide mixture. [9] The peptide mixture is loaded directly onto a microcapillary column, where the peptides are separated by hydrophobicity and charge. As the peptides elute from the column, they are ionized and separated by mass-to-charge ratio (m/z) in the first stage of tandem mass spectrometry. The selected ions undergo collision-induced dissociation or another process that induces fragmentation. The charged fragments are then separated in the second stage of tandem mass spectrometry.
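
The digestion step and the m/z values seen in the first stage of mass spectrometry can be mimicked in silico. The following Python sketch assumes trypsin as the protease (the text above does not name one) and uses the common simplified rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P). The example sequence and function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of each charge-carrying proton


def tryptic_digest(protein):
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def precursor_mz(peptide, charge):
    """m/z of the unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    for peptide in tryptic_digest("MKWVTFRPLLKAA"):  # hypothetical sequence
        print(peptide, round(precursor_mz(peptide, 2), 4))
```

Missed cleavages and modifications are ignored here; real search software typically allows for both.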

The "fingerprint" of each peptide's fragmentation mass spectrum is used to identify the protein from which the peptide derives by searching against a sequence database with commercially available software (e.g. Sequest or Mascot). [9] Examples of sequence databases are the Genpept database and the PIR database. [12] After the database search, each peptide-spectrum match (PSM) needs to be evaluated for validity. [13] This analysis allows researchers to profile various biological systems. [9]
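
The idea of matching a fragmentation spectrum against candidate sequences can be sketched as follows. This is a deliberately simplified shared-peak count, not the scoring scheme of Sequest, Mascot or any other named tool. It assumes singly charged b and y fragment ions only, unmodified peptides, and a hypothetical fragment tolerance of 0.02 m/z.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mzs(peptide):
    """Theoretical singly charged b ions (prefixes) and y ions (suffixes)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(observed_mzs, peptide, tolerance=0.02):
    """Number of theoretical fragments with an observed peak within tolerance."""
    return sum(
        any(abs(observed - theoretical) <= tolerance for observed in observed_mzs)
        for theoretical in fragment_mzs(peptide)
    )


def best_match(observed_mzs, candidates, tolerance=0.02):
    """Candidate peptide explaining the most observed fragment peaks."""
    return max(candidates, key=lambda p: shared_peak_count(observed_mzs, p, tolerance))


if __name__ == "__main__":
    spectrum = fragment_mzs("PEPTIDEK")            # idealised, noise-free spectrum
    candidates = ["EPPTIDEK", "PEPTIDEK", "KEDITPEP"]  # hypothetical database hits
    print(best_match(spectrum, candidates))
```

The best-scoring candidate is only a peptide-spectrum match. As noted above, each match still has to be evaluated for validity, which this sketch does not attempt.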

Challenges with peptide identification

Degenerate peptides (those shared by two or more proteins in the database) make it difficult to unambiguously identify the protein to which they belong. Additionally, some vertebrate proteome samples contain a large number of paralogs, and alternative splicing in higher eukaryotes can result in many identical protein subsequences. [1] Moreover, many proteins are modified, either naturally (co- or post-translationally) or artificially (through sample preparation artefacts), which further complicates identification of the peptide sequence by conventional database matching. Together with peptide fragmentation spectra of poor quality or high complexity (due to co-isolation or sensitivity limitations), this leaves many sequencing spectra unidentified in a conventional shotgun proteomics experiment. [14] [15] [16] [17]
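
The ambiguity caused by degenerate peptides can be made concrete with a small lookup table. The sketch below assumes the database has already been digested into peptides. The protein names and sequences are hypothetical, chosen to resemble two isoforms that share most of their sequence.

```python
from collections import defaultdict


def build_peptide_index(protein_peptides):
    """Map each peptide to the set of proteins that contain it."""
    index = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            index[peptide].add(protein)
    return index


def split_by_degeneracy(identified_peptides, index):
    """Separate peptides pointing to one protein from degenerate (shared) ones."""
    unique, shared = {}, {}
    for peptide in identified_peptides:
        proteins = sorted(index.get(peptide, ()))
        if len(proteins) == 1:
            unique[peptide] = proteins[0]
        elif len(proteins) > 1:
            shared[peptide] = proteins
    return unique, shared


if __name__ == "__main__":
    database = {  # hypothetical isoforms, already digested
        "ISO_A": ["MAAK", "LLGGR", "TTEEK"],
        "ISO_B": ["MAAK", "LLGGR", "SSDDK"],
    }
    index = build_peptide_index(database)
    unique, shared = split_by_degeneracy(["LLGGR", "TTEEK"], index)
    print("unique:", unique)
    print("shared:", shared)
```

Only the peptide found in a single protein supports an unambiguous identification. The shared peptide is consistent with either isoform.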

Practical applications

With the human genome sequenced, the next step is the verification and functional annotation of all predicted genes and their protein products. [4] Shotgun proteomics can be used for functional classification or comparative analysis of these protein products. It can be used in projects ranging from large-scale analysis of whole proteomes to studies focused on a single protein family. It can be carried out in research labs or commercially.

Large-scale analysis

One example is a study by Washburn, Wolters, and Yates, who applied shotgun proteomics to the proteome of a Saccharomyces cerevisiae strain grown to mid-log phase. They detected and identified 1,484 proteins, including proteins rarely seen in proteome analysis, such as low-abundance transcription factors and protein kinases. They also identified 131 proteins with three or more predicted transmembrane domains. [2]

Protein family

Vaisar et al. used shotgun proteomics to implicate protease inhibition and complement activation in the antiinflammatory properties of high-density lipoprotein. [18] In a study by Lee et al., higher expression levels of hnRNP A2/B1 and Hsp90 were observed in human hepatoma HepG2 cells than in wild-type cells. This led to a search for reported functional roles mediated in concert by both of these multifunctional cellular chaperones. [19]

References

  1. Alves P, Arnold RJ, Novotny MV, Radivojac P, Reilly JP, Tang H (2007). "Advancement in protein inference from shotgun proteomics using peptide detectability". Pacific Symposium on Biocomputing: 409–20. PMID 17990506.
  2. Washburn MP, Wolters D, Yates JR (March 2001). "Large-scale analysis of the yeast proteome by multidimensional protein identification technology". Nature Biotechnology. 19 (3): 242–7. doi:10.1038/85686. PMID 11231557. S2CID 16796135.
  3. Wolters DA, Washburn MP, Yates JR (December 2001). "An automated multidimensional protein identification technology for shotgun proteomics". Analytical Chemistry. 73 (23): 5683–90. doi:10.1021/ac010617e. PMID 11774908.
  4. Hu L, Ye M, Jiang X, Feng S, Zou H (August 2007). "Advances in hyphenated analytical techniques for shotgun proteome and peptidome analysis--a review". Analytica Chimica Acta. 598 (2): 193–204. doi:10.1016/j.aca.2007.07.046. PMID 17719892.
  5. Fournier ML, Gilmore JM, Martin-Brown SA, Washburn MP (August 2007). "Multidimensional separations-based shotgun proteomics". Chemical Reviews. 107 (8): 3654–86. doi:10.1021/cr068279a. PMID 17649983.
  6. Nesvizhskii AI (2007). "Protein identification by tandem mass spectrometry and sequence database searching". Mass Spectrometry Data Analysis in Proteomics. Methods Mol. Biol. Vol. 367. pp. 87–119. doi:10.1385/1-59745-275-0:87. ISBN 978-1-59745-275-5. PMID 17185772.
  7. O'Farrell PH (May 1975). "High resolution two-dimensional electrophoresis of proteins". The Journal of Biological Chemistry. 250 (10): 4007–21. doi:10.1016/S0021-9258(19)41496-8. PMC 2874754. PMID 236308.
  8. Klose J (1975). "Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals". Humangenetik. 26 (3): 231–43. doi:10.1007/bf00281458. PMID 1093965. S2CID 30981877.
  9. Wu CC, MacCoss MJ (June 2002). "Shotgun proteomics: tools for the analysis of complex biological systems". Current Opinion in Molecular Therapeutics. 4 (3): 242–50. PMID 12139310.
  10. Zhang B, VerBerkmoes NC, Langston MA, Uberbacher E, Hettich RL, Samatova NF (November 2006). "Detecting differential and correlated protein expression in label-free shotgun proteomics". Journal of Proteome Research. 5 (11): 2909–18. doi:10.1021/pr0600273. PMID 17081042. S2CID 22254554.
  11. Tolmachev AV, Monroe ME, Purvine SO, Moore RJ, Jaitly N, Adkins JN, et al. (November 2008). "Characterization of strategies for obtaining confident identifications in bottom-up proteomics measurements using hybrid FTMS instruments". Analytical Chemistry. 80 (22): 8514–25. doi:10.1021/ac801376g. PMC 2692492. PMID 18855412.
  12. Eng JK, McCormack AL, Yates JR (November 1994). "An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database". Journal of the American Society for Mass Spectrometry. 5 (11): 976–89. doi:10.1016/1044-0305(94)80016-2. PMID 24226387. S2CID 18413192.
  13. Cerqueira FR, Ferreira RS, Oliveira AP, Gomes AP, Ramos HJ, Graber A, Baumgartner C (1 January 2012). "MUMAL: multivariate analysis in shotgun proteomics using machine learning techniques". BMC Genomics. 13 Suppl 5: S4. doi:10.1186/1471-2164-13-S5-S4. PMC 3477001. PMID 23095859.
  14. Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, Del-Toro N, et al. (August 2016). "Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets". Nature Methods. 13 (8): 651–656. doi:10.1038/nmeth.3902. PMC 4968634. PMID 27493588.
  15. den Ridder M, Daran-Lapujade P, Pabst M (February 2020). "Shot-gun proteomics: why thousands of unidentified signals matter". FEMS Yeast Research. 20 (1). doi:10.1093/femsyr/foz088. PMID 31860055.
  16. Michalski A, Cox J, Mann M (April 2011). "More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS". Journal of Proteome Research. 10 (4): 1785–93. doi:10.1021/pr101060v. PMID 21309581.
  17. Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, et al. (April 2019). "TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets". Nature Biotechnology. 37 (4): 469–479. doi:10.1038/s41587-019-0067-5. PMC 6447449. PMID 30936560.
  18. Vaisar T, Pennathur S, Green PS, Gharib SA, Hoofnagle AN, Cheung MC, et al. (March 2007). "Shotgun proteomics implicates protease inhibition and complement activation in the antiinflammatory properties of HDL". The Journal of Clinical Investigation. 117 (3): 746–56. doi:10.1172/JCI26206. PMC 1804352. PMID 17332893.
  19. Lee CL, Hsiao HH, Lin CW, Wu SP, Huang SY, Wu CY, et al. (December 2003). "Strategic shotgun proteomics approach for efficient construction of an expression map of targeted protein families in hepatoma cell lines". Proteomics. 3 (12): 2472–86. doi:10.1002/pmic.200300586. PMID 14673797. S2CID 24518852.