Top Down proteomics is a method of protein identification capable of identifying and quantitating unique proteoforms through the analysis of intact proteins. The name is derived from the similar approach to DNA sequencing. During mass spectrometry, intact proteoforms are typically ionized by electrospray ionization and analysed using a variety of mass analysers, including Orbitrap, Fourier-transform ion cyclotron resonance (FT-ICR) and time-of-flight (TOF) instruments. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Typical proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry (MS; Bottom Up proteomics). Top Down proteomics using mass spectrometry interrogates protein structure through measurement of a proteoform's intact mass followed by direct ion dissociation in the gas phase. Top Down proteoform analysis can also be achieved by resolving (separating) the proteoform from all other proteoforms and then applying peptide-centric LC-MS/MS to characterise the isolated proteoform.
A single gene can code for many protein products (e.g. via alternative splicing; post-transcriptional and -translational processing) and the resulting canonical amino acid sequences (i.e. 'proteins' or more correctly Open Reading Frame (ORF) products) can be further modified by any number of post-translational modifications (PTM) or non-physiological adducts. These varied protein species or proteoforms define proteomes and are the functional entities underlying biological processes. Thus, truly comprehensive or 'deep' proteome analyses must assess proteoforms.[1][2][3]
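The scale of this speciation can be illustrated with a simple upper-bound count. The sketch below is a back-of-envelope illustration only: it assumes that every modifiable site varies independently of the others, and the function name and all numbers are hypothetical rather than taken from any cited study.

```python
from math import prod


def proteoform_upper_bound(sequence_variants: int, states_per_site: list[int]) -> int:
    """Upper bound on distinct proteoforms arising from one gene.

    sequence_variants: number of distinct amino acid sequences (e.g. splice variants).
    states_per_site: for each modifiable site, the number of states it can adopt
    (2 = unmodified or carrying one PTM). Sites are assumed to be independent,
    which is a simplification; real proteomes populate only a subset of these
    combinations.
    """
    return sequence_variants * prod(states_per_site)


# Hypothetical gene: 3 splice variants, each with 5 sites that are either
# unmodified or phosphorylated.
print(proteoform_upper_bound(3, [2] * 5))  # 3 * 2**5 = 96
```

Even with these modest, made-up inputs a single gene yields dozens of potential species, which is why analyses restricted to canonical sequences cannot be considered comprehensive.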
There are two general approaches to proteome analysis - bottom up (BUP or shotgun) and top down (TDP).[1][3] The former, a peptide-centric or proteogenomic approach, infers (often with quite limited data) the identities of canonical protein sequences by correlation with existing databases, mostly derived from genome sequencing projects. In contrast, TDP can, in theory, yield comprehensive proteome analyses at the level of proteoforms provided the methods used effectively address the full breadth of species in a proteome.
Adopted from analytical chemistry, the term top down in proteomics means the separation of intact proteoforms and their subsequent identification, and is agnostic as to how that is achieved.[4][5] Currently, there are two analytical approaches that enable proteome assessments to different extents: Integrative or Integrated TDP (iTDP; routine high resolution/sensitivity two-dimensional gel electrophoresis tightly coupled with liquid chromatography and tandem mass spectrometry (2DE/LC/MS/MS)) or mass spectrometry-intensive TDP (MSi-TDP).[1][3][5] Although somewhat misleading interpretations appear in the literature implying the latter defines TDP, this is clearly a misconception when considering what is genuinely required for fully effective, comprehensive proteome analyses.
Integrated Top-Down Proteomics (iTDP)
Proteoform determination using Integrative Top Down Proteomics.
Further developed, refined, and optimized since the original report[6] of a routine 2DE separation of protein species (most often using isoelectric focusing and then SDS-PAGE), and subsequently coupled with western blotting and MS, this approach was the first to identify the range of protein species/proteoforms in a variety of samples.[7][8][9][10][11][12] Currently, the iTDP analytical approach offers the highest proteoform resolution and a routine approach to full proteome analysis (e.g., across the full breadth of species in native proteomes).[1][3][13][14][15][16][17][18][19] Spots and/or regions of interest can be excised from the gel, proteolytically digested using well-established methods, and the resulting peptides then assessed using LC/MS/MS to identify canonical amino acid sequences and their inherent PTM (i.e. an 'integration' with BUP). Integration of this sequence information with the isoelectric point (pI) and molecular weight (MW) information from 2DE thus enables definitive identification of proteoforms based on several key defining physico-chemical characteristics. In addition to highly sensitive and quantitative total proteoform detection using fluorescent stains,[20] notably including Coomassie Brilliant Blue used as a near-IR dye,[14][20][21] gel staining protocols also enable the identification of broad proteoform groups containing the same PTM (e.g. phospho- and glyco-proteoforms).[14] Thus, iTDP integrates the best available approaches to enable truly comprehensive, deep proteome analyses at the critically necessary level of proteoforms. Accordingly, new approaches can also be integrated once they have been critically evaluated and fully vetted to ensure comprehensive, quantitative analysis (see below).
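How the gel coordinates narrow down an identification can be sketched in a few lines. The example below is a minimal illustration, not a published workflow: it assumes that LC/MS/MS has already produced a shortlist of candidate proteoforms for an excised spot, and then keeps only those whose theoretical pI and MW agree with the spot position. The class, function, tolerances and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    theoretical_pi: float
    theoretical_mw_kda: float


def match_spot(observed_pi, observed_mw_kda, candidates,
               pi_tol=0.2, mw_tol_fraction=0.1):
    """Return names of candidates consistent with the spot's pI and MW.

    pi_tol is an absolute pI tolerance; mw_tol_fraction is a relative MW
    tolerance. Both are illustrative values, not recommended settings.
    """
    matches = []
    for c in candidates:
        pi_ok = abs(c.theoretical_pi - observed_pi) <= pi_tol
        mw_ok = (abs(c.theoretical_mw_kda - observed_mw_kda)
                 <= mw_tol_fraction * observed_mw_kda)
        if pi_ok and mw_ok:
            matches.append(c.name)
    return matches


# Hypothetical candidates sharing one canonical sequence.
candidates = [
    Candidate("ORF product, unmodified", 6.8, 42.0),
    Candidate("ORF product, two phosphorylations", 6.4, 42.2),
    Candidate("ORF product, truncated", 6.9, 30.5),
]

# Hypothetical spot resolved at pI 6.45 and ~43 kDa.
print(match_spot(6.45, 43.0, candidates))
```

With these made-up numbers only the doubly phosphorylated candidate is retained: the unmodified form is excluded by its pI and the truncated form by both pI and MW, which peptide sequence data alone would not distinguish.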
Advantages
The main advantage of the iTDP approach is the routine ability to detect the full potential range of proteoforms (e.g. degradation products, isoforms, sequence variants, PTM combinations, adducts) in native proteomes. This results from capitalizing on integration of the best available analytical approaches and continuous integration of modifications to the approach as new refinements and optimizations are established.[1][3][14][18][19]
2DE enables parallel resolution of replicate samples rather than the serial approach of BUP and MSi-TDP that can result in significant variation between LC-MS runs. This also enables combining of resolved samples (e.g., spots) from several gels if necessary to ensure high quality MS/MS identifications, even of very low abundance species.
Focusing on one select small portion of a gel-resolved proteome at a time enables full implementation of the power of MS/MS, yielding better data than the en masse, whole proteome digest BUP approach. The reduction in the number of proteoforms and thus peptides being introduced into LC/MS/MS means that higher concentrations of individual peptides can be analysed, increasing the quality of MS/MS spectra of the peptides and the likelihood of correctly localising PTM.
Both the first and second dimensions of 2DE are adaptable and easily modified to enhance proteome coverage as necessary (e.g. to focus on specific pI ranges, or to best resolve lower or higher MW species). This flexibility and adaptability further complements the additional analytical capacity enabled by excision and third electrophoretic separations of primary gel regions, as well as the subsequent deep imaging of the primary gel to expand the dynamic range of detection to even very low abundance proteoforms.[14]
Generally straightforward data analysis.
High quality iTDP analyses are fully enabled by established mid-range LC/MS systems; while advanced and/or specialized systems continue to drive throughput and/or sequence coverage, these are not essential to enabling iTDP analyses.
Western blotting after 2DE can also be used to capitalize on the availability of high-quality antibodies. Indeed, this was one of the first approaches to identify multiple variants (i.e. proteoforms) of a given protein in the same sample. Criteria to ensure the highest quality (quantitative) western blots are well-established if not always widely followed.[22][23]
The primary focus of the iTDP approach is the comprehensiveness of analyses (i.e. depth) and thus data quality, rather than high throughput. "It is not the rate or volume of data generated but rather the quality that ultimately matters".[1]
Disadvantages
The primary focus of the iTDP approach is the comprehensiveness of analyses and thus data quality, rather than high throughput. Many claim this as a drawback of the approach. With the widespread adoption of BUP since the turn of the century, a much-touted goal of proteomics has been to achieve high-throughput analyses of amino acid sequences, comparable to the throughput of genomic analyses. Critically, this seems (quantitatively) unlikely considering the vast potential speciation of protein products and thus the complexity of native proteomes.[1][2][3] A truly disruptive (as yet unidentified) technology would be required to genuinely enable quantitatively comprehensive, high-throughput proteome analyses.
2DE has been described as time-consuming or labour-intensive. Again, the issue is clearly one of analytical quality over speed. While it is true that iTDP — notably performed with full, parallel technical replicates — can take longer than a single BUP or MSi-TDP run (i.e. without parallel technical replicates), when one factors in the inherent technical aspects of those approaches (e.g. LC column optimization and packing, multiple orthogonal LC runs, effective system flushing, cleaning clogged electrospray systems, data handling/analysis), there is not a substantial difference in throughput. Furthermore, recent refinements have further optimized sample handling and increased 2DE throughput.[24][25]
It is difficult to ensure full, quantitative recovery of intact proteoforms from polyacrylamide gels, and this varies with the size of species and the PTM present. Whether attempted via passive diffusion from 'mashed' gel pieces or the use of 'dissolvable' formulations, full quantitative recovery has never been demonstrated and/or there is concern that the necessary treatments can modify the resolved native proteoforms. Thus, while recovery of fully intact proteoforms from the gel would be optimal to ensure full sequence coverage, in-gel digestion is an effective option for subsequent LC/MS/MS analyses.
Mass spectrometry-intensive TDP (MSi-TDP)
MSi-TDP (sometimes referred to as TD-MS) is a method of proteoform identification that uses a mass spectrometer to determine the mass of a species from the charge series of the resulting ions and to obtain sequence information by selecting a single charge state ion for MS/MS analysis. The stated goal of MSi-TDP is to carry out proteoform analysis fully in the mass spectrometer using a variety of fragmentation methods (e.g. collision-induced dissociation, electron-capture dissociation or electron-transfer dissociation). Because proteoform molecules take up different numbers of H+ ions and form multiple charge states, having multiple different proteoforms appearing in the mass spectrometer at the same time can create extremely complicated spectra that are difficult to deconvolute and analyse, while also having the potential for ion suppression that reduces signal and sensitivity.[26][27] This is most effectively overcome by separating the different proteoforms, typically by tube gel electrophoresis and subsequent reversed phase chromatography, immediately prior to ionisation, to reduce the number of proteoforms entering the instrument at a particular moment. Therefore, effective sample/proteome fractionation is critical before MSi-TDP to ensure success of analyses within the limitations of the method. Thus, in contrast to BUP, MSi-TDP interrogates proteoform structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.
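Determining a mass from a charge series rests on simple arithmetic: in positive-mode electrospray an ion carrying z protons appears at m/z = (M + z × 1.007276)/z, so two adjacent peaks of the same series fix both z and the neutral mass M. The sketch below illustrates this for an idealised, noise-free pair of peaks; the function names and the example mass are hypothetical, and real deconvolution software must additionally handle isotope envelopes, noise and overlapping series.

```python
PROTON_MASS = 1.007276  # Da


def mz_of(neutral_mass: float, charge: int) -> float:
    """m/z of a positive ion formed by adding `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_low: float, mz_high: float) -> tuple[int, float]:
    """Infer charge and neutral mass from two adjacent peaks of one series.

    mz_high carries charge z and mz_low carries charge z + 1, so
    z = (mz_low - proton) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


# Hypothetical proteoform of 16950.0 Da observed at charges 11+ and 10+.
true_mass = 16950.0
mz_low, mz_high = mz_of(true_mass, 11), mz_of(true_mass, 10)

z, mass = mass_from_adjacent_peaks(mz_low, mz_high)
print(z, round(mass, 3))  # charge of the higher-m/z peak and the recovered mass
```

When several proteoforms co-elute, their charge series interleave on the same m/z axis, and pairing up the correct adjacent peaks becomes the hard part; this is one reason front-end separation matters so much for this approach.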
Advantages
Like iTDP, the main advantage of MSi-TDP is the capacity, within limits, to fully assess given proteoforms, including isotopic variants.
MSi-TDP can complement BUP approaches. Characterization of small proteins can be a significant challenge in BUP if an insufficient number of tryptic peptides are generated for analysis. MSi-TDP enables low mass protein detection, thus providing more detailed coverage of proteoforms in the lower MW range.[28][29]
Disadvantages
The most substantial disadvantage of the MSi-TDP approach is the inherent 'MW barrier' that limits routine proteoform analysis to species less than ~20-30 kDa;[1][3][18][26][27][29] indeed, there is a sharp decrease in the signal/noise ratio beyond the 20-30 kDa mass range, mainly due to the increase in the number of charge states the individual proteoform molecules can have as sequence length increases. While a handful of larger proteoforms have been successfully identified and are routinely measured in biopharma QC (although high concentrations are injected), successful fragmentation for comprehensive sequence coverage remains difficult as only a single charge state is selected for fragmentation, meaning a diluted signal yields fewer fragments. Realistically, although clearly powerful (and influential), MSi-TDP thus assesses only a minor MW sub-proteome and cannot currently deliver routine, truly comprehensive total proteome analyses, as identified species >30 kDa are vanishingly few relative to even the estimated size of native proteomes. Efforts to manage the MW limitation have used the somewhat inappropriately named 'middle-down' approach, utilizing select proteases to digest larger proteoforms into manageable fragments; in effect, this is a variation of iTDP if the intact proteoform was first isolated (e.g. by gel or LC). Thus, the lack of intact proteoform fractionation methods that are integrated with tandem MS has continued to impede substantive advances in MSi-TDP over the last 2-3 decades.[3][30]
While clearly powerful for assessing proteoforms that fall within its analytical capabilities, MSi-TDP has arguably been most successful in the analysis of the low MW sub-proteome, individual isolated proteins or simple mixtures, and isolated protein complexes having low MW components.[27][31][32][33]
Protein identification and proteoform characterization using the MSi-TDP approach can suffer from a dynamic range challenge similar to that in BUP "shotgun" LC/MS/MS experiments, where the same highly abundant species are repeatedly fragmented. Furthermore, ongoing issues also include:
Poor front-end chromatographic resolution of species, even following multiple sequential separation steps, resulting in co-elution of species;
The decay in signal-to-noise with increasing proteoform size due to an increase in charge states;
The need for better computing infrastructure and software as data sets increase in size and contain complex spectra that require multiple software tools for downstream analyses; searches can take multiple hours or longer to complete yet can still yield ambiguous identifications. Although MSi-TDP can be operated in relatively high throughput in order to broadly map the low MW sub-proteome, the rate of identifying new proteins is sharply reduced after initial rounds.
The effect of chemical noise stemming from various factors such as analyte clustering, multimers, or interfering species further complicates intact proteoform detection and analysis using MSi-TDP.
The requirement to remove surfactants prior to LC, specifically the SDS used in Gel-Eluted Liquid Fraction Entrapment Electrophoresis (GELFrEE) fractionation,[34][35] means that some proteoforms will be lost due to a lack of solubility after surfactant removal.
Potential Future Directions
Affinity-based proteome analysis tools
The definition of TDP includes a requirement to identify the "protein", either as a distinct proteoform or ORF product. While this is most typically achieved using a mass spectrometer to fragment ions, from either intact proteoforms or peptides of resolved proteoforms, it is also possible to identify and quantify canonical "proteins" using affinity-based reagents, such as those of the Olink and SomaScan platforms, which use antibodies and aptamers, respectively. The generic term "protein" is used here because it is unclear whether these reagents identify certain proteoforms or a variety of proteoforms from the same ORF product. These methods thus produce similar, yet different, information relative to each other and to proteogenomic BUP approaches using LC/MS/MS.[36][37][38][39][40] Because of the claimed (i) "depth" of these assays in terms of identifying canonical protein sequences; and (ii) apparent ability to quantify changes in the abundance of those proteins in samples that can be problematic for other proteomics technologies (e.g. plasma and serum), these technologies have become popular in studies having enormous sample numbers that are impossible to directly address by other proteomics technologies. However, the substantial lack of correlation between these technologies, as well as with other established proteomics technologies, needs to be addressed, along with fully characterizing the exact proteoforms that these reagents are identifying. This will also require transparent verification of the quality and selectivity of any antibodies and aptamers used. Expense must also be duly considered with such assays as they become more frequently applied in very large studies (e.g. potentially involving thousands of samples).
While other approaches are also being made commercially available (e.g. iterative mapping of peptides, fluorescent variation of Edman degradation, and nanopores), these (i) remain broadly untested outside the firms involved; (ii) yield largely, if not completely, only proteogenomic data; (iii) are thus, currently at least, quite limited in terms of any capacity for a broad assessment of proteoforms; (iv) are dependent on the quality of the affinity or other (multiple) reagents required; and (v) consequently tend to carry higher costs per assay to the consumer. Notably, while promising, nanopore sequencing is (i) still quite early in development; and (ii) remains unproven in terms of throughput and capacity to address the full range of known PTM and adducts.[41] Thus, the capacity for nanopores to quantitatively address the full breadth of a proteome remains untested. Other technical issues, such as potential clogging of pores, will also need to be addressed, ideally with routine solutions.
Ehrhart, J. C.; Duthu, A.; Ullrich, S.; Appella, E.; May, P. (November 1988). "Specific interaction between a subset of the p53 protein family and heat shock proteins hsp72/hsc73 in a human osteosarcoma cell line". Oncogene. 3 (5): 595–603. ISSN 0950-9232. PMID 2978869.
Marcus, K.; Immler, D.; Sternberger, J.; Meyer, H. E. (July 2000). "Identification of platelet proteins separated by two-dimensional gel electrophoresis and analyzed by matrix assisted laser desorption/ionization-time of flight-mass spectrometry and detection of tyrosine-phosphorylated proteins". Electrophoresis. 21 (13): 2622–2636. doi:10.1002/1522-2683(20000701)21:13<2622::AID-ELPS2622>3.0.CO;2-3. ISSN 0173-0835. PMID 10949139.
Carbonara, Katrina; Padula, Matthew P.; Coorssen, Jens R. (February 2023). "Quantitative assessment confirms deep proteome analysis by integrative top-down proteomics". Electrophoresis. 44 (3–4): 472–480. doi:10.1002/elps.202200257. ISSN 1522-2683. PMID 36416355.
Gauci, Victoria J.; Padula, Matthew P.; Coorssen, Jens R. (2013-09-02). "Coomassie blue staining for high sensitivity gel-based proteomics". Journal of Proteomics. 90: 96–106. doi:10.1016/j.jprot.2013.01.027. ISSN 1876-7737. PMID 23428344.
Coorssen, Jens R.; Blank, Paul S.; Albertorio, Fernando; Bezrukov, Ludmila; Kolosova, Irina; Backlund, Peter S.; Zimmerberg, Joshua (2002-08-01). "Quantitative femto- to attomole immunodetection of regulated secretory vesicle proteins critical to exocytosis". Analytical Biochemistry. 307 (1): 54–62. doi:10.1016/s0003-2697(02)00015-5. ISSN 0003-2697. PMID 12137779.
Yang, Lamei; Shao, Qianwen; Su, Juwen; Liu, Yingchao; Chen, Liang; Ndzie Noah, Marie Louise; Li, Na; Coorssen, Jens R.; Zhan, Xianquan (2025-12-01). "What does one-dimensional gel electrophoresis-based western blotting data really mean in the reality of proteoforms?". Talanta. 295: 128266. doi:10.1016/j.talanta.2025.128266. ISSN 1873-3573. PMID 40347635.
Le, Jessie; Jung, Wonhyeuk; Arbing, Mark A.; Egea, Pascal F.; Ogorzalek Loo, Rachel R.; Loo, Joseph A. (2025-10-08). "Retention and Rearrangement of Membrane Protein Complexes' Higher Order Structure by Collisionally Activated Dissociation- and Electron Capture Dissociation-Mass Spectrometry". Journal of the American Chemical Society. 147 (40): 36105–36116. Bibcode:2025JAChS.14736105L. doi:10.1021/jacs.5c05797. ISSN 1520-5126. PMID 40985968.
Rukes, Verena; Cao, Chan (August 2025). "Advancing nanopore technology toward protein identification and sequencing". Trends in Biochemical Sciences. 50 (8): 721–732. doi:10.1016/j.tibs.2025.05.005. ISSN 0968-0004. PMID 40533364.