In molecular biology, proteins are generally thought to adopt unique structures determined by their amino acid sequences. However, proteins are not strictly static objects, but rather populate ensembles of (sometimes similar) conformations. Transitions between these states occur on a variety of length scales (tenths of angstroms to nm) and time scales (ns to s), and have been linked to functionally relevant phenomena such as allosteric signaling [1] and enzyme catalysis. [2]
The study of protein dynamics is most directly concerned with the transitions between these states, but can also involve the nature and equilibrium populations of the states themselves. These two perspectives—kinetics and thermodynamics, respectively—can be conceptually synthesized in an "energy landscape" paradigm: [3] highly populated states and the kinetics of transitions between them can be described by the depths of energy wells and the heights of energy barriers, respectively.
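The landscape picture can be made quantitative with a small sketch. The example below assumes that state populations follow a Boltzmann distribution over free energies and that barrier crossing follows the Eyring (transition-state theory) expression with a transmission coefficient of 1. Both are simplifying assumptions, and the function names and the energies in the usage lines are hypothetical values chosen only for illustration.

```python
import math

R = 8.314462618e-3          # gas constant, kJ/(mol*K)
KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, 1/(s*K)


def boltzmann_populations(free_energies_kj_mol, temperature_k=298.15):
    """Equilibrium populations of states from their free energies.

    Deeper wells (lower free energy) receive larger populations.
    """
    rt = R * temperature_k
    lowest = min(free_energies_kj_mol)  # shift energies for numerical stability
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]


def eyring_rate(barrier_kj_mol, temperature_k=298.15):
    """Rate constant (1/s) for crossing a free-energy barrier.

    Assumes a transmission coefficient of 1, which is a crude
    approximation for conformational transitions in proteins.
    """
    rt = R * temperature_k
    return KB_OVER_H * temperature_k * math.exp(-barrier_kj_mol / rt)


# Hypothetical two-state landscape: state B lies 5 kJ/mol above state A,
# separated by a 50 kJ/mol barrier measured from A.
populations = boltzmann_populations([0.0, 5.0])
forward_rate = eyring_rate(50.0)
```

In this sketch, thermodynamics enters only through the well depths passed to `boltzmann_populations`, and kinetics only through the barrier height passed to `eyring_rate`, mirroring the separation described above.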
Portions of protein structures often deviate from the equilibrium state. Some such excursions are harmonic, such as stochastic fluctuations of chemical bonds and bond angles. Others are anharmonic, such as sidechains that jump between separate discrete energy minima, or rotamers. [4]
Evidence for local flexibility is often obtained from NMR spectroscopy. Flexible and potentially disordered regions of a protein can be detected using the random coil index. Flexibility in folded proteins can be identified by analyzing the spin relaxation of individual atoms in the protein. Flexibility can also be observed in very high-resolution electron density maps produced by X-ray crystallography, [5] particularly when diffraction data is collected at room temperature instead of the traditional cryogenic temperature (typically near 100 K). [6] Information on the frequency distribution and dynamics of local protein flexibility can be obtained using Raman and optical Kerr-effect spectroscopy [7] as well as anisotropic microspectroscopy [8] in the terahertz frequency domain.
Many residues are in close spatial proximity in protein structures. This is true for most residues that are contiguous in the primary sequence, but also for many that are distal in sequence yet are brought into contact in the final folded structure. Because of this proximity, these residues' energy landscapes become coupled through various biophysical phenomena such as hydrogen bonds, ionic bonds, and van der Waals interactions (see figure).
Transitions between states for such sets of residues therefore become correlated. [9]
This is perhaps most obvious for surface-exposed loops, which often shift collectively to adopt different conformations in different crystal structures (see figure). However, coupled conformational heterogeneity is also sometimes evident in secondary structure. [10] For example, consecutive residues and residues offset by 4 in the primary sequence often interact in α helices. Also, residues offset by 2 in the primary sequence point their sidechains toward the same face of β sheets and are close enough to interact sterically, as are residues on adjacent strands of the same β sheet. Some of these conformational changes are induced by post-translational modifications in protein structure, such as phosphorylation and methylation. [10] [11]
When these coupled residues form pathways linking functionally important parts of a protein, they may participate in allosteric signaling. For example, when a molecule of oxygen binds to one subunit of the hemoglobin tetramer, that information is allosterically propagated to the other three subunits, thereby enhancing their affinity for oxygen. In this case, the coupled flexibility in hemoglobin allows for cooperative oxygen binding, which is physiologically useful because it allows rapid oxygen loading in lung tissue and rapid oxygen unloading in oxygen-deprived tissues (e.g. muscle).
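The physiological benefit of cooperative binding can be illustrated with the Hill equation, a common phenomenological description of cooperativity. The sketch below is not taken from the cited sources. The Hill coefficient (about 2.8), the half-saturation pressure (about 26 mmHg), and the lung and tissue oxygen pressures are approximate textbook-style values assumed here for illustration, and the function names are hypothetical.

```python
def hill_saturation(p_o2, p50=26.0, hill_n=2.8):
    """Fractional oxygen saturation from the Hill equation.

    p_o2 and p50 are oxygen partial pressures in mmHg.
    hill_n = 1 corresponds to non-cooperative binding.
    """
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


def delivered_fraction(p_lung=100.0, p_tissue=30.0, p50=26.0, hill_n=2.8):
    """Fraction of binding sites unloaded between lung and tissue."""
    loaded = hill_saturation(p_lung, p50, hill_n)
    retained = hill_saturation(p_tissue, p50, hill_n)
    return loaded - retained


# Compare a cooperative binder with a hypothetical non-cooperative one
# that has the same half-saturation pressure.
cooperative = delivered_fraction(hill_n=2.8)
non_cooperative = delivered_fraction(hill_n=1.0)
```

Under these assumed values, the cooperative curve loads more fully in the lungs and unloads a larger fraction in the tissues than the non-cooperative curve with the same half-saturation pressure.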
The presence of multiple domains in proteins gives rise to a great deal of flexibility and mobility, leading to protein domain dynamics. [1] Domain motions can be inferred by comparing different structures of a protein (as in Database of Molecular Motions), or they can be directly observed using spectra [12] [13] measured by neutron spin echo spectroscopy. They can also be suggested by sampling in extensive molecular dynamics trajectories [14] and principal component analysis. [15] Domain motions are important for:
One of the largest observed domain motions is the 'swivelling' mechanism in pyruvate phosphate dikinase. The phosphoinositide domain swivels between two states in order to bring a phosphate group from the active site of the nucleotide binding domain to that of the phosphoenolpyruvate/pyruvate domain. [23] The phosphate group is moved over a distance of 45 Å, involving a domain motion of about 100 degrees around a single residue. In enzymes, the closure of one domain onto another captures a substrate by an induced fit, allowing the reaction to take place in a controlled way. A detailed analysis by Gerstein led to the classification of two basic types of domain motion: hinge and shear. [20] Only a relatively small portion of the chain, namely the inter-domain linker and side chains, undergoes significant conformational changes upon domain rearrangement. [24]
A study by Hayward [25] found that the termini of α-helices and β-sheets form hinges in a large number of cases. Many hinges were found to involve two secondary structure elements acting like the hinges of a door, allowing an opening and closing motion to occur. This can arise when two neighbouring strands within a β-sheet situated in one domain diverge as they join the other domain. The two resulting termini then form the bending regions between the two domains. α-helices that preserve their hydrogen bonding network when bent are found to behave as mechanical hinges, storing 'elastic energy' that drives the closure of domains for rapid capture of a substrate. [25] Khade et al. worked on predicting hinges [26] in any conformation and further built an elastic network model called hdANM [27] that can model those motions.
The interconversion of helical and extended conformations at the site of a domain boundary is not uncommon. In calmodulin, torsion angles change for five residues in the middle of a domain-linking α-helix. The helix is split into two smaller, almost perpendicular helices separated by four residues of an extended strand. [28] [29]
Shear motions involve a small sliding movement of domain interfaces, controlled by the amino acid side chains within the interface. Proteins displaying shear motions often have a layered architecture: stacking of secondary structures. The interdomain linker has merely the role of keeping the domains in close proximity.[ citation needed ]
The analysis of the internal dynamics of structurally different, but functionally similar enzymes has highlighted a common relationship between the positioning of the active site and the two principal protein sub-domains. In fact, for several members of the hydrolase superfamily, the catalytic site is located close to the interface separating the two principal quasi-rigid domains. [14] Such positioning appears instrumental for maintaining the precise geometry of the active site, while allowing for an appreciable functionally oriented modulation of the flanking regions resulting from the relative motion of the two sub-domains.[ citation needed ]
Evidence suggests that protein dynamics are important for function, e.g. enzyme catalysis in dihydrofolate reductase (DHFR), yet they are also posited to facilitate the acquisition of new functions by molecular evolution. [30] This argument suggests that proteins have evolved to have stable, mostly unique folded structures, but the unavoidable residual flexibility leads to some degree of functional promiscuity, which can be amplified, harnessed, or diverted by subsequent mutations.[ citation needed ] Research on promiscuous proteins within the BCL-2 family revealed that nanosecond-scale protein dynamics can play a crucial role in protein binding behaviour and thus promiscuity. [31]
However, there is growing awareness that intrinsically unstructured proteins are quite prevalent in eukaryotic genomes, [32] casting further doubt on the simplest interpretation of Anfinsen's dogma: "sequence determines structure (singular)". In effect, the new paradigm is characterized by the addition of two caveats: "sequence and cellular environment determine structural ensemble".
Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure.
In biochemistry, allosteric regulation is the regulation of an enzyme by binding an effector molecule at a site other than the enzyme's active site.
Phenylalanine hydroxylase (PAH) (EC 1.14.16.1) is an enzyme that catalyzes the hydroxylation of the aromatic side-chain of phenylalanine to generate tyrosine. PAH is one of three members of the biopterin-dependent aromatic amino acid hydroxylases, a class of monooxygenase that uses tetrahydrobiopterin (BH4, a pteridine cofactor) and a non-heme iron for catalysis. During the reaction, molecular oxygen is heterolytically cleaved with sequential incorporation of one oxygen atom into BH4 and phenylalanine substrate. In humans, mutations in its encoding gene, PAH, can lead to the metabolic disorder phenylketonuria.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; it is important in medicine and biotechnology.
The lac repressor (LacI) is a DNA-binding protein that inhibits the expression of genes coding for proteins involved in the metabolism of lactose in bacteria. These genes are repressed when lactose is not available to the cell, ensuring that the bacterium only invests energy in the production of machinery necessary for uptake and utilization of lactose when lactose is present. When lactose becomes available, it is first converted into allolactose by β-galactosidase (lacZ). Binding of allolactose allosterically inhibits the DNA-binding ability of the lac repressor, so that genes coding for proteins involved in lactose uptake and utilization can be expressed.
In biochemistry and molecular biology, a binding site is a region on a macromolecule such as a protein that binds to another molecule with specificity. The binding partner of the macromolecule is often referred to as a ligand. Ligands may include other proteins, enzyme substrates, second messengers, hormones, or allosteric modulators. The binding event is often, but not always, accompanied by a conformational change that alters the protein's function. Binding to protein binding sites is most often reversible, but can also be covalent reversible or irreversible.
Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.
In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins.
In biochemistry, a conformational change is a change in the shape of a macromolecule, often induced by environmental factors.
A turn is an element of secondary structure in proteins where the polypeptide chain reverses its overall direction.
Phosphoglycerate kinase is an enzyme that catalyzes the reversible transfer of a phosphate group from 1,3-bisphosphoglycerate (1,3-BPG) to ADP, producing 3-phosphoglycerate (3-PG) and ATP.
Allosteric enzymes are enzymes that change their conformational ensemble upon binding of an effector which results in an apparent change in binding affinity at a different ligand binding site. This "action at a distance" through binding of one ligand affecting the binding of another at a distinctly different site, is the essence of the allosteric concept. Allostery plays a crucial role in many fundamental biological processes, including but not limited to cell signaling and the regulation of metabolism. Allosteric enzymes need not be oligomers as previously thought, and in fact many systems have demonstrated allostery within single enzymes. In biochemistry, allosteric regulation is the regulation of a protein by binding an effector molecule at a site other than the enzyme's active site.
Molecular biophysics is a rapidly evolving interdisciplinary area of research that combines concepts in physics, chemistry, engineering, mathematics and biology. It seeks to understand biomolecular systems and explain biological function in terms of molecular structure, structural organization, and dynamic behaviour at various levels of complexity. This discipline covers topics such as the measurement of molecular forces, molecular associations, allosteric interactions, Brownian motion, and cable theory. Additional areas of study can be found on Outline of Biophysics. The discipline has required development of specialized equipment and procedures capable of imaging and manipulating minute living structures, as well as novel experimental approaches.
In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of several domains, and a domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
The Gaussian network model (GNM) is a representation of a biological macromolecule as an elastic mass-and-spring network to study, understand, and characterize the mechanical aspects of its long-time large-scale dynamics. The model has a wide range of applications from small proteins such as enzymes composed of a single domain, to large macromolecular assemblies such as a ribosome or a viral capsid. Protein domain dynamics plays key roles in a multitude of molecular recognition and cell signalling processes. Protein domains, connected by intrinsically disordered flexible linker domains, induce long-range allostery via protein domain dynamics. The resultant dynamic modes cannot be generally predicted from static structures of either the entire protein or individual domains.
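The network-building step of a Gaussian network model can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes one node per residue (for example, Cα positions in ångströms), uniform springs, and a 7 Å contact cutoff, which is a commonly used but adjustable choice. The function name and the toy coordinates are hypothetical.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the GNM connectivity (Kirchhoff) matrix.

    coords: list of (x, y, z) node positions, one per residue.
    Off-diagonal entries are -1 for node pairs within the cutoff,
    and each diagonal entry counts the contacts of that node.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Toy chain of four nodes spaced 3.8 Å apart along the x axis
# (hypothetical coordinates, not from a real structure).
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy_chain)
```

In a full analysis, the nonzero eigenvalues and eigenvectors of this matrix would be used to estimate residue fluctuations and collective modes. That step is omitted here to keep the sketch free of external dependencies.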
Fuzzy complexes are protein complexes, where structural ambiguity or multiplicity exists and is required for biological function. Alteration, truncation or removal of conformationally ambiguous regions impacts the activity of the corresponding complex. Fuzzy complexes are generally formed by intrinsically disordered proteins. Structural multiplicity usually underlies functional multiplicity of protein complexes following a fuzzy logic. Distinct binding modes of the nucleosome are also regarded as a special case of fuzziness.
KcsA (K channel of Streptomyces A) is a prokaryotic potassium channel from the soil bacterium Streptomyces lividans that has been studied extensively in ion channel research. The pH-activated protein possesses two transmembrane segments and a highly selective pore region, responsible for the gating and shuttling of K+ ions out of the cell. The amino acid sequence found in the selectivity filter of KcsA is highly conserved among both prokaryotic and eukaryotic K+ voltage channels; as a result, research on KcsA has provided important structural and mechanistic insight into the molecular basis for K+ ion selection and conduction. As one of the most studied ion channels to this day, KcsA is a template for research on K+ channel function, and its elucidated structure underlies computational modeling of channel dynamics for both prokaryotic and eukaryotic species.
A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent. Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies based on the MEROPS and CAZy classification systems.
In the area of protein structural motifs, niches are three or four amino acid residue features in which main-chain CO groups are bridged by positively charged or δ+ groups. The δ+ groups include groups with two hydrogen bond donor atoms such as NH2 groups and water molecules. In typical proteins, 7% of amino acid residues belong to niches bound to a δ+ group, while another 7% have the conformation but no single cationic bridging group is detected.