Direct coupling analysis

Direct coupling analysis or DCA is an umbrella term comprising several methods for analyzing sequence data in computational biology. [1] The common idea of these methods is to use statistical modeling to quantify the strength of the direct relationship between two positions of a biological sequence, excluding effects from other positions. This contrasts with the usual measures of correlation, which can be large even if there is no direct relationship between the positions (hence the name direct coupling analysis). Such a direct relationship can, for example, be the evolutionary pressure for two positions to maintain mutual compatibility in the biomolecular structure of the sequence, leading to molecular coevolution between the two positions.

DCA has been used in the inference of protein residue contacts, [1] [2] [3] [4] [5] RNA structure prediction, [6] [7] the inference of protein-protein interaction networks, [8] [9] [10] [11] [12] the modeling of fitness landscapes, [13] [14] [15] the generation of novel functional proteins, [16] and the modeling of protein evolution. [17] [18]

Mathematical Model and Inference

Mathematical Model

The basis of DCA is a statistical model for the variability within a set of phylogenetically related biological sequences. When fitted to a multiple sequence alignment (MSA) of sequences of length $N$, the model defines a probability for all possible sequences of the same length. [1] This probability can be interpreted as the probability that the sequence in question belongs to the same class of sequences as the ones in the MSA, for example the class of all protein sequences belonging to a specific protein family.

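We denote a sequence by $\mathbf{a} = (a_1, a_2, \ldots, a_N)$, with the $a_i$ being categorical variables representing the monomers of the sequence (if the sequences are, for example, aligned amino acid sequences of proteins of a protein family, the $a_i$ take as values any of the 20 standard amino acids). The probability of a sequence within a model is then defined as

$$P(\mathbf{a}) = \frac{1}{Z} \exp\left( \sum_{i=1}^{N} h_i(a_i) + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} J_{ij}(a_i, a_j) \right),$$

where

  • $h_i(a_i)$ and $J_{ij}(a_i, a_j)$ are sets of real numbers representing the parameters of the model (more below)
  • $Z$ is a normalization constant (a real number) to ensure $\sum_{\mathbf{a}} P(\mathbf{a}) = 1$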

The parameters $h_i(a_i)$ depend on one position $i$ and the symbol $a_i$ at this position. They are usually called fields [1] and represent the propensity of symbol $a_i$ to be found at position $i$. The parameters $J_{ij}(a_i, a_j)$ depend on pairs of positions $i, j$ and the symbols $a_i, a_j$ at these positions. They are usually called couplings [1] and represent an interaction, i.e. a term quantifying how compatible the symbols at both positions are with each other. The model is fully connected, so there are interactions between all pairs of positions. The model can be seen as a generalization of the Ising model, with spins not only taking two values, but any value from a given finite alphabet. In fact, when the size of the alphabet is 2, the model reduces to the Ising model. Because it is reminiscent of the Potts model of statistical physics, it is often called a Potts model. [19]
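
As a minimal sketch of the model described above, the following Python code evaluates the exponent of the Potts model for a sequence and normalizes by brute-force enumeration, which is only feasible for toy sizes. The array layout (`h[i, a]` for fields, `J[i, j, a, b]` for couplings with only `i < j` used), the function names and the random parameters are assumptions made for illustration and are not taken from any DCA package.

```python
import itertools
import numpy as np

def log_weight(seq, h, J):
    """Exponent of the Potts model: sum_i h_i(a_i) + sum_{i<j} J_ij(a_i, a_j)."""
    N = len(seq)
    total = sum(h[i, seq[i]] for i in range(N))
    for i in range(N):
        for j in range(i + 1, N):
            total += J[i, j, seq[i], seq[j]]
    return total

def potts_distribution(h, J):
    """Enumerate all q**N sequences and return them with their probabilities."""
    N, q = h.shape
    seqs = list(itertools.product(range(q), repeat=N))
    weights = np.exp(np.array([log_weight(s, h, J) for s in seqs]))
    Z = weights.sum()  # normalization constant: a sum over q**N terms
    return seqs, weights / Z

# Toy model: 3 positions, alphabet of 4 symbols, random (hypothetical) parameters.
rng = np.random.default_rng(0)
N, q = 3, 4
h = rng.normal(size=(N, q))
J = rng.normal(size=(N, N, q, q))

seqs, probs = potts_distribution(h, J)
print(len(seqs), probs.sum())
```

For realistic alignments the enumeration in `potts_distribution` is impossible, which is why the approximate inference methods discussed later are needed.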

Even knowing the probabilities of all sequences does not determine the parameters $h, J$ uniquely. For example, a simple transformation of the parameters

$$h_i(a) \rightarrow h_i(a) + c_i$$

for any set of real numbers $c_i$ leaves the probabilities the same. The likelihood function is invariant under such transformations as well, so the data cannot be used to fix these degrees of freedom (although a prior on the parameters might do so [3] ).

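A convention often found in the literature [3] [20] is to fix these degrees of freedom such that the Frobenius norm of the coupling matrix (with $q$ denoting the number of possible symbols)

$$F_{ij} = \|J_{ij}\|_2 = \sqrt{\sum_{a,b=1}^{q} J_{ij}(a,b)^2}$$

is minimized (independently for every pair of positions $i$ and $j$).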
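
The invariance and the norm-minimizing convention can be checked numerically on a toy model. The sketch below moves the row and column means of every coupling matrix into the fields (often called a zero-sum gauge), then verifies that the sequence probabilities are unchanged and that the Frobenius norm of a coupling matrix does not increase. The array layout (`h[i, a]`, `J[i, j, a, b]` with only `i < j` used) and all function names are assumptions for illustration.

```python
import itertools
import numpy as np

def distribution(h, J):
    """Brute-force probabilities of all q**N sequences (toy sizes only)."""
    N, q = h.shape
    log_w = []
    for s in itertools.product(range(q), repeat=N):
        value = sum(h[i, s[i]] for i in range(N))
        value += sum(J[i, j, s[i], s[j]]
                     for i in range(N) for j in range(i + 1, N))
        log_w.append(value)
    w = np.exp(np.array(log_w))
    return w / w.sum()

def to_zero_sum_gauge(h, J):
    """Shift row/column means of each J_ij (i < j) into the fields."""
    N, q = h.shape
    h_new = h.copy()
    J_new = np.zeros_like(J)
    for i in range(N):
        for j in range(i + 1, N):
            row = J[i, j].mean(axis=1, keepdims=True)  # mean over b, shape (q, 1)
            col = J[i, j].mean(axis=0, keepdims=True)  # mean over a, shape (1, q)
            tot = J[i, j].mean()
            J_new[i, j] = J[i, j] - row - col + tot
            h_new[i] += row[:, 0]   # part depending only on a_i goes to h_i
            h_new[j] += col[0, :]   # part depending only on a_j goes to h_j
    # A constant shift of each h_i is absorbed by the normalization constant.
    h_new -= h_new.mean(axis=1, keepdims=True)
    return h_new, J_new

rng = np.random.default_rng(1)
N, q = 3, 3
h = rng.normal(size=(N, q))
J = rng.normal(size=(N, N, q, q))

h0, J0 = to_zero_sum_gauge(h, J)
p_before = distribution(h, J)
p_after = distribution(h0, J0)
norm_before = np.sqrt((J[0, 1] ** 2).sum())
norm_after = np.sqrt((J0[0, 1] ** 2).sum())
print(np.allclose(p_before, p_after), norm_before, norm_after)
```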

Maximum Entropy Derivation

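To justify the Potts model, it is often noted that it can be derived following a maximum entropy principle: [21] For a given set of sample covariances and frequencies, the Potts model represents the distribution with the maximal Shannon entropy of all distributions reproducing those covariances and frequencies. For a multiple sequence alignment, the sample covariances are defined as

$$C_{ij}(a,b) = f_{ij}(a,b) - f_i(a) f_j(b),$$

where $f_{ij}(a,b)$ is the frequency of finding symbols $a$ and $b$ at positions $i$ and $j$ in the same sequence in the MSA, and $f_i(a)$ the frequency of finding symbol $a$ at position $i$. The Potts model is then the unique distribution $P$ that maximizes the functional

$$F[P] = -\sum_{\mathbf{a}} P(\mathbf{a}) \log P(\mathbf{a}) + \sum_{i<j} \sum_{a,b} \lambda_{ij}(a,b) \left( P_{ij}(a,b) - f_{ij}(a,b) \right) + \sum_{i} \sum_{a} \lambda_i(a) \left( P_i(a) - f_i(a) \right) + \Omega \left( 1 - \sum_{\mathbf{a}} P(\mathbf{a}) \right).$$

The first term in the functional is the Shannon entropy of the distribution. The $\lambda_{ij}(a,b)$ and $\lambda_i(a)$ are Lagrange multipliers to ensure $P_{ij}(a,b) = f_{ij}(a,b)$ and $P_i(a) = f_i(a)$, with $P_{ij}(a,b)$ being the marginal probability to find symbols $a, b$ at positions $i, j$. The Lagrange multiplier $\Omega$ ensures normalization. Maximizing this functional and identifying

$$\lambda_{ij}(a,b) \equiv J_{ij}(a,b), \qquad \lambda_i(a) \equiv h_i(a), \qquad \Omega \equiv \log Z - 1$$

leads to the Potts model above. This procedure only gives the functional form of the Potts model, while the numerical values of the Lagrange multipliers (identified with the parameters) still have to be determined by fitting the model to the data.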
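
The empirical quantities that enter this derivation can be computed directly from an alignment. The following sketch assumes the MSA is already encoded as an integer array of shape (sequences, positions) with symbols numbered from 0; the function name and the tiny example alignment are hypothetical. Sequence reweighting and pseudocounts, which practical pipelines commonly apply, are deliberately left out.

```python
import numpy as np

def msa_statistics(msa, q):
    """Return single-site frequencies, pair frequencies and covariances.

    msa: integer array of shape (M, N) with entries in 0..q-1.
    """
    M, N = msa.shape
    onehot = np.eye(q)[msa]                                # shape (M, N, q)
    f_i = onehot.mean(axis=0)                              # f_i(a), shape (N, q)
    f_ij = np.einsum("mia,mjb->ijab", onehot, onehot) / M  # f_ij(a, b)
    C = f_ij - np.einsum("ia,jb->ijab", f_i, f_i)          # covariances
    return f_i, f_ij, C

# Hypothetical alignment: 4 sequences, 3 positions, alphabet of 3 symbols.
msa = np.array([[0, 1, 2],
                [0, 1, 2],
                [1, 0, 2],
                [1, 0, 1]])
f_i, f_ij, C = msa_statistics(msa, q=3)
print(f_ij[0, 1, 0, 1], C[0, 1, 0, 1])
```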

Direct Couplings and Indirect Correlation

The central point of DCA is to interpret the $J_{ij}$ (which can be represented as a $q \times q$ matrix if there are $q$ possible symbols) as direct couplings. If two positions are under joint evolutionary pressure (for example to maintain a structural bond), one might expect these couplings to be large because only sequences with fitting pairs of symbols should have a significant probability. On the other hand, a large correlation between two positions does not necessarily mean that the couplings are large, since large couplings between e.g. positions $i, j$ and $j, k$ might lead to large correlations between positions $i$ and $k$, mediated by position $j$. [1] In fact, such indirect correlations have been implicated in the high false positive rate when inferring protein residue contacts using correlation measures like mutual information. [22]
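
A three-position toy model makes the distinction concrete. In the sketch below, positions 0 and 1 and positions 1 and 2 are directly coupled, while the coupling between positions 0 and 2 is exactly zero. The coupling strength of 2.0, the two-symbol alphabet and the helper name `covariance` are arbitrary choices for illustration. The model is small enough to enumerate, and positions 0 and 2 still come out clearly correlated.

```python
import itertools
import numpy as np

N, q = 3, 2
J = np.zeros((N, N, q, q))
J[0, 1] = 2.0 * np.eye(q)   # favors equal symbols at positions 0 and 1
J[1, 2] = 2.0 * np.eye(q)   # favors equal symbols at positions 1 and 2
# J[0, 2] stays zero: no direct coupling between positions 0 and 2.

seqs = np.array(list(itertools.product(range(q), repeat=N)))
log_w = np.array([
    sum(J[i, j, s[i], s[j]] for i in range(N) for j in range(i + 1, N))
    for s in seqs
])
p = np.exp(log_w)
p /= p.sum()

def covariance(i, j, a, b):
    """Model covariance P_ij(a, b) - P_i(a) P_j(b) from the exact distribution."""
    p_ij = p[(seqs[:, i] == a) & (seqs[:, j] == b)].sum()
    p_i = p[seqs[:, i] == a].sum()
    p_j = p[seqs[:, j] == b].sum()
    return p_ij - p_i * p_j

c02 = covariance(0, 2, 0, 0)
print(c02)  # nonzero although J[0, 2] is zero
```

A correlation-based score would flag the pair (0, 2), whereas the couplings of the model correctly identify only (0, 1) and (1, 2) as directly interacting.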

Inference

The inference of the Potts model on a multiple sequence alignment (MSA) using maximum likelihood estimation is usually computationally intractable, because one needs to calculate the normalization constant $Z$, which, for sequence length $N$ and $q$ possible symbols, is a sum of $q^N$ terms (for example, $20^{30} \approx 10^{39}$ terms for a small protein domain family with 30 positions). Therefore, numerous approximations and alternatives have been developed:

  • Message passing (belief propagation) [23]
  • Mean-field approximation, in which the couplings are estimated from the inverse of the covariance matrix [1]
  • Gaussian modeling [20]
  • Pseudolikelihood maximization [3] [5]
  • Adaptive cluster expansion [24]

All of these methods lead to some form of estimate for the set of parameters maximizing the likelihood of the MSA. Many of them include regularization or prior terms to ensure a well-posed problem or promote a sparse solution.
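
As one hedged illustration of such an approximation, the sketch below follows the idea of the mean-field approach of [1], in which the couplings are estimated as the negative inverse of the covariance matrix after mixing the empirical frequencies with a uniform pseudocount. It is a simplified sketch, not a reference implementation: sequence reweighting is omitted, the pseudocount weight of 0.5 and the convention of dropping the last symbol to make the matrix invertible are stated assumptions, and the function name and synthetic alignment are hypothetical.

```python
import numpy as np

def mean_field_couplings(msa, q, pseudocount=0.5):
    """Estimate couplings J[i, j, a, b] as minus the inverse covariance matrix."""
    M, N = msa.shape
    onehot = np.eye(q)[msa]                                # (M, N, q)
    f_i = onehot.mean(axis=0)
    f_ij = np.einsum("mia,mjb->ijab", onehot, onehot) / M
    # Mix empirical frequencies with a uniform distribution (pseudocount).
    f_i = (1 - pseudocount) * f_i + pseudocount / q
    f_ij = (1 - pseudocount) * f_ij + pseudocount / q ** 2
    for i in range(N):
        f_ij[i, i] = np.diag(f_i[i])   # same-site blocks must match f_i
    C = f_ij - np.einsum("ia,jb->ijab", f_i, f_i)
    # Drop the last symbol at every site so that the matrix is invertible.
    C = C[:, :, :q - 1, :q - 1]
    C_mat = C.transpose(0, 2, 1, 3).reshape(N * (q - 1), N * (q - 1))
    J_mat = -np.linalg.inv(C_mat)
    J = np.zeros((N, N, q, q))         # couplings with the dropped symbol stay 0
    J[:, :, :q - 1, :q - 1] = (
        J_mat.reshape(N, q - 1, N, q - 1).transpose(0, 2, 1, 3)
    )
    return J

# Synthetic alignment in which position 1 mostly copies position 0.
rng = np.random.default_rng(0)
M, N, q = 500, 4, 3
msa = rng.integers(0, q, size=(M, N))
copy = rng.random(M) < 0.9
msa[copy, 1] = msa[copy, 0]

J = mean_field_couplings(msa, q)
norms = np.sqrt((J ** 2).sum(axis=(2, 3)))
np.fill_diagonal(norms, 0.0)           # ignore same-site blocks
best = np.unravel_index(np.argmax(norms), norms.shape)
print(best)
```

On this synthetic alignment the pair with the largest coupling norm should be the one that was constructed to covary.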

Applications

Protein Residue Contact Prediction

A possible interpretation of large values of couplings in a model fitted to an MSA of a protein family is the existence of conserved contacts between positions (residues) in the family. Such a contact can lead to molecular coevolution, since a mutation in one of the two residues, without a compensating mutation in the other residue, is likely to disrupt protein structure and negatively affect the fitness of the protein. Residue pairs for which there is a strong selective pressure to maintain mutual compatibility are therefore expected to mutate together or not at all. This idea (which was known in the literature long before the conception of DCA [25] ) has been used to predict protein contact maps, for example by analyzing the mutual information between protein residues.

Within the framework of DCA, a score for the strength of the direct interaction between a pair of residues is often defined [3] [20] using the Frobenius norm $F_{ij}$ of the corresponding coupling matrix $J_{ij}$ and applying an average product correction (APC):

$$F_{ij}^{\mathrm{APC}} = F_{ij} - \frac{F_{i\cdot} \, F_{\cdot j}}{F_{\cdot\cdot}},$$

where $F_{ij} = \sqrt{\sum_{a,b} J_{ij}(a,b)^2}$ is the Frobenius norm introduced above and

$$F_{i\cdot} = \frac{1}{N-1} \sum_{k \neq i} F_{ik}, \qquad F_{\cdot j} = \frac{1}{N-1} \sum_{k \neq j} F_{kj}, \qquad F_{\cdot\cdot} = \frac{1}{N(N-1)} \sum_{k \neq l} F_{kl}.$$

This correction term was first introduced for mutual information [26] and is used to remove biases of specific positions to produce large $F_{ij}$. Scores that are invariant under parameter transformations that do not affect the probabilities have also been used. [1] Sorting all residue pairs by this score yields a list whose top is strongly enriched in residue contacts when compared to the protein contact map of a homologous protein. [4] High-quality predictions of residue contacts are valuable as prior information in protein structure prediction. [4]
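
The scoring step can be sketched in a few lines. The code below computes Frobenius norms from a coupling array, applies an average product correction using row means and the overall mean of the off-diagonal scores, and ranks the residue pairs. The array layout (`J[i, j, a, b]`, only `i < j` used), the function names and the synthetic couplings are assumptions for illustration. The couplings are taken as given, whereas in practice they would first be brought into the norm-minimizing gauge.

```python
import numpy as np

def frobenius_scores(J):
    """Symmetric (N, N) matrix of Frobenius norms, built from the i < j blocks."""
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))
    F = np.triu(F, k=1)
    return F + F.T

def apc(F):
    """Average product correction: F_ij - mean_i * mean_j / overall mean."""
    N = F.shape[0]
    row_mean = F.sum(axis=1) / (N - 1)          # diagonal is zero, so N - 1 terms
    total_mean = F.sum() / (N * (N - 1))
    corrected = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected

def ranked_pairs(scores):
    """All pairs (i, j) with i < j, sorted by decreasing score."""
    N = scores.shape[0]
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    return sorted(pairs, key=lambda p: scores[p], reverse=True)

# Synthetic couplings: weak noise everywhere, one strong pair (1, 3).
rng = np.random.default_rng(0)
N, q = 5, 3
J = 0.1 * rng.normal(size=(N, N, q, q))
J[1, 3] += 5.0 * np.eye(q)

F = frobenius_scores(J)
F_apc = apc(F)
top_pair = ranked_pairs(F_apc)[0]
print(top_pair)
```

In a contact prediction setting, the leading entries of `ranked_pairs(F_apc)` would be compared against the contact map of a known structure.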

Inference of Protein-Protein Interaction

DCA can be used for detecting conserved interaction between protein families and for predicting which residue pairs form contacts in a protein complex. [8] [9] Such predictions can be used when generating structural models for these complexes, [27] or when inferring protein-protein interaction networks made from more than two proteins. [9] [12]

Modeling of Fitness Landscapes

DCA can be used to model fitness landscapes and to predict the effect of a mutation in the amino acid sequence of a protein on its fitness. [13] [14]
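
One common way to turn a fitted model into a mutation score is to compare the log-probabilities of the mutant and the reference sequence. The normalization constant cancels in this difference, so no intractable sum is needed. The sketch below assumes this scoring choice as well as the array layout (`h[i, a]`, `J[i, j, a, b]` with only `i < j` used). The function names and the hand-set parameters are hypothetical, and how such a score relates to measured fitness is a modeling assumption.

```python
import numpy as np

def log_weight(seq, h, J):
    """Exponent of the Potts model for one sequence."""
    N = len(seq)
    fields = sum(h[i, seq[i]] for i in range(N))
    couplings = sum(J[i, j, seq[i], seq[j]]
                    for i in range(N) for j in range(i + 1, N))
    return fields + couplings

def mutation_score(wild_type, position, new_symbol, h, J):
    """log P(mutant) - log P(wild type); the normalization constant cancels."""
    mutant = list(wild_type)
    mutant[position] = new_symbol
    return log_weight(mutant, h, J) - log_weight(wild_type, h, J)

# Hand-set toy parameters: 3 positions, 3 symbols.
N, q = 3, 3
h = np.zeros((N, q))
J = np.zeros((N, N, q, q))
h[1, 2] = 1.5          # symbol 2 is favored at position 1 ...
J[0, 1, 0, 2] = -0.5   # ... but disfavored next to symbol 0 at position 0

wild_type = (0, 0, 0)
score = mutation_score(wild_type, position=1, new_symbol=2, h=h, J=J)
print(score)
```

Because the coupling term depends on the rest of the sequence, the same substitution can receive different scores in different sequence backgrounds.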

References

  1. Morcos, F.; Pagnani, A.; Lunt, B.; Bertolino, A.; Marks, D. S.; Sander, C.; Zecchina, R.; Onuchic, J. N.; Hwa, T.; Weigt, M. (21 November 2011). "Direct-coupling analysis of residue coevolution captures native contacts across many protein families". Proceedings of the National Academy of Sciences. 108 (49): E1293–E1301. arXiv:1110.5223. Bibcode:2011PNAS..108E1293M. doi:10.1073/pnas.1111471108. PMC 3241805. PMID 22106262.
  2. Kamisetty, H.; Ovchinnikov, S.; Baker, D. (5 September 2013). "Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era". Proceedings of the National Academy of Sciences. 110 (39): 15674–15679. Bibcode:2013PNAS..11015674K. doi:10.1073/pnas.1314045110. PMC 3785744. PMID 24009338.
  3. Ekeberg, Magnus; Lövkvist, Cecilia; Lan, Yueheng; Weigt, Martin; Aurell, Erik (11 January 2013). "Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models". Physical Review E. 87 (1): 012707. arXiv:1211.1281. Bibcode:2013PhRvE..87a2707E. doi:10.1103/PhysRevE.87.012707. PMID 23410359. S2CID 27772365.
  4. Marks, Debora S.; Colwell, Lucy J.; Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris; Sali, Andrej (7 December 2011). "Protein 3D Structure Computed from Evolutionary Sequence Variation". PLOS ONE. 6 (12): e28766. Bibcode:2011PLoSO...628766M. doi:10.1371/journal.pone.0028766. PMC 3233603. PMID 22163331.
  5. Ekeberg, Magnus; Hartonen, Tuomo; Aurell, Erik (2014-11-01). "Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences". Journal of Computational Physics. 276: 341–356. arXiv:1401.4832. Bibcode:2014JCoPh.276..341E. doi:10.1016/j.jcp.2014.07.024. ISSN 0021-9991. S2CID 15635703.
  6. De Leonardis, Eleonora; Lutz, Benjamin; Ratz, Sebastian; Cocco, Simona; Monasson, Rémi; Schug, Alexander; Weigt, Martin (29 September 2015). "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction". Nucleic Acids Research. 43 (21): 10444–55. arXiv:1510.03351. doi:10.1093/nar/gkv932. PMC 4666395. PMID 26420827.
  7. Weinreb, Caleb; Riesselman, Adam J.; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S. (May 2016). "3D RNA and Functional Interactions from Evolutionary Couplings". Cell. 165 (4): 963–975. doi:10.1016/j.cell.2016.03.030. PMC 5024353. PMID 27087444.
  8. Ovchinnikov, Sergey; Kamisetty, Hetunandan; Baker, David (1 May 2014). "Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information". eLife. 3: e02030. doi:10.7554/eLife.02030. PMC 4034769. PMID 24842992.
  9. Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea; Keskin, Ozlem (16 February 2016). "Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon". PLOS ONE. 11 (2): e0149166. arXiv:1512.05420. Bibcode:2016PLoSO..1149166F. doi:10.1371/journal.pone.0149166. PMC 4755613. PMID 26882169.
  10. dos Santos, R.N.; Morcos, F.; Jana, B.; Andricopulo, A.D.; Onuchic, J.N. (4 September 2015). "Dimeric interactions and complex formation using direct coevolutionary couplings". Scientific Reports. 5: 13652. Bibcode:2015NatSR...513652D. doi:10.1038/srep13652. PMC 4559900. PMID 26338201.
  11. Uguzzoni, Guido; John Lovis, Shalini; Oteri, Francesco; Schug, Alexander; Szurmant, Hendrik; Weigt, Martin (2017-03-28). "Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis". Proceedings of the National Academy of Sciences. 114 (13): E2662–E2671. arXiv:1703.01246. Bibcode:2017PNAS..114E2662U. doi:10.1073/pnas.1615068114. ISSN 0027-8424. PMC 5380090. PMID 28289198.
  12. Croce, Giancarlo; Gueudré, Thomas; Cuevas, Maria Virginia Ruiz; Keidel, Victoria; Figliuzzi, Matteo; Szurmant, Hendrik; Weigt, Martin (2019-10-21). "A multi-scale coevolutionary approach to predict interactions between protein domains". PLOS Computational Biology. 15 (10): e1006891. Bibcode:2019PLSCB..15E6891C. doi:10.1371/journal.pcbi.1006891. ISSN 1553-7358. PMC 6822775. PMID 31634362.
  13. Ferguson, Andrew L.; Mann, Jaclyn K.; Omarjee, Saleha; Ndung'u, Thumbi; Walker, Bruce D.; Chakraborty, Arup K. (March 2013). "Translating HIV Sequences into Quantitative Fitness Landscapes Predicts Viral Vulnerabilities for Rational Immunogen Design". Immunity. 38 (3): 606–617. doi:10.1016/j.immuni.2012.11.022. PMC 3728823. PMID 23521886.
  14. Figliuzzi, Matteo; Jacquier, Hervé; Schug, Alexander; Tenaillon, Oliver; Weigt, Martin (January 2016). "Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1". Molecular Biology and Evolution. 33 (1): 268–280. doi:10.1093/molbev/msv211. PMC 4693977. PMID 26446903.
  15. Asti, Lorenzo; Uguzzoni, Guido; Marcatili, Paolo; Pagnani, Andrea; Ofran, Yanay (13 April 2016). "Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity". PLOS Computational Biology. 12 (4): e1004870. Bibcode:2016PLSCB..12E4870A. doi:10.1371/journal.pcbi.1004870. PMC 4830580. PMID 27074145.
  16. Russ, William P.; Figliuzzi, Matteo; Stocker, Christian; Barrat-Charlaix, Pierre; Socolich, Michael; Kast, Peter; Hilvert, Donald; Monasson, Remi; Cocco, Simona; Weigt, Martin; Ranganathan, Rama (2020-07-24). "An evolution-based model for designing chorismate mutase enzymes". Science. 369 (6502): 440–445. Bibcode:2020Sci...369..440R. doi:10.1126/science.aba3304. ISSN 0036-8075. PMID 32703877. S2CID 220714458.
  17. Rodriguez-Rivas, Juan; Croce, Giancarlo; Muscat, Maureen; Weigt, Martin (2022-01-25). "Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes". Proceedings of the National Academy of Sciences. 119 (4). arXiv:2112.10093. Bibcode:2022PNAS..11913118R. doi:10.1073/pnas.2113118119. ISSN 0027-8424. PMC 8795541. PMID 35022216.
  18. Vigué, Lucile; Croce, Giancarlo; Petitjean, Marie; Ruppé, Etienne; Tenaillon, Olivier; Weigt, Martin (2022-07-12). "Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes". Nature Communications. 13 (1): 4030. Bibcode:2022NatCo..13.4030V. doi:10.1038/s41467-022-31643-3. ISSN 2041-1723. PMC 9276797. PMID 35821377.
  19. Feinauer, Christoph; Skwark, Marcin J.; Pagnani, Andrea; Aurell, Erik (9 October 2014). "Improving Contact Prediction along Three Dimensions". PLOS Computational Biology. 10 (10): e1003847. arXiv:1403.0379. Bibcode:2014PLSCB..10E3847F. doi:10.1371/journal.pcbi.1003847. PMC 4191875. PMID 25299132.
  20. Baldassi, Carlo; Zamparo, Marco; Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea (24 March 2014). "Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners". PLOS ONE. 9 (3): e92721. arXiv:1404.1240. Bibcode:2014PLoSO...992721B. doi:10.1371/journal.pone.0092721. PMC 3963956. PMID 24663061.
  21. Stein, Richard R.; Marks, Debora S.; Sander, Chris; Chen, Shi-Jie (30 July 2015). "Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models". PLOS Computational Biology. 11 (7): e1004182. Bibcode:2015PLSCB..11E4182S. doi:10.1371/journal.pcbi.1004182. PMC 4520494. PMID 26225866.
  22. Burger, Lukas; van Nimwegen, Erik; Bourne, Philip E. (1 January 2010). "Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments". PLOS Computational Biology. 6 (1): e1000633. Bibcode:2010PLSCB...6E0633B. doi:10.1371/journal.pcbi.1000633. PMC 2793430. PMID 20052271.
  23. Weigt, M.; White, R. A.; Szurmant, H.; Hoch, J. A.; Hwa, T. (30 December 2008). "Identification of direct residue contacts in protein-protein interaction by message passing". Proceedings of the National Academy of Sciences. 106 (1): 67–72. arXiv:0901.1248. Bibcode:2009PNAS..106...67W. doi:10.1073/pnas.0805923106. PMC 2629192. PMID 19116270.
  24. Barton, J. P.; De Leonardis, E.; Coucke, A.; Cocco, S. (21 June 2016). "ACE: adaptive cluster expansion for maximum entropy graphical model inference". Bioinformatics. 32 (20): 3089–3097. doi:10.1093/bioinformatics/btw328. PMID 27329863.
  25. Göbel, Ulrike; Sander, Chris; Schneider, Reinhard; Valencia, Alfonso (April 1994). "Correlated mutations and residue contacts in proteins". Proteins: Structure, Function, and Genetics. 18 (4): 309–317. doi:10.1002/prot.340180402. PMID 8208723. S2CID 14978727.
  26. Dunn, S.D.; Wahl, L.M.; Gloor, G.B. (5 December 2007). "Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction". Bioinformatics. 24 (3): 333–340. doi:10.1093/bioinformatics/btm604. PMID 18057019.
  27. Schug, A.; Weigt, M.; Onuchic, J. N.; Hwa, T.; Szurmant, H. (17 December 2009). "High-resolution protein complexes from integrating genomic information with molecular simulation". Proceedings of the National Academy of Sciences. 106 (52): 22124–22129. Bibcode:2009PNAS..10622124S. doi:10.1073/pnas.0912100106. PMC 2799721. PMID 20018738.
  28. Jarmolinska, Aleksandra I.; Zhou, Qin; Sulkowska, Joanna I.; Morcos, Faruck (11 January 2019). "DCA-MOL: A PyMOL Plugin To Analyze Direct Evolutionary Couplings". Journal of Chemical Information and Modeling. 59 (2): 625–629. doi:10.1021/acs.jcim.8b00690. PMID 30632747. S2CID 58634008.