Distance geometry problem

Last updated

The distance geometry problem is that of characterization and study of sets of points based only on given values of the distances between member pairs. [1] [2] [3] Therefore distance geometry has immediate relevance where distance values are determined or considered, such as biology, [4] sensor network, [5] surveying, cartography, and physics.

In mathematics, the statement that "Property Pcharacterizes object X" means that not only does X have property P, but that X is the only thing that has property P. In other words, P is a defining property of X. Similarly a set of properties P is said to characterize X when these properties distinguish X from all other objects. Common mathematical expressions for a characterization of X in terms of P include "P is necessary and sufficient for X", and "X holds if and only if P".

Set (mathematics) fundamental mathematical concept related to the notions of belonging or inclusion

In mathematics, a set is a collection of distinct objects, considered as an object in its own right. For example, the numbers 2, 4, and 6 are distinct objects when considered separately, but when they are considered collectively they form a single set of size three, written {2, 4, 6}. The concept of a set is one of the most fundamental in mathematics. Developed at the end of the 19th century, set theory is now a ubiquitous part of mathematics, and can be used as a foundation from which nearly all of mathematics can be derived. In mathematics education, elementary topics from set theory such as Venn diagrams are taught at a young age, while more advanced concepts are taught as part of a university degree.

Distance is a numerical measurement of how far apart objects are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria. In most cases, "distance from A to B" is interchangeable with "distance from B to A". In mathematics, a distance function or metric is a generalization of the concept of physical distance. A metric is a function that behaves according to a specific set of rules, and is a way of describing what it means for elements of some space to be "close to" or "far away from" each other.

Contents

Applications

The distance geometry problem (DGP) is that of finding the coordinates of a set of points by using the distances between some pairs of such points. [3] There exists nowadays a large community that is actively working on this problem, because there are several real-life applications that can lead to the formulation of a DGP. As an example, an interesting application is that of locating sensors in telecommunication networks. [5] In such a case, the positions of some sensors are known (which are called anchors) and some of the distances between sensors (which can be anchors or not) are also known: the problem is to identify the positions in space for all sensors.

An interesting application arises in biology. [4] [6] Experimental techniques are able to estimate distances between pairs of atoms of a given molecule, and the problem becomes the one of identifying the three-dimensional conformation of the molecule, i.e. the positions of all its atoms. In this field, the main interest is on proteins, because discovering their three-dimensional conformation allows us to get clues about the function they are able to perform. The implications in related fields, such as biomedicine and drug design, are evident. When dealing with biological molecules, the DGP is generally referred to as molecular DGP (MDGP).

Protein biological molecule consisting of chains of amino acid residues

Proteins are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific three-dimensional structure that determines its activity.

Biomedicine is a branch of medical science that applies biological and physiological principles to clinical practice. The branch especially applies to biology and physiology. Biomedicine also can relate to many other categories in health and biological related fields. It has been the dominant health system for more than a century.

Drug design inventive process of finding new medications based on the knowledge of a biological target

Drug design, often referred to as rational drug design or simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of molecules that are complementary in shape and charge to the biomolecular target with which they interact and therefore will bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is sometimes referred to as computer-aided drug design. Finally, drug design that relies on the knowledge of the three-dimensional structure of the biomolecular target is known as structure-based drug design. In addition to small molecules, biopharmaceuticals including peptides and especially therapeutic antibodies are an increasingly important class of drugs and computational methods for improving the affinity, selectivity, and stability of these protein-based therapeutics have also been developed.

In the following, even if the article considers in general the DGP, the MDGP will be used as an example.

Basic issues

A straight line is the shortest path between two points. Therefore the distance from A to B is no bigger than the length of the straight-line path from A to C plus the length of the straight-line path from C to B. This fact is called the triangle inequality. If that sum happens to be equal to the distance from A to B, then the three points A, B, and C lie on a straight line, with C between A and B.

Line (geometry) straight object with negligible width and depth

The notion of line or straight line was introduced by ancient mathematicians to represent straight objects with negligible width and depth. Lines are an idealization of such objects. Until the 17th century, lines were defined as the "[…] first species of quantity, which has only one dimension, namely length, without any width nor depth, and is nothing else than the flow or run of the point which […] will leave from its imaginary moving some vestige in length, exempt of any width. […] The straight line is that which is equally extended between its points."

Triangle inequality

In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of degenerate triangles, but some authors, especially those writing about elementary geometry, will exclude this possibility, thus leaving out the possibility of equality. If x, y, and z are the lengths of the sides of the triangle, with no side being greater than z, then the triangle inequality states that

Similarly, suppose one knows

Knowing only these six numbers, one would like to figure out

Plane (geometry) Flat, two-dimensional surface

In mathematics, a plane is a flat, two-dimensional surface that extends infinitely far. A plane is the two-dimensional analogue of a point, a line and three-dimensional space. Planes can arise as subspaces of some higher-dimensional space, as with a room's walls extended infinitely far, or they may enjoy an independent existence in their own right, as in the setting of Euclidean geometry.

In geometry, a set of points in space are coplanar if there exists a geometric plane that contains them all. For example, three points are always coplanar, and if the points are distinct and non-collinear, the plane they determine is unique. However, a set of four or more distinct points will, in general, not lie in a single plane.

Distance geometry includes the solution of such problems.

Cayley–Menger determinants

Of particular utility and importance are classifications by means of Cayley–Menger determinants, named after Arthur Cayley and Karl Menger:

In linear algebra, geometry, and trigonometry, the Cayley–Menger determinant is a formula for the content, i.e. the higher-dimensional volume, of a -dimensional simplex in terms of the squares of the distances between any selection of two points of it.

Arthur Cayley English mathematician

Arthur Cayley F.R.S. was a British mathematician. He helped found the modern British school of pure mathematics.

Karl Menger Austrian mathematician

Karl Menger was an Austrian-American mathematician. He was the son of the economist Carl Menger. He is credited with Menger's theorem. He worked on mathematics of algebras, algebra of geometries, curve and dimension theory, etc. Moreover, he contributed to game theory and social sciences.

but not all triples of elements of Π are straight to each other;
but not all quadruples of elements of Φ are plane to each other; and so on.

Discretization and orders

The DGP is, by definition, a constraint satisfaction problem. It is however generally reformulated as an optimization problem in a continuous space, and its solution is then attempted by using techniques for global optimization (see for example [7] ).

Under certain assumptions, however, the problem can be discretized, in the sense that the search domain of the optimization problem can be reduced to a discrete domain. When all distances are supposed to be exact (no experimental errors), the search domain becomes a binary tree, where the candidate positions for the same atom of the molecule are given on a common layer of the tree. [8] [9] The discretization allows us to enumerate the entire solution set (this is not possible in general when using global optimization methods).

The discretization assumptions [2] are strongly based on the order in which the atoms of the molecule are considered. When considering the atoms of the molecule in their natural ordering, such assumptions are generally not satisfied. An interesting and fundamental pre-processing step for the discretization of DGPs is, therefore, the problem of identifying an order for the atoms that allows for the discretization. This problem can be solved in polynomial time, when all distances are supposed to be exact, as well as when some available distance is represented by a suitable interval. [10]

Software for distance geometry

Books and conferences

Crippen and Havel are two pioneers of DGP, and they co-authored the book "Distance Geometry and Molecular Conformation", [4] 1988. Much more recently, an edited book, collecting the most recent efforts from the scientific community for solving the DGP, was published by Springer. [3] See this web page for the list of contributions.

Various conferences and workshops are held every year, where the focus is on DGP-related topics. However, the very first workshop completely devoted to DGP and its applications was held in 2013 in Manaus, Brazil: DGA13.

See also

Related Research Articles

Linear programming

Linear programming is a method to achieve the best outcome in a mathematical model whose requirements are represented by linear relationships. Linear programming is a special case of mathematical programming.

In mathematics, matrix multiplication or matrix product is a binary operation that produces a matrix from two matrices with entries in a field, or, more generally, in a ring or even a semiring. The matrix product is designed for representing the composition of linear maps that are represented by matrices. Matrix multiplication is thus a basic tool of linear algebra, and as such has numerous applications in many areas of mathematics, as well as in applied mathematics, statistics, physics, economics, and engineering. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix product AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m entries down a column of B and summed to produce an entry of AB. When two linear maps are represented by matrices, then the matrix product represents the composition of the two maps.

Parallelogram quadrilateral with two pairs of parallel sides

In Euclidean geometry, a parallelogram is a simple (non-self-intersecting) quadrilateral with two pairs of parallel sides. The opposite or facing sides of a parallelogram are of equal length and the opposite angles of a parallelogram are of equal measure. The congruence of opposite sides and opposite angles is a direct consequence of the Euclidean parallel postulate and neither condition can be proven without appealing to the Euclidean parallel postulate or one of its equivalent formulations.

Molecular dynamics Computer simulations to discover and understand chemical properties

Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamic evolution of the system. In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, where forces between the particles and their potential energies are often calculated using interatomic potentials or molecular mechanics force fields. The method was originally developed within the field of theoretical physics in the late 1950s but is applied today mostly in chemical physics, materials science and the modelling of biomolecules.

Molecular mechanics uses classical mechanics to model molecular systems

Molecular mechanics uses classical mechanics to model molecular systems. The Born–Oppenheimer approximation is assumed valid and the potential energy of all systems is calculated as a function of the nuclear coordinates using force fields. Molecular mechanics can be used to study molecule systems ranging in size and complexity from small to large biological systems or material assemblies with many thousands to millions of atoms.

Molecular geometry Study of the 3D shapes of molecules

Molecular geometry is the three-dimensional arrangement of the atoms that constitute a molecule. It includes the general shape of the molecule as well as bond lengths, bond angles, torsional angles and any other geometrical parameters that determine the position of each atom.

Potential energy surface

A potential energy surface (PES) describes the energy of a system, especially a collection of atoms, in terms of certain parameters, normally the positions of the atoms. The surface might define the energy as a function of one or more coordinates; if there is only one coordinate, the surface is called a potential energy curve or energy profile. An example is the Morse/Long-range potential.

Radical axis line determined by two circles

The radical axis of two circles is the locus of points at which tangents drawn to both circles have the same length. The radical axis is always a straight line and always perpendicular to the line connecting the centers of the circles, albeit closer to the circumference of the larger circle. If the circles intersect, the radical axis is the line passing through the intersection points; similarly, if the circles are tangent, the radical axis is simply the common tangent.

The term coordination geometry is used in a number of related fields of chemistry and solid state chemistry/physics.

Neuromuscular-blocking drug

Neuromuscular-blocking drugs block neuromuscular transmission at the neuromuscular junction, causing paralysis of the affected skeletal muscles. This is accomplished either by acting presynaptically via the inhibition of acetylcholine (ACh) synthesis or release or by acting postsynaptically at the acetylcholine receptors of the motor nerve end-plate. While some drugs act presynaptically, those of current clinical importance work postsynaptically.

Force field (chemistry) concept on molecular modeling

In the context of molecular modelling, a force field refers to the functional form and parameter sets used to calculate the potential energy of a system of atoms or coarse-grained particles in molecular mechanics and molecular dynamics simulations. The parameters of the energy functions may be derived from experiments in physics or chemistry, calculations in quantum mechanics, or both.

In geometry, collinearity of a set of points is the property of their lying on a single line. A set of points with this property is said to be collinear. In greater generality, the term has been used for aligned objects, that is, things being "in a line" or "in a row".

Cycloheptene chemical compound

Cycloheptene is a 7-membered cycloalkene with a flash point of −6.7 °C. It is a raw material in organic chemistry and a monomer in polymer synthesis. Cycloheptene can exist as either the cis- or the trans-isomer.

ICM stands for Internal Coordinate Mechanics and was first designed and built to predict low-energy conformations of molecules by sampling the space of internal coordinates defining molecular geometry. In ICM each molecule is constructed as a tree from an entry atom where each next atom is built iteratively from the preceding three atoms via three internal variables. The rings kept rigid or imposed via additional restraints.

Water model model to simulate effects of water in computational chemistry,

In computational chemistry, a water model is used to simulate and thermodynamically calculate water clusters, liquid water, and aqueous solutions with explicit solvent. The models are determined from quantum mechanics, molecular mechanics, experimental results, and these combinations. To imitate a specific nature of molecules, many types of models have been developed. In general, these can be classified by following three points; (i) the number of interaction points called site, (ii) whether the model is rigid or flexible, (iii) whether the model includes polarization effects.

Molecular symmetry Symmetry of chemical molecules

Molecular symmetry in chemistry describes the symmetry present in molecules and the classification of molecules according to their symmetry. Molecular symmetry is a fundamental concept in chemistry, as it can be used to predict or explain many of a molecule's chemical properties, such as its dipole moment and its allowed spectroscopic transitions. Many university level textbooks on physical chemistry, quantum chemistry, and inorganic chemistry devote a chapter to symmetry.

In the field of computational chemistry, energy minimization is the process of finding an arrangement in space of a collection of atoms where, according to some computational model of chemical bonding, the net inter-atomic force on each atom is acceptably close to zero and the position on the potential energy surface (PES) is a stationary point. The collection of atoms might be a single molecule, an ion, a condensed phase, a transition state or even a collection of any of these. The computational model of chemical bonding might, for example, be quantum mechanics.

Matrix (mathematics) Two-dimensional array of numbers with specific operations

In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. For example, the dimensions of the matrix below are 2 × 3, because there are two rows and three columns:

Walsh diagram

Walsh diagrams, often called angular coordinate diagrams or correlation diagrams, are representations of calculated orbital binding energies of a molecule versus a distortion coordinate, used for making quick predictions about the geometries of small molecules. By plotting the change in molecular orbital levels of a molecule as a function of geometrical change, Walsh diagrams explain why molecules are more stable in certain spatial configurations.

References

  1. Yemini, Y. (1978). "The positioning problem — a draft of an intermediate summary". Conference on Distributed Sensor Networks, Pittsburgh.
  2. 1 2 Liberti, Leo; Lavor, Carlile; MacUlan, Nelson; Mucherino, Antonio (2014). "Euclidean Distance Geometry and Applications". SIAM Review. 56: 3–69. arXiv: 1205.0349 . doi:10.1137/120875909.
  3. 1 2 3 Mucherino, A.; Lavor, C.; Liberti, L.; Maculan, N. (2013). Distance Geometry: Theory, Methods and Applications.
  4. 1 2 3 Crippen, G.M.; Havel, T.F. (1988). Distance Geometry and Molecular Conformation. John Wiley & Sons.
  5. 1 2 Biswas, P.; Lian, T.; Wang, T.; Ye, Y. (2006). "Semidefinite programming based algorithms for sensor network localization". ACM Transactions in Sensor Networks. 2 (2): 188–220. doi:10.1145/1149283.1149286.
  6. Blumenthal, L.M. (1970). Theory and applications of distance geometry (2nd ed.). Bronx, New York: Chelsea Publishing Company. p. 347. ISBN   978-0-8284-0242-2. LCCN   79113117.
  7. More, J.J.; Wu, Z. (1999). "Distance Geometry Optimization for Protein Structures". Journal of Global Optimization. 15 (3): 219–223. doi:10.1023/A:1008380219900.
  8. Liberti, L.; Lavor, C.; Maculan, N. (2008). "A Branch-and-Prune Algorithm for the Molecular Distance Geometry Problem". International Transactions in Operational Research. 15: 1–17. doi:10.1111/j.1475-3995.2007.00622.x.
  9. Mucherino, A.; Liberti, L.; Lavor, C.; Maculan, N. (2009). "Comparisons between an Exact and a MetaHeuristic Algorithm for the Molecular Distance Geometry Problem". ACM Conference Proceedings, Genetic and Evolutionary Computation Conference (GECCO09): 333–340.
  10. Mucherino, A. (2013). On the Identification of Discretization Orders for Distance Geometry with Intervals. Lecture Notes in Computer Science. 8085. pp. 231–238. doi:10.1007/978-3-642-40020-9_24. ISBN   978-3-642-40019-3.