In molecular biology, pseudo amino acid composition (PseAAC) is a method introduced by Kuo-Chen Chou to convert a protein sequence into a numerical vector for use with pattern recognition techniques, such as discriminating between classes of proteins based on their sequences (e.g. between membrane proteins, transmembrane proteins, cytosolic proteins, and other types). [1] The method was an advance over using the amino acid composition (AAC) alone. Instead, the protein is represented by a matrix that incorporates not only the amino acid frequencies but also information from local features of the protein sequence. [2]
Following the success and widespread application of the PseAAC method, it was extended to address sequence-order effects in nucleotide compositions, giving rise to an analogous method called PseKNC. [3]
Two kinds of models are usually used to represent protein samples: the sequential and the discrete (or non-sequential) models. [4] The most elementary sequential model uses the entire amino acid sequence, as expressed by:
$$\mathbf{P} = R_1 R_2 R_3 \cdots R_L \qquad (1)$$
where $\mathbf{P}$ represents the amino acid sequence, $L$ is the number of amino acid residues, $R_1$ is the first residue of the protein $\mathbf{P}$, $R_2$ is the second residue, and so forth.
The problem with this approach, as used in sequence-similarity-search-based tools, is that the query protein often lacks significant homology (or sequence similarity) with any known protein in the database. To resolve this problem, discrete models for representing protein samples were proposed. The simplest discrete model uses the amino acid composition (AAC) to represent protein samples. Under the AAC model, the protein $\mathbf{P}$ of Eq.1 can also be expressed by
$$\mathbf{P} = [f_1 \; f_2 \; \cdots \; f_{20}]^{\mathsf{T}} \qquad (2)$$
where $f_u$ ($u = 1, 2, \ldots, 20$) are the normalized occurrence frequencies of the 20 native amino acids in $\mathbf{P}$, and $\mathsf{T}$ is the transposing operator. [4]
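The AAC model can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `amino_acid_composition`, the alphabetical ordering of the one-letter codes, and the choice to ignore non-native symbols are all assumptions made for the example.

```python
from collections import Counter

# The 20 native amino acids as one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 normalized occurrence frequencies of a protein sequence."""
    counts = Counter(sequence.upper())
    # Only the 20 native residues are counted; other symbols are ignored.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no native amino acid residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Two sequences with the same residues in a different order give the same vector.
same = amino_acid_composition("ACD") == amino_acid_composition("DCA")
```

Here `same` evaluates to `True`: the vector depends only on how often each residue occurs, not on where it occurs in the chain.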
The primary weakness of the AAC-based discrete model is that the frequencies of the amino acids alone carry no sequence-order information, that is, information conveyed by the order of the amino acid residues. To avoid this information loss, the concept of PseAAC (pseudo amino acid composition) was proposed. [1]
Under this new model, the first 20 discrete factors, representing the amino acid frequencies, are retained, but additional discrete factors are included that capture information about sequence order. The sequence-order information is represented by what are called "pseudo components". The number of additional components, beyond the first 20 frequencies, is called λ (or upper-case Λ), so the model includes 20+λ components. The upper limit for λ is one less than the length of the shortest protein sample in the dataset. [1] The total number of components (20+λ) may be denoted Ω. Any additional factors can be incorporated so long as they, in some way, obtain or represent information about the sequence order. Typically, these are a series of rank-different correlation factors along the protein chain. [4]
The additional factors need not be rank-different correlation factors: any combination of other factors can be used, provided it reflects some sequence-order effect. The essence of PseAAC is therefore that it covers the amino acid composition while also containing information beyond it, and hence can better reflect the features of a protein sequence through a discrete model.
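The bound on λ and the resulting vector length can be checked with a short Python sketch. The helper names `max_lambda` and `pseaac_dimension` are hypothetical and introduced only for this illustration.

```python
def max_lambda(sequences):
    """Largest usable lambda: one less than the shortest sequence length."""
    shortest = min(len(s) for s in sequences)
    return shortest - 1


def pseaac_dimension(lam):
    """Total number of components: 20 frequencies plus lam pseudo components."""
    return 20 + lam


# Toy dataset of three made-up sequences; the shortest has 3 residues.
dataset = ["ACDEFG", "ACD", "ACDEFGHIK"]
lam = max_lambda(dataset)          # 2
omega = pseaac_dimension(lam)      # 22
```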
Meanwhile, various modes to formulate the PseAAC vector have also been developed, as summarized in a 2009 review article. [2]
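According to the PseAAC model, the protein $\mathbf{P}$ of Eq.1 can be formulated as
$$\mathbf{P} = [p_1, p_2, \ldots, p_{20}, p_{20+1}, \ldots, p_{20+\lambda}]^{\mathsf{T}}, \quad (\lambda < L) \qquad (3)$$
where the ($20+\lambda$) components are given by
$$p_u = \begin{cases} \dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{k=1}^{\lambda} \tau_k}, & 1 \le u \le 20 \\[2ex] \dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{k=1}^{\lambda} \tau_k}, & 20+1 \le u \le 20+\lambda \end{cases} \qquad (4)$$
where $w$ is the weight factor, and $\tau_k$ is the $k$-th tier correlation factor, which reflects the sequence-order correlation between all the $k$-th most contiguous residues, as formulated by
$$\tau_k = \frac{1}{L-k} \sum_{i=1}^{L-k} J_{i,i+k}, \quad (k < L) \qquad (5)$$
with
$$J_{i,i+k} = \frac{1}{\Gamma} \sum_{q=1}^{\Gamma} \left[ \Phi_q(R_{i+k}) - \Phi_q(R_i) \right]^2 \qquad (6)$$
where $\Phi_q(R_i)$ is the $q$-th function of the amino acid $R_i$, and $\Gamma$ is the total number of functions considered. For example, in the original paper by Chou, [1] $\Phi_1(R_i)$, $\Phi_2(R_i)$ and $\Phi_3(R_i)$ are respectively the hydrophobicity value, hydrophilicity value, and side chain mass of amino acid $R_i$, while $\Phi_1(R_{i+1})$, $\Phi_2(R_{i+1})$ and $\Phi_3(R_{i+1})$ are the corresponding values for the amino acid $R_{i+1}$. The total number of functions considered there is therefore $\Gamma = 3$. It can be seen from Eq.3 that the first 20 components, i.e. $p_1, p_2, \ldots, p_{20}$, are associated with the conventional amino acid composition of the protein, while the remaining components $p_{20+1}, \ldots, p_{20+\lambda}$ are the correlation factors that reflect the 1st tier, 2nd tier, ..., and the $\lambda$-th tier sequence-order correlation patterns (Figure 1). It is through these additional factors that some important sequence-order effects are incorporated.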
The quantity λ in Eq.3 is an integer parameter; choosing a different integer for λ leads to a PseAAC of a different dimension. [5]
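The construction described by Eq.3 through Eq.6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the original implementation: the function name `pseaac`, its argument layout, and the restriction to sequences made up only of the 20 native one-letter codes are choices made for the example. The property table `toy_scale` is a placeholder, not real hydrophobicity, hydrophilicity, or mass data, and any standardization of property values is left to the caller.

```python
# The 20 native amino acids as one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseaac(sequence, properties, lam, weight):
    """Return the 20 + lam PseAAC components of a sequence.

    properties: list of dicts, one per physicochemical function, each
                mapping a one-letter residue code to a numeric value.
    lam:        number of correlation tiers (must be < len(sequence)).
    weight:     the weight factor applied to the correlation factors.
    """
    seq = sequence.upper()
    length = len(seq)
    if not 0 <= lam < length:
        raise ValueError("lam must satisfy 0 <= lam < len(sequence)")
    if not properties:
        raise ValueError("at least one property table is required")
    n_funcs = len(properties)

    # Normalized occurrence frequencies of the 20 amino acids.
    freqs = [seq.count(aa) / length for aa in AMINO_ACIDS]

    # k-th tier correlation factors: the mean, over all residue pairs that
    # are k positions apart, of the averaged squared property differences.
    taus = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(length - k):
            total += sum(
                (prop[seq[i + k]] - prop[seq[i]]) ** 2 for prop in properties
            ) / n_funcs
        taus.append(total / (length - k))

    # Shared denominator that makes all components sum to one.
    denom = sum(freqs) + weight * sum(taus)
    return [f / denom for f in freqs] + [weight * t / denom for t in taus]


# Placeholder property table (NOT real physicochemical data).
toy_scale = {aa: i / 19 for i, aa in enumerate(AMINO_ACIDS)}
vector = pseaac("ACDEFGHIKL", [toy_scale], lam=3, weight=0.1)
```

With `lam=3` the result has 23 components, and raising or lowering `lam` changes the length accordingly. Setting `lam=0` reduces the result to the plain amino acid composition.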
Eq.6 is just one of many modes for deriving the correlation factors in PseAAC or its components. Others, such as the physicochemical distance mode [6] and the amphiphilic pattern mode, [7] can also be used to derive different types of PseAAC, as summarized in a 2009 review article. [2] In 2011, the formulation of PseAAC (Eq.3) was extended to a general form of PseAAC, given by: [8]
$$\mathbf{P} = [\psi_1 \; \psi_2 \; \cdots \; \psi_u \; \cdots \; \psi_\Omega]^{\mathsf{T}} \qquad (7)$$
where the subscript $\Omega$ is an integer, and its value and the components $\psi_u$ ($u = 1, 2, \ldots, \Omega$) depend on how the desired information is extracted from the amino acid sequence of $\mathbf{P}$ in Eq.1.
The general PseAAC can be used to reflect any desired features according to the targets of research, including core features such as functional domain, sequential evolution, and gene ontology, to improve the prediction quality for the subcellular localization of proteins [9] [10] as well as many of their other important attributes.