Syntactic pattern recognition

Syntactic pattern recognition, or structural pattern recognition, is a form of pattern recognition in which each object can be represented by a variable-cardinality set of symbolic nominal features. This allows for representing pattern structures, taking into account more complex relationships between attributes than is possible in the case of flat, numerical feature vectors of fixed dimensionality that are used in statistical classification.

Syntactic pattern recognition can be used instead of statistical pattern recognition if clear structure exists in the patterns. One way to represent such structure is with strings of symbols from a formal language; in this case, the differences in the structures of the classes are encoded as different grammars.
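
As a minimal illustration, class membership can be decided by attempting to parse an item's symbol string with each class grammar. In the following Python sketch, regular expressions stand in for full formal grammars, and both the patterns and the class names are hypothetical:

```python
import re

# Hypothetical class "grammars", given as regular expressions for brevity;
# in general each class would be described by a full formal grammar.
CLASS_GRAMMARS = {
    "class_A": re.compile(r"(ab)+c"),   # generates "abc", "ababc", ...
    "class_B": re.compile(r"a(bc)+"),   # generates "abc", "abcbc", ...
}

def classify(symbol_string):
    """Assign the string to every class whose grammar generates it."""
    return [name for name, grammar in CLASS_GRAMMARS.items()
            if grammar.fullmatch(symbol_string)]

print(classify("ababc"))   # ['class_A']
print(classify("abcbc"))   # ['class_B']
print(classify("abc"))     # ['class_A', 'class_B'] -- an ambiguous item
```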

An example of this would be diagnosing heart problems with electrocardiogram (ECG) measurements. ECG waveforms can be approximated with diagonal and vertical line segments. If normal and unhealthy waveforms can be described as formal grammars, ECG signals can be classified as healthy or unhealthy by first describing them in terms of the basic line segments, and then trying to parse the descriptions according to the grammars. Another example is tessellation of tiling patterns.
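
The first step, reducing a sampled waveform to a string of line-segment primitives, might look like the following sketch; the three-symbol alphabet, the tolerance, and the sample values are illustrative assumptions rather than a clinical encoding. The resulting string could then be parsed with class grammars as above.

```python
def to_primitives(samples, flat_tol=0.05):
    """Approximate a sampled waveform by primitive symbols:
    'u' = upward segment, 'd' = downward segment, 'h' = horizontal segment."""
    symbols = []
    for a, b in zip(samples, samples[1:]):
        delta = b - a
        if delta > flat_tol:
            symbols.append('u')
        elif delta < -flat_tol:
            symbols.append('d')
        else:
            symbols.append('h')
    return ''.join(symbols)

# A crude spike between two flat stretches (made-up numbers).
print(to_primitives([0.0, 0.0, 1.2, -0.4, 0.0, 0.0]))   # 'huduh'
```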

A second way to represent such relations is with graphs, in which nodes are linked if the corresponding subpatterns are related. An item can be assigned a class label if its graph representation is isomorphic to a prototype graph of that class.
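
A sketch of such prototype-based labelling, assuming the networkx library; the prototype graphs and class names are toy examples:

```python
import networkx as nx

# Hypothetical prototype graphs, one per class.
PROTOTYPES = {
    "triangle": nx.cycle_graph(3),
    "chain": nx.path_graph(3),
}

def classify(graph):
    """Return the label of the first prototype isomorphic to `graph`, if any."""
    for label, prototype in PROTOTYPES.items():
        if nx.is_isomorphic(graph, prototype):
            return label
    return None

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])
print(classify(G))   # 'triangle'
```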

Typically, patterns are constructed from simpler sub-patterns in a hierarchical fashion. This helps divide the recognition task into the easier subtasks of first identifying sub-patterns and then the actual patterns.
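
An illustrative two-level sketch: runs of primitive symbols are first grouped into named sub-patterns, and the resulting sub-pattern sequence is then recognized as a whole. The sub-pattern vocabulary and the target sequence are assumptions made for the example.

```python
# Level 1: map runs of primitives to named sub-patterns (assumed vocabulary).
SUBPATTERNS = {"ud": "spike", "hh": "baseline"}

def find_subpatterns(primitives):
    labels, i = [], 0
    while i < len(primitives):
        for chunk, name in SUBPATTERNS.items():
            if primitives.startswith(chunk, i):
                labels.append(name)
                i += len(chunk)
                break
        else:
            i += 1   # skip a symbol that starts no known sub-pattern
    return labels

# Level 2: recognize the whole pattern from the sub-pattern sequence.
def is_heartbeat(labels):
    return labels == ["baseline", "spike", "baseline"]

seq = find_subpatterns("hhudhh")
print(seq, is_heartbeat(seq))   # ['baseline', 'spike', 'baseline'] True
```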

Structural methods provide descriptions of items, which may be useful in their own right. For example, syntactic pattern recognition can be used to determine what objects are present in an image. Furthermore, structural methods are strong when applied to finding a "correspondence mapping" between two images of an object. Under natural conditions, corresponding features will be in different positions and/or may be occluded in the two images, due to camera attitude and perspective, as in face recognition. A graph matching algorithm will yield the optimal correspondence.
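
A sketch of occlusion-tolerant correspondence finding, assuming the networkx library. Exact subgraph matching stands in here for the inexact, error-tolerant matching typically needed in practice, and the feature names are hypothetical.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Feature graphs of one object seen in two images (hypothetical features);
# in image 2 the mouth is occluded, so its node is missing.
img1 = nx.Graph([("left_eye", "right_eye"), ("left_eye", "nose"),
                 ("right_eye", "nose"), ("nose", "mouth")])
img2 = nx.Graph([("f1", "f2"), ("f1", "f3"), ("f2", "f3")])

# Subgraph matching tolerates the missing feature: it maps the features of
# image 2 onto a subgraph of image 1.
matcher = isomorphism.GraphMatcher(img1, img2)
print(next(matcher.subgraph_isomorphisms_iter()))
# e.g. {'left_eye': 'f1', 'right_eye': 'f2', 'nose': 'f3'}
```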

Related Research Articles

A syntactic category is a syntactic unit that theories of syntax assume. Word classes, largely corresponding to traditional parts of speech, are syntactic categories. In phrase structure grammars, the phrasal categories are also syntactic categories. Dependency grammars, however, do not acknowledge phrasal categories.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) should not be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, owing to the increased availability of big data and a new abundance of processing power.

Lexical semantics, as a subfield of linguistic semantics, is the study of word meanings. It includes the study of how words structure their meaning, how they act in grammar and compositionality, and the relationships between the distinct senses and uses of a word.

In graph theory, an isomorphism of graphs G and H is a bijection between the vertex sets of G and H that preserves adjacency: two vertices are adjacent in G if and only if their images are adjacent in H.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In computer science, geometric hashing is a method for efficiently finding two-dimensional objects represented by discrete points that have undergone an affine transformation, though extensions exist to other object representations and transformations. In an off-line step, the objects are encoded by treating each pair of points as a geometric basis. The remaining points can be represented in an invariant fashion with respect to this basis using two parameters. For each point, its quantized transformed coordinates are stored in the hash table as a key, and indices of the basis points as a value. Then a new pair of basis points is selected, and the process is repeated. In the on-line (recognition) step, randomly selected pairs of data points are considered as candidate bases. For each candidate basis, the remaining data points are encoded according to the basis and possible correspondences from the object are found in the previously constructed table. The candidate basis is accepted if a sufficiently large number of the data points index a consistent object basis.
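
A compact sketch of both steps under simplifying assumptions: using point pairs as bases makes the encoding invariant to translation, rotation and scale (the general affine case needs three basis points), quantization is a coarse rounding, and the model and scene points are toy data.

```python
from collections import defaultdict
from itertools import permutations

def encode(p, b0, b1):
    """Coordinates of p in the basis (b0, b1), invariant to translation,
    rotation and scale (computed with complex arithmetic), quantized."""
    w = (complex(*p) - complex(*b0)) / (complex(*b1) - complex(*b0))
    return (round(w.real, 1), round(w.imag, 1))

def build_table(models):
    """Off-line step: hash every model point under every ordered basis pair."""
    table = defaultdict(list)
    for name, points in models.items():
        for b0, b1 in permutations(points, 2):
            for p in points:
                if p not in (b0, b1):
                    table[encode(p, b0, b1)].append((name, (b0, b1)))
    return table

def recognize(table, scene):
    """On-line step: try scene bases and vote for (model, basis) entries."""
    best = None
    for b0, b1 in permutations(scene, 2):
        votes = defaultdict(int)
        for p in scene:
            if p not in (b0, b1):
                for entry in table.get(encode(p, b0, b1), ()):
                    votes[entry] += 1
        if votes:
            (name, _), count = max(votes.items(), key=lambda kv: kv[1])
            if best is None or count > best[1]:
                best = (name, count)
    return best

models = {"L-shape": [(0, 0), (0, 2), (1, 0)]}   # toy model points
scene = [(5, 5), (5, 9), (7, 5)]                 # same shape, shifted and scaled
print(recognize(build_table(models), scene))     # ('L-shape', 1)
```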

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.
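
A minimal sketch of PEG-style ordered choice written as parser combinators; the rule A <- "ab" / "a" is a toy grammar chosen to show that the first matching alternative is committed to.

```python
def literal(lit):
    """PEG literal: succeed (returning the new position) if the input
    contains `lit` at position i, otherwise fail with None."""
    def parse(text, i):
        return i + len(lit) if text.startswith(lit, i) else None
    return parse

def ordered_choice(*alternatives):
    """PEG choice e1 / e2 / ...: commit to the FIRST alternative that
    matches; unlike a CFG, later alternatives are never reconsidered."""
    def parse(text, i):
        for alt in alternatives:
            j = alt(text, i)
            if j is not None:
                return j
        return None
    return parse

# Rule: A <- "ab" / "a"
A = ordered_choice(literal("ab"), literal("a"))
print(A("abc", 0))   # 2 -- "ab" matched first; "a" is never tried
print(A("ac", 0))    # 1 -- "ab" failed, so the parser falls through to "a"
```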

In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering to layout algorithms and picture generation.

In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating, and independent features is crucial to produce effective algorithms for pattern recognition, classification, and regression tasks. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition; nominal features can be made numeric by a pre-processing step such as one-hot encoding. The concept of "features" is related to that of explanatory variables used in statistical techniques such as linear regression.
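
For example, a one-hot pre-processing step turns a nominal symbol into a fixed-length numeric vector; the vocabulary here is an illustrative assumption.

```python
def one_hot(value, vocabulary):
    """Encode a nominal value as a fixed-length 0/1 vector."""
    return [1 if v == value else 0 for v in vocabulary]

vocab = ["u", "d", "h"]        # e.g. a primitive alphabet
print(one_hot("d", vocab))     # [0, 1, 0]
```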

In statistics, classification is the task of identifying which of a set of categories an observation belongs to. When classification is performed by a computer, statistical methods are normally used to develop the algorithm.

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

Object recognition is technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the image of an object may vary with viewpoint, size and scale, or when the object is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems, and many approaches to it have been implemented over multiple decades.

In computer vision, 3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Typically, an example of the object to be recognized is presented to a vision system in a controlled environment, and then for an arbitrary input such as a video stream, the system locates the previously presented object. This can be done either off-line, or in real-time. The algorithms for solving this problem are specialized for locating a single pre-identified object, and can be contrasted with algorithms which operate on general classes of objects, such as face recognition systems or 3D generic object recognition. Due to the low cost and ease of acquiring photographs, a significant amount of research has been devoted to 3D object recognition in photographs.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories in a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Whereas image segmentation labels pixels that share certain characteristics at a particular time, here pixels are segmented according to their relative motion over the duration of the video sequence.

In mathematics and computer science, graph edit distance (GED) is a measure of similarity between two graphs. The concept of graph edit distance was first formalized mathematically by Alberto Sanfeliu and King-Sun Fu in 1983. A major application of graph edit distance is in inexact graph matching, such as error-tolerant pattern recognition in machine learning.
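
A usage sketch, assuming networkx's graph_edit_distance function (exact computation, with unit edit costs by default); exact GED is exponential in general, so this is practical only for small graphs.

```python
import networkx as nx

# Two small graphs differing by a single edge deletion.
g1 = nx.cycle_graph(4)   # square: 4 nodes, 4 edges
g2 = nx.path_graph(4)    # chain:  4 nodes, 3 edges

print(nx.graph_edit_distance(g1, g2))   # 1.0 -- delete one edge
```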

The following outline is provided as an overview of, and topical guide to, machine learning.

Graph matching is the problem of finding a similarity between graphs.

References

Schalkoff, Robert (1992). Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons. ISBN 0-471-55238-0.

Bunke, Horst (1993). "Structural and syntactic pattern recognition". In Chen, Pau & Wang (eds.), Handbook of Pattern Recognition & Computer Vision. World Scientific. pp. 163–209. ISBN 981-02-1136-8.

Flasinski, Mariusz (2019). Syntactic Pattern Recognition. World Scientific. ISBN 978-981-3278-46-2.