Automatic target recognition (ATR) is the ability for an algorithm or device to recognize targets or other objects based on data obtained from sensors.
Target recognition was initially done using an audible representation of the received signal: a trained operator would listen to that sound and classify the target illuminated by the radar. While these operators had success, automated methods have been and continue to be developed that classify targets more accurately and quickly. ATR can be used to identify man-made objects such as ground and air vehicles, as well as biological targets such as animals, humans, and vegetative clutter. This is useful for everything from recognizing an object on a battlefield to filtering out interference caused by large flocks of birds on Doppler weather radar.
Possible military applications include simple identification systems such as an IFF transponder, as well as unmanned aerial vehicles and cruise missiles. Interest in using ATR for domestic applications is also growing. Research has been done into using ATR for border security, safety systems that identify objects or people on a subway track, automated vehicles, and many other uses.
Target recognition has existed almost as long as radar. Radar operators would identify enemy bombers and fighters through the audio representation of the reflected signal (see Radar in World War II).
For years, target recognition was done by playing the baseband signal to the operator. By listening to this signal, trained radar operators could identify various pieces of information about the illuminated target, such as the type of vehicle and its size, and could potentially even distinguish biological targets. However, this approach has many limitations: the operator must be trained on what each target sounds like, a target traveling at high speed may no longer be audible, and the human decision component makes the probability of error high. Nevertheless, the idea of audibly representing the signal provided a basis for automated classification of targets. Several classification schemes that have been developed use features of the baseband signal that are also used in other audio applications such as speech recognition.
Radar determines the distance to an object by timing how long the transmitted signal takes to return from the target it illuminates. When the object is not stationary, its motion causes a frequency shift known as the Doppler effect. In addition to the translational motion of the entire object, a further frequency shift can be caused by parts of the object vibrating or spinning. When this happens, the Doppler-shifted signal becomes modulated. This additional modulation is known as the micro-Doppler effect. The modulation can have a certain pattern, or signature, that allows algorithms to be developed for ATR. The micro-Doppler effect changes over time depending on the motion of the target, producing a signal that varies in both time and frequency. [1]
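The relationships described above can be sketched numerically. The following is a minimal illustration, assuming a monostatic radar, a target moving directly along the line of sight, and a single scatterer vibrating sinusoidally; the function names and all numeric values (10 GHz carrier, 30 m/s target, 1 mm vibration at 50 Hz) are hypothetical and chosen only for the example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_from_delay(round_trip_s):
    """Target range in metres from the round-trip echo delay in seconds."""
    return C * round_trip_s / 2.0


def doppler_shift(radial_velocity, carrier_hz):
    """Doppler shift in Hz for a radial velocity in m/s (two-way path)."""
    return 2.0 * radial_velocity * carrier_hz / C


def micro_doppler_shift(t, carrier_hz, bulk_velocity, vib_amplitude, vib_hz):
    """Instantaneous shift when the target also vibrates along the line of sight.

    A displacement D*sin(w*t) adds a radial velocity D*w*cos(w*t) to the
    bulk motion, so the Doppler shift is modulated at the vibration rate.
    """
    omega = 2.0 * math.pi * vib_hz
    vib_velocity = vib_amplitude * omega * math.cos(omega * t)
    return doppler_shift(bulk_velocity + vib_velocity, carrier_hz)


# Hypothetical example values (assumptions, not taken from the article).
CARRIER_HZ = 10e9
target_range = range_from_delay(1e-6)          # echo arrives after 1 microsecond
bulk_shift = doppler_shift(30.0, CARRIER_HZ)   # translational motion only
peak_shift = micro_doppler_shift(0.0, CARRIER_HZ, 30.0, 0.001, 50.0)
quarter_shift = micro_doppler_shift(0.005, CARRIER_HZ, 30.0, 0.001, 50.0)

print(f"range: {target_range:.3f} m")
print(f"bulk Doppler shift: {bulk_shift:.2f} Hz")
print(f"micro-Doppler excursion: {peak_shift - bulk_shift:.2f} Hz")
```

The shift oscillates around the bulk Doppler value at the vibration rate, which is the time-varying signature that ATR algorithms can exploit.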
Fourier transform analysis of this signal is not sufficient, since the Fourier transform cannot account for the time-varying component. The simplest method to obtain a function of frequency and time is the short-time Fourier transform (STFT). However, more robust methods such as the Gabor transform or the Wigner-Ville distribution (WVD) can be used to provide a simultaneous representation in the frequency and time domains. In all these methods, however, there is a trade-off between frequency resolution and time resolution. [2]
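A small sketch can make the difference between a plain Fourier transform and an STFT concrete. It assumes a synthetic test signal whose frequency steps from 8 Hz to 24 Hz halfway through; the sample rate, window length, Hann window, and helper names are illustrative assumptions, and the direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stft_peak_frequencies(signal, fs, window_len, hop):
    """Strongest frequency (Hz) in each windowed frame of the signal."""
    window = hann(window_len)
    peaks = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(window_len)]
        mags = dft_magnitudes(frame)
        peak_bin = max(range(len(mags)), key=mags.__getitem__)
        peaks.append(peak_bin * fs / window_len)
    return peaks


# Hypothetical test signal: 1 s at 8 Hz followed by 1 s at 24 Hz, fs = 128 Hz.
FS = 128.0
signal = [math.sin(2.0 * math.pi * 8.0 * i / FS) for i in range(128)]
signal += [math.sin(2.0 * math.pi * 24.0 * i / FS) for i in range(128, 256)]

# Whole-signal spectrum: both tones appear, but not when each occurred.
whole = dft_magnitudes(signal)
top_two_bins = sorted(sorted(range(len(whole)), key=whole.__getitem__)[-2:])

# STFT with 32-sample frames: 4 Hz frequency resolution, 0.25 s time resolution.
peaks = stft_peak_frequencies(signal, FS, window_len=32, hop=32)
print(top_two_bins)
print(peaks)
```

Lengthening the window narrows the frequency bins (`fs / window_len`) but widens the time span each frame covers (`window_len / fs`), which is the trade-off noted above.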
Once this spectral information is extracted, it can be compared to an existing database containing information about the targets that the system will identify, and a decision can be made as to what the illuminated target is. This is done by modeling the received signal and then using a statistical decision method such as maximum likelihood (ML), majority voting (MV), or maximum a posteriori (MAP) to decide which target in the library best fits the model built from the received signal.
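The three decision rules can be compared on a toy example. The sketch below assumes that each library target has already been scored against several blocks of received data, that the blocks are independent (so log-likelihoods add), and that the target names, scores, and priors are hypothetical.

```python
import math
from collections import Counter


def ml_decision(loglik):
    """Maximum likelihood: pick the target with the largest summed log-likelihood."""
    return max(loglik, key=lambda target: sum(loglik[target]))


def map_decision(loglik, priors):
    """Maximum a posteriori: add the log prior to the summed log-likelihood."""
    return max(loglik, key=lambda target: sum(loglik[target]) + math.log(priors[target]))


def majority_vote(loglik):
    """Majority voting: each block votes for its own best-fitting target."""
    n_blocks = len(next(iter(loglik.values())))
    votes = Counter(
        max(loglik, key=lambda target: loglik[target][i]) for i in range(n_blocks)
    )
    return votes.most_common(1)[0][0]


# Hypothetical per-block log-likelihoods for two library targets (3 blocks).
loglik = {
    "wheeled": [-1.0, -1.0, -6.0],
    "tracked": [-2.0, -2.0, -1.0],
}

ml_choice = ml_decision(loglik)                                   # sums: -8 vs -5
mv_choice = majority_vote(loglik)                                 # votes: 2 vs 1
map_choice = map_decision(loglik, {"wheeled": 0.99, "tracked": 0.01})

print(ml_choice, mv_choice, map_choice)
```

With these numbers the rules disagree: one poorly fitting block dominates the ML sum, majority voting ignores how badly it fits, and a strong prior shifts the MAP decision.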
Studies have taken audio features used in speech recognition and built automated target recognition systems that identify targets based on these audio-inspired coefficients. These coefficients include linear predictive coding (LPC) coefficients and mel-frequency cepstral coefficients (MFCC).
The baseband signal is processed to obtain these coefficients, then a statistical process is used to decide which target in the database is most similar to the coefficients obtained. The choice of which features and which decision scheme to use depends on the system and application.
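As one illustration of such processing, linear predictive coding (LPC) coefficients can be estimated from a block of samples with the autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch under that assumption, not the procedure of any particular study; the function names and the test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite block of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k].

    Solves the autocorrelation normal equations with the Levinson-Durbin
    recursion and also returns the final prediction error energy.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test block: a decaying exponential, which a first-order
# predictor x[n] = 0.9 * x[n-1] reproduces exactly.
block = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(block, order=2)
print(coeffs)
```

For this block the first coefficient comes out close to 0.9 and the second close to zero, since one past sample already predicts the next.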
The features used to classify a target are not limited to speech-inspired coefficients. A wide range of features and detection algorithms can be used to accomplish ATR.
For detection of targets to be automated, a training database needs to be created. This is usually done using experimental data collected when the target is known, which is then stored for use by the ATR algorithm.
An example of a detection algorithm is shown in the flowchart. This method takes M blocks of data, extracts the desired features from each (e.g. LPC coefficients or MFCC), and models them using a Gaussian mixture model (GMM). After a model is obtained from the collected data, a conditional probability is formed for each target contained in the training database. Because there are M blocks of data, this results in a collection of M probabilities for each target in the database. These probabilities are used to determine what the target is using a maximum likelihood decision. This method has been shown to distinguish between vehicle types (wheeled versus tracked vehicles, for example), and even to decide how many people are present, up to three people, with a high probability of success. [3]
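One common reading of this scheme is that each target in the training database is represented by a GMM, every block of features is scored against every model, and the per-block log-likelihoods are summed before taking the maximum. The sketch below follows that reading under simplifying assumptions: each block is reduced to a single scalar feature (real systems use feature vectors), and the database contents are hypothetical.

```python
import math


def gmm_log_likelihood(x, components):
    """Log density of a scalar x under a 1-D GMM.

    components is a list of (weight, mean, variance) tuples. The
    log-sum-exp form avoids underflow when x is far from every mean.
    """
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
        for w, mu, var in components
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(blocks, database):
    """Sum the M per-block log-likelihoods for each target and pick the largest."""
    scores = {
        name: sum(gmm_log_likelihood(x, components) for x in blocks)
        for name, components in database.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical training database: one GMM per target (weight, mean, variance).
database = {
    "wheeled": [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)],
    "tracked": [(0.6, 5.0, 1.0), (0.4, 8.0, 2.0)],
}

# M = 4 blocks, each summarized by one scalar feature (assumption).
label, scores = classify([4.8, 5.5, 7.9, 6.1], database)
print(label)
```

Summing log-likelihoods treats the M blocks as independent, which is an assumption of this sketch rather than something stated in the source.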
CNN-Based Target Recognition
Convolutional neural network (CNN)-based target recognition can outperform conventional methods. [4] [5] It has proved useful in recognizing targets (e.g. battle tanks) in infrared images of real scenes after training with synthetic images, since real images of those targets are scarce. Because the training set is limited, how realistic the synthetic images are matters greatly when recognizing targets in the real-scene test set.
The overall CNN structure contains 7 convolution layers, 3 max pooling layers, and a softmax layer as output. The max pooling layers are located after the second, fourth, and fifth convolution layers. Global average pooling is also applied before the output. All convolution layers use the Leaky ReLU nonlinear activation function. [6]
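The layer ordering described above can be written down as a simple specification and checked for shape consistency. The sketch below only lays out the structure; the kernel size (3x3 with "same" padding), pooling size (2x2), channel widths, input size, and the choice to let the last convolution emit one channel per class are all assumptions that the source does not specify.

```python
POOL_AFTER = {2, 4, 5}  # max pooling follows the 2nd, 4th and 5th convolutions


def build_layers(num_classes, channels=(16, 16, 32, 32, 64, 64)):
    """Return the layer sequence as (kind, output_channels) tuples.

    Six hidden convolutions plus a final one that emits num_classes
    channels, so that global average pooling feeds the softmax directly
    (channel widths are hypothetical).
    """
    widths = list(channels) + [num_classes]
    layers = []
    for index, width in enumerate(widths, start=1):
        layers.append(("conv3x3", width))
        layers.append(("leaky_relu", None))
        if index in POOL_AFTER:
            layers.append(("maxpool2x2", None))
    layers.append(("global_avg_pool", None))
    layers.append(("softmax", None))
    return layers


def output_shape(layers, height, width, in_channels):
    """Track (channels, height, width) through the layer sequence."""
    c, h, w = in_channels, height, width
    for kind, out_channels in layers:
        if kind == "conv3x3":          # 'same' padding keeps height and width
            c = out_channels
        elif kind == "maxpool2x2":     # halves each spatial dimension
            h, w = h // 2, w // 2
        elif kind == "global_avg_pool":
            h, w = 1, 1
    return c, h, w


# Hypothetical setup: 4 target classes, 64x64 single-channel infrared input.
layers = build_layers(num_classes=4)
kinds = [kind for kind, _ in layers]
final_shape = output_shape(layers, 64, 64, in_channels=1)
print(kinds.count("conv3x3"), kinds.count("maxpool2x2"), final_shape)
```

The same sequence could be translated into any deep learning framework; writing it as data first makes it easy to confirm the layer counts and pooling positions before training.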