Spectrogram

Spectrogram of the spoken words "nineteenth century". Frequencies are shown increasing up the vertical axis, and time on the horizontal axis. The legend to the right shows that the color intensity increases with the density.
A 3D spectrogram: The RF spectrum of a battery charger is shown over time

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represented in a 3D plot, they may be called waterfall displays.


Spectrograms are used extensively in the fields of music, linguistics, sonar, radar, speech processing, [1] seismology, ornithology, and others. Spectrograms of audio can be used to identify spoken words phonetically, and to analyse the various calls of animals.

A spectrogram can be generated by an optical spectrometer, by a bank of band-pass filters, by a Fourier transform, or by a wavelet transform (in which case it is also known as a scaleogram or scalogram). [2]

Scaleograms from the DWT and CWT for an audio sample

A spectrogram is usually depicted as a heat map, i.e., as an image with the intensity shown by varying the colour or brightness.

Format

A common format is a graph with two geometric dimensions: one axis represents time, and the other axis represents frequency; a third dimension indicating the amplitude of a particular frequency at a particular time is represented by the intensity or color of each point in the image.

There are many variations of format. Sometimes the vertical and horizontal axes are switched, so that time runs up and down; sometimes the spectrogram is drawn as a waterfall plot, where the amplitude is represented by the height of a 3D surface instead of by color or intensity. The frequency and amplitude axes can each be either linear or logarithmic, depending on what the graph is being used for. Audio is usually represented with a logarithmic amplitude axis (typically in decibels, dB), while frequency is either linear, to emphasize harmonic relationships, or logarithmic, to emphasize musical, tonal relationships.
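The axis choices described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 1.0 reference power, the -120 dB floor, and the 20 Hz to 20 kHz range are assumptions made for the example, not part of any standard.

```python
import math


def power_to_db(power, reference=1.0, floor_db=-120.0):
    """Convert a power value to decibels relative to `reference`.

    Values at or below zero are clamped to `floor_db` (an assumed display
    floor) because the logarithm is undefined there.
    """
    if power <= 0.0:
        return floor_db
    return max(10.0 * math.log10(power / reference), floor_db)


def log_position(freq_hz, f_min=20.0, f_max=20000.0):
    """Map a frequency to a 0..1 position on a logarithmic frequency axis."""
    return math.log(freq_hz / f_min) / math.log(f_max / f_min)


# On a logarithmic axis every octave spans the same distance.
octave_a = log_position(880.0) - log_position(440.0)
octave_b = log_position(1760.0) - log_position(880.0)
```

Here `octave_a` and `octave_b` are equal (up to rounding), which is why a logarithmic frequency axis makes musical intervals look uniform, whereas a linear axis would space harmonics of a tone evenly instead.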

Sound spectrography of infrasound recording 30301

Generation

Spectrograms of light may be created directly using an optical spectrometer over time.

Spectrograms may be created from a time-domain signal in one of two ways: approximated as a filterbank that results from a series of band-pass filters (this was the only way before the advent of modern digital signal processing), or calculated from the time signal using the Fourier transform. These two methods actually form two different time–frequency representations, but are equivalent under some conditions.

The band-pass filter method usually uses analog processing to divide the input signal into frequency bands; the magnitude of each filter's output controls a transducer that records the spectrogram as an image on paper. [3]

Creating a spectrogram using the fast Fourier transform (FFT) is a digital process. Digitally sampled time-domain data are broken up into chunks, which usually overlap, and each chunk is Fourier transformed to calculate the magnitude of its frequency spectrum. Each chunk then corresponds to a vertical line in the image: a measurement of magnitude versus frequency for a specific moment in time (the midpoint of the chunk). These spectra are then "laid side by side" to form the image or a three-dimensional surface, [4] or slightly overlapped in various ways, i.e. windowing. This process essentially corresponds to computing the squared magnitude of the short-time Fourier transform (STFT) of the signal, that is, $\mathrm{spectrogram}(t, \omega) = \left|\mathrm{STFT}(t, \omega)\right|^{2}$, where $t$ is time and $\omega$ is frequency. [5]
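The chunk-and-transform procedure can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the Hann window, the window length of 64 samples, the hop of 32 samples, and the 1 kHz test tone are all assumptions chosen for the example, and a naive discrete Fourier transform stands in for the FFT that real software would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT is used in practice)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectrogram(signal, window_len=64, hop=32):
    """Return one column per chunk; each column holds |STFT|^2 for the
    non-negative frequency bins 0..window_len // 2."""
    window = hann(window_len)
    columns = []
    for start in range(0, len(signal) - window_len + 1, hop):
        # Window the chunk, transform it, and keep the squared magnitude.
        chunk = [signal[start + i] * window[i] for i in range(window_len)]
        spectrum = dft(chunk)
        columns.append([abs(c) ** 2 for c in spectrum[: window_len // 2 + 1]])
    return columns


# Assumed test signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = spectrogram(tone)

# Locate the strongest bin in the first column and convert it to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each entry of `spec` is one vertical line of the image. With these assumed parameters the tone falls exactly on a bin centre, so the strongest bin in every column corresponds to 1000 Hz.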

Limitations and resynthesis

From the formula above, it appears that a spectrogram contains no information about the exact, or even approximate, phase of the signal that it represents. For this reason, it is not possible to reverse the process and generate a copy of the original signal from a spectrogram, though in situations where the exact initial phase is unimportant it may be possible to generate a useful approximation of the original signal. The Analysis & Resynthesis Sound Spectrograph [6] is an example of a computer program that attempts to do this. The pattern playback was an early speech synthesizer, designed at Haskins Laboratories in the late 1940s, that converted pictures of the acoustic patterns of speech (spectrograms) back into sound.

In fact, there is some phase information in the spectrogram, but it appears in another form, as time delay (or group delay) which is the dual of the instantaneous frequency. [7]
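The loss of phase within a single analysis frame can be illustrated with a small sketch. The two frames below are made-up example data: one is the time reversal of the other, so they are clearly different signals, yet their magnitude spectra coincide. The example covers one isolated frame only; overlapping frames in a full spectrogram constrain the signal further, as noted above.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each bin of a naive discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


# Assumed example data: a short ramp and its time reversal.
original = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
reversed_frame = original[::-1]

mags_a = dft_magnitudes(original)
mags_b = dft_magnitudes(reversed_frame)

# Reversing a real-valued frame only changes the phase of each bin.
same_magnitudes = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(mags_a, mags_b)
)
```

Because `same_magnitudes` is true while the frames differ, the magnitudes alone cannot tell the two apart, which is why resynthesis from a spectrogram has to estimate or invent the phase.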

The size and shape of the analysis window can be varied. A smaller (shorter) window will produce more accurate results in timing, at the expense of precision of frequency representation. A larger (longer) window will provide a more precise frequency representation, at the expense of precision in timing representation. This is an instance of the Heisenberg uncertainty principle: the product of the uncertainties in two conjugate variables is greater than or equal to a constant ($B \cdot T \geq 1$ in the usual notation, where $B$ is bandwidth and $T$ is duration). [8]
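The trade-off can be made concrete with a rough calculation. In the sketch below, the function name and the 8 kHz sample rate are assumptions for illustration, and the spacing between DFT bins is used as a simple stand-in for frequency resolution; the true resolution also depends on the shape of the window.

```python
def resolution(window_len, sample_rate):
    """Return (window duration in seconds, DFT bin spacing in hertz)."""
    duration = window_len / sample_rate
    bin_spacing = sample_rate / window_len
    return duration, bin_spacing


# Assumed sample rate of 8 kHz, with a short and a long window.
short_t, short_f = resolution(64, 8000)    # 0.008 s, 125.0 Hz
long_t, long_f = resolution(1024, 8000)    # 0.128 s, 7.8125 Hz
```

The long window separates frequencies 16 times more finely but smears events over a span 16 times longer. In this simplified model the product of duration and bin spacing is always 1, whatever window length is chosen.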

Related Research Articles

Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together.

Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a sequence of numbers that represent samples of a continuous variable in a domain such as time, space, or frequency. In digital electronics, a digital signal is represented as a pulse train, which is typically generated by the switching of a transistor.

Signal processing: Field of electrical engineering

Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing signals, such as sound, images, potential fields, seismic signals, altimetry processing, and scientific measurements. Signal processing techniques are used to optimize transmissions, improve digital storage efficiency, correct distorted signals, improve subjective video quality, and detect or pinpoint components of interest in a measured signal.

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

Wavelet: Function for integral Fourier-like transform

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases or decreases, and then returns to zero one or more times. Wavelets are termed "brief oscillations". A taxonomy of wavelets has been established, based on the number and direction of their pulses. Wavelets are imbued with specific properties that make them useful for signal processing.

In signal processing and electronics, the frequency response of a system is the quantitative measure of the magnitude and phase of the output as a function of input frequency. The frequency response is widely used in the design and analysis of systems, such as audio and control systems, where it simplifies mathematical analysis by converting governing differential equations into algebraic equations. In an audio system, it may be used to minimize audible distortion by designing components so that the overall response is as flat (uniform) as possible across the system's bandwidth. In control systems, such as a vehicle's cruise control, it may be used to assess system stability, often through the use of Bode plots. Systems with a specific frequency response can be designed using analog and digital filters.

Spectrum analyzer: Electronic testing device

A spectrum analyzer measures the magnitude of an input signal versus frequency within the full frequency range of the instrument. The primary use is to measure the power of the spectrum of known and unknown signals. The input signal that most common spectrum analyzers measure is electrical; however, spectral compositions of other signals, such as acoustic pressure waves and optical light waves, can be considered through the use of an appropriate transducer. Spectrum analyzers for other types of signals also exist, such as optical spectrum analyzers which use direct optical techniques such as a monochromator to make measurements.

Frequency domain: Signal representation

In mathematics, physics, electronics, control systems engineering, and statistics, the frequency domain refers to the analysis of mathematical functions or signals with respect to frequency, rather than time, as in time series. Put simply, a time-domain graph shows how a signal changes over time, whereas a frequency-domain graph shows how the signal is distributed within different frequency bands over a range of frequencies. A complex valued frequency-domain representation consists of both the magnitude and the phase of a set of sinusoids at the frequency components of the signal. Although it is common to refer to the magnitude portion as the frequency response of a signal, the phase portion is required to uniquely define the signal.

Short-time Fourier transform: Fourier-related transform suited to signals that change rather quickly in time

The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum on each shorter segment. One then usually plots the changing spectra as a function of time, known as a spectrogram or waterfall plot, such as commonly used in software defined radio (SDR) based spectrum displays. Full bandwidth displays covering the whole range of an SDR commonly use fast Fourier transforms (FFTs) with 2^24 points on desktop computers.

Audio analysis refers to the extraction of information and meaning from audio signals for analysis, classification, storage, retrieval, synthesis, etc. The observation mediums and interpretation methods vary, as audio analysis can refer to the human ear and how people interpret the audible sound source, or it could refer to using technology such as an audio analyzer to evaluate other qualities of a sound source such as amplitude, distortion, frequency response. Once an audio source's information has been observed, the information revealed can then be processed for the logical, emotional, descriptive, or otherwise relevant interpretation by the user.

The S transform, a time–frequency distribution, was developed in 1994 for analyzing geophysics data. The S transform is a generalization of the short-time Fourier transform (STFT), extending the continuous wavelet transform and overcoming some of its disadvantages. For one, modulation sinusoids are fixed with respect to the time axis; this localizes the scalable Gaussian window dilations and translations in the S transform. Moreover, the S transform does not have a cross-term problem and yields better signal clarity than the Gabor transform. However, the S transform has its own disadvantages: its clarity is worse than that of the Wigner distribution function and the Cohen's class distribution function.

A phase vocoder is a type of vocoder-purposed algorithm which can interpolate information present in the frequency and time domains of audio signals by using phase information extracted from a frequency transform. The computer algorithm allows frequency-domain modifications to a digital sound file.

Wavelet transform: Mathematical technique used in data compression and analysis

In mathematics, a wavelet series is a representation of a square-integrable function by a certain orthonormal series generated by a wavelet. This article provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

Reassignment method

The method of reassignment is a technique for sharpening a time-frequency representation by mapping the data to time-frequency coordinates that are nearer to the true region of support of the analyzed signal. The method has been independently introduced by several parties under various names, including method of reassignment, remapping, time-frequency reassignment, and modified moving-window method. The method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay. This mapping to reassigned time-frequency coordinates is very precise for signals that are separable in time and frequency with respect to the analysis window.

The Hilbert–Huang transform (HHT) is a way to decompose a signal into so-called intrinsic mode functions (IMF) along with a trend, and obtain instantaneous frequency data. It is designed to work well for data that is nonstationary and nonlinear. In contrast to other common transforms like the Fourier transform, the HHT is an algorithm that can be applied to a data set, rather than a theoretical tool.

Automatic target recognition (ATR) is the ability for an algorithm or device to recognize targets or other objects based on data obtained from sensors.

Fault detection, isolation, and recovery (FDIR) is a subfield of control engineering which concerns itself with monitoring a system, identifying when a fault has occurred, and pinpointing the type of fault and its location. Two approaches can be distinguished: A direct pattern recognition of sensor readings that indicate a fault and an analysis of the discrepancy between the sensor readings and expected values, derived from some model. In the latter case, it is typical that a fault is said to be detected if the discrepancy or residual goes above a certain threshold. It is then the task of fault isolation to categorize the type of fault and its location in the machinery. Fault detection and isolation (FDI) techniques can be broadly classified into two categories. These include model-based FDI and signal processing based FDI.

Time–frequency analysis for music signals is one of the applications of time–frequency analysis. Musical sound can be more complicated than human vocal sound, occupying a wider band of frequency. Music signals are time-varying signals; while the classic Fourier transform is not sufficient to analyze them, time–frequency analysis is an efficient tool for such use. Time–frequency analysis is extended from the classic Fourier approach. Short-time Fourier transform (STFT), Gabor transform (GT) and Wigner distribution function (WDF) are famous time–frequency methods, useful for analyzing music signals such as notes played on a piano, a flute or a guitar.

Audio forensics

Audio forensics is the field of forensic science relating to the acquisition, analysis, and evaluation of sound recordings that may ultimately be presented as admissible evidence in a court of law or some other official venue.

Perceptual-based 3D sound localization is the application of knowledge of the human auditory system to develop 3D sound localization technology.

References

  1. Flanagan, J. L. (1972). Speech Analysis, Synthesis and Perception. New York: Springer-Verlag.
  2. Sejdic, E.; Djurovic, I.; Stankovic, L. (August 2008). "Quantitative Performance Analysis of Scalogram as Instantaneous Frequency Estimator". IEEE Transactions on Signal Processing. 56 (8): 3837–3845. Bibcode:2008ITSP...56.3837S. doi:10.1109/TSP.2008.924856. ISSN 1053-587X. S2CID 16396084.
  3. "Spectrograph". www.sfu.ca. Retrieved 7 April 2018.
  4. "Spectrograms". ccrma.stanford.edu. Retrieved 7 April 2018.
  5. "STFT Spectrograms VI – NI LabVIEW 8.6 Help". zone.ni.com. Retrieved 7 April 2018.
  6. "The Analysis & Resynthesis Sound Spectrograph". arss.sourceforge.net. Retrieved 7 April 2018.
  7. Boashash, B. (1992). "Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals". Proceedings of the IEEE. 80 (4). Institute of Electrical and Electronics Engineers (IEEE): 520–538. doi:10.1109/5.135376. ISSN 0018-9219.
  8. "Heisenberg Uncertainty Principle". Archived from the original on 2019-01-25. Retrieved 2019-02-05.
  9. "BIRD SONGS AND CALLS WITH SPECTROGRAMS ( SONOGRAMS ) OF SOUTHERN TUSCANY ( Toscana – Italy )". www.birdsongs.it. Retrieved 7 April 2018.
  10. Saunders, Frank A.; Hill, William A.; Franklin, Barbara (1 December 1981). "A wearable tactile sensory aid for profoundly deaf children". Journal of Medical Systems. 5 (4): 265–270. doi:10.1007/BF02222144. PMID 7320662. S2CID 26620843.
  11. "Spectrogram Reading". ogi.edu. Archived from the original on 27 April 1999. Retrieved 7 April 2018.
  12. "Praat: doing Phonetics by Computer". www.fon.hum.uva.nl. Retrieved 7 April 2018.
  13. "The Aphex Face – bastwood". www.bastwood.com. Retrieved 7 April 2018.
  14. "SRC Comparisons". src.infinitewave.ca. Retrieved 7 April 2018.
  15. "constantwave.com – constantwave Resources and Information". www.constantwave.com. Retrieved 7 April 2018.
  16. "Spectrograms for vector network analyzers". Archived from the original on 2012-08-10.
  17. "Real-time Spectrogram Displays". earthquake.usgs.gov. Retrieved 7 April 2018.
  18. "IRIS: MUSTANG: Noise-Spectrogram: Docs: v. 1: Help".
  19. Geitgey, Adam (2016-12-24). "Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning". Medium. Retrieved 2018-03-21.
  20. See also Praat.
  21. "China's enormous surveillance state is still growing". The Economist. November 23, 2023. ISSN 0013-0613. Retrieved 2023-11-25.
  22. "What is a Spectrogram?". Retrieved 2023-12-18.
  23. Arias-Vergara, T.; Klumpp, P.; Vasquez-Correa, J. C.; Nöth, E.; Orozco-Arroyave, J. R.; Schuster, M. (2021). "Multi-channel spectrograms for speech processing applications using deep learning methods". Pattern Analysis and Applications. 24 (2): 423–431. doi:10.1007/s10044-020-00921-5.
  24. Jia, Yanjie; Chen, Xi; Yu, Jieqiong; Wang, Lianming; Xu, Yuanzhe; Liu, Shaojin; Wang, Yonghui (2021). "Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network". Complex & Intelligent Systems. 7 (4): 1749–1757. doi:10.1007/s40747-020-00172-1.
  25. Yalamanchili, Arpitha; Madhumathi, G. L.; Balaji, N. (2022). "Spectrogram analysis of ECG signal and classification efficiency using MFCC feature extraction technique". Journal of Ambient Intelligence and Humanized Computing. 13 (2): 757–767. doi:10.1007/s12652-021-02926-2. S2CID 233657057.
  26. Ge, Junfeng; Wang, Li; Gui, Kang; Ye, Lin (30 September 2023). "Temperature interpretation method for temperature indicating paint based on spectrogram". Measurement. 219. Bibcode:2023Meas..21913317G. doi:10.1016/j.measurement.2023.113317. S2CID 259871198.
  27. Park, Cheolhyeong; Lee, Deokwoo (11 February 2022). "Classification of Respiratory States Using Spectrogram with Convolutional Neural Network". Applied Sciences. 12 (4): 1895. doi:10.3390/app12041895.