PhyCV is the first computer vision library whose algorithms are directly derived from the equations of physics governing physical phenomena. The algorithms in the first release emulate the propagation of light through a physical medium with natural and engineered diffractive properties, followed by coherent detection. Unlike traditional algorithms that are a sequence of hand-crafted empirical rules, physics-inspired algorithms leverage the physical laws of nature as blueprints. In addition, these algorithms can, in principle, be implemented in real physical devices for fast and efficient computation in the form of analog computing.[1]
Currently PhyCV has three algorithms: the Phase-Stretch Transform (PST), the Phase-Stretch Adaptive Gradient-Field Extractor (PAGE), and Vision Enhancement via Virtual diffraction and coherent Detection (VEViD). All algorithms have CPU and GPU versions. PhyCV is available on GitHub and can be installed from pip.
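PhyCV can be tried in a few lines of Python. A minimal sketch, assuming the package name `phycv` and the `PST` class from the GitHub README; the `run` parameters and values shown here are illustrative, so consult the README for the exact signature:

```python
# pip install phycv
from phycv import PST

# CPU version of the Phase-Stretch Transform (GPU variants such as
# PST_GPU are also provided, per the README).
pst = PST()

# Parameter names follow the README at the time of writing: S (phase
# strength), W (warp), sigma_LPF (Gaussian low-pass filter for
# denoising), thresholds and a morphology flag for the binary edge map.
edge_map = pst.run(
    img_file="jet_engine.jpeg",  # hypothetical input path
    S=0.3,
    W=15,
    sigma_LPF=0.15,
    thresh_min=0.05,
    thresh_max=0.9,
    morph_flag=1,
)
```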
Algorithms in PhyCV are inspired by the physics of the photonic time stretch[2][3] (a hardware technique for ultrafast and single-shot data acquisition). PST is an edge detection algorithm that was open-sourced in 2016 and has 800+ stars and 200+ forks on GitHub. PAGE is a directional edge detection algorithm that was open-sourced in February 2022. PhyCV was originally developed and open-sourced by Jalali-Lab @ UCLA in May 2022. In the initial release of PhyCV, the original open-sourced code of PST and PAGE was significantly refactored and improved to be modular, more efficient, GPU-accelerated, and object-oriented. VEViD is a low-light and color enhancement algorithm that was added to PhyCV in November 2022.
Phase-Stretch Transform (PST) is a computationally efficient edge and texture detection algorithm with exceptional performance on visually impaired images.[4][5][6] The algorithm transforms the image by emulating the propagation of light through a device with an engineered diffractive property, followed by coherent detection. It has been applied to improving the resolution of MRI images,[7] extracting blood vessels in retinal images,[8] dolphin identification,[9] wastewater treatment,[10] single-molecule biological imaging,[11] and the classification of UAVs using micro-Doppler imaging.[12]
Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) is a physics-inspired algorithm for detecting edges and their orientations in digital images at various scales.[13][14] The algorithm is based on the diffraction equations of optics. Metaphorically speaking, PAGE emulates the physics of birefringent (orientation-dependent) diffractive propagation through a physical device with a specific diffractive structure. The propagation converts a real-valued image into a complex function; edge information is carried by the real and imaginary components, and the output of the transform is the phase of this complex function.
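To make the orientation-dependent propagation concrete, below is a deliberately simplified sketch: a spectral phase kernel that varies along a single probe orientation `theta`, followed by phase detection. This is not the published PAGE kernel (which uses a bank of log-frequency Gaussian filters and produces a multi-channel output); it only conveys the structure of an orientation-selective phase filter, and all parameter names are assumptions:

```python
# Simplified, hedged sketch of an orientation-selective spectral phase
# filter in the spirit of PAGE; NOT the published PAGE kernel.
import numpy as np

def directional_phase_sketch(img, theta, sigma=0.1, S=0.5):
    rows, cols = img.shape
    u = np.fft.fftfreq(rows).reshape(-1, 1)
    v = np.fft.fftfreq(cols).reshape(1, -1)
    # frequency coordinate along the probed orientation theta
    u_rot = u * np.cos(theta) + v * np.sin(theta)
    # Gaussian phase profile in the rotated frequency coordinate,
    # scaled by the phase strength S
    kernel = S * np.exp(-(u_rot ** 2) / (2 * sigma ** 2))
    out = np.fft.ifft2(np.fft.fft2(img) * np.exp(-1j * kernel))
    return np.angle(out)  # phase of the complex output
```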
Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) is an efficient and interpretable low-light and color enhancement algorithm that reimagines a digital image as a spatially varying metaphoric light field and then subjects the field to physical processes akin to diffraction and coherent detection.[15] The term "virtual" captures the deviation from the physical world: the light field is pixelated, and the propagation imparts a phase with an arbitrary dependence on frequency, which can differ from the quadratic behavior of physical diffraction. VEViD can be further accelerated through mathematical approximations that reduce the computation time without appreciable sacrifice in image quality. A closed-form approximation of VEViD, which we call VEViD-lite, can achieve up to 200 FPS for 4K video enhancement.
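For illustration, here is a hedged sketch of how a VEViD-lite-style closed-form enhancement described in [15] can be realized: the enhancement reduces to an arctangent phase nonlinearity applied to the V channel in HSV space. The parameter names `b` (bias/regularization) and `G` (phase activation gain) follow the paper's terminology, and the normalization conventions may differ from the library's:

```python
# Hedged sketch of a VEViD-lite-style closed-form enhancement.
import cv2
import numpy as np

def vevid_lite_sketch(img_bgr, b=0.2, G=1.5):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
    v = hsv[:, :, 2] / 255.0                 # V channel as the "light field"
    phase = np.arctan2(-G * (v + b), v)      # coherent-detection phase
    v_out = (phase - phase.min()) / (phase.max() - phase.min() + 1e-12)
    hsv[:, :, 2] = v_out * 255.0
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```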
Featuring low dimensionality and high efficiency, PhyCV is well suited to edge computing applications. In this section, we demonstrate running PhyCV on the NVIDIA Jetson Nano in real time.
NVIDIA Jetson Nano Developer Kit is a small-sized and power-efficient platform for edge computing applications. It is equipped with an NVIDIA Maxwell architecture GPU with 128 CUDA cores, a quad-core ARM Cortex-A57 CPU, and 4GB of 64-bit LPDDR4 RAM, and supports video encoding and decoding up to 4K resolution. The Jetson Nano also offers a variety of interfaces for connectivity and expansion, making it ideal for a wide range of AI and IoT applications. In our setup, we connect a USB camera to the Jetson Nano to acquire videos and demonstrate using PhyCV to process the videos in real time.
We use the Jetson Nano (4GB) with NVIDIA JetPack SDK version 4.6.1, which comes with Python 3.6, CUDA 10.2, and OpenCV 4.1.1 pre-installed. We further install PyTorch 1.10 to enable GPU-accelerated PhyCV. We demonstrate the results and metrics of running PhyCV on the Jetson Nano in real time for edge detection and low-light enhancement tasks. For 480p videos, both operations achieve over 38 FPS, which is sufficient for most cameras that capture video at 30 FPS. For 720p videos, PhyCV low-light enhancement operates at 24 FPS and PhyCV edge detection at 17 FPS.
Per-frame running time of PhyCV on the Jetson Nano:

| Resolution | PhyCV Edge Detection | PhyCV Low-light Enhancement |
| --- | --- | --- |
| 480p (640 x 480) | 25.9 ms | 24.5 ms |
| 720p (1280 x 720) | 58.5 ms | 41.1 ms |
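A minimal sketch of this real-time pipeline: frames from the USB camera are read with OpenCV, copied to the GPU, and enhanced with PhyCV's GPU-accelerated VEViD. `VEVID_GPU(device=...)` follows the GitHub README; the per-frame method `run_img` is a hypothetical stand-in for whatever array-based entry point the current repo provides:

```python
import cv2
import torch
from phycv import VEVID_GPU

device = torch.device("cuda")
vevid = VEVID_GPU(device=device)

cap = cv2.VideoCapture(0)                    # USB camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)       # 480p keeps the Nano above
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)      # ~38 FPS per the table above

while cap.isOpened():
    ok, frame = cap.read()                   # BGR uint8, in CPU memory
    if not ok:
        break
    # BGR -> RGB, then the host-to-device copy (a bottleneck noted later)
    rgb = torch.from_numpy(frame[:, :, ::-1].copy()).to(device)
    enhanced = vevid.run_img(img_array=rgb, b=0.2, G=1.5)  # hypothetical API
    out = enhanced.cpu().numpy()[:, :, ::-1]               # back to BGR
    cv2.imshow("PhyCV VEViD", out)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```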
The code in PhyCV has a modular design that faithfully follows the physical process from which each algorithm originates. Both the PST and PAGE modules in the PhyCV library emulate the propagation of the input signal (the original digital image) through a device with an engineered diffractive property, followed by coherent (phase) detection. The dispersive propagation applies a phase kernel to the frequency domain of the original image. In general, this process has three steps: loading the image, initializing the kernel, and applying the kernel. In the implementation of PhyCV, each algorithm is represented as a Python class whose methods simulate the steps described above, so the code architecture follows the physics behind the algorithm. Please refer to the source code on GitHub for more details.
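Below is a self-contained NumPy sketch of this three-step structure for PST, using the phase kernel published in [4] (the thresholding and morphological post-processing that produce the final binary edge map are omitted). Parameter names S (phase strength), W (warp), and sigma_LPF (denoising filter) follow the PST papers; the implementation details are illustrative rather than the library's exact code:

```python
import numpy as np

def pst_sketch(img, S=0.3, W=15.0, sigma_LPF=0.15):
    # Step 1: load the image (here: a 2D grayscale array scaled to [0, 1]).
    rows, cols = img.shape
    u = np.fft.fftfreq(rows).reshape(-1, 1)
    v = np.fft.fftfreq(cols).reshape(1, -1)
    r = np.hypot(u, v)  # polar frequency variable

    # Step 2: initialize the phase kernel of the engineered diffractive
    # medium, normalized and scaled by the phase strength S.
    wr = W * r
    kernel = wr * np.arctan(wr) - 0.5 * np.log1p(wr ** 2)
    kernel = S * kernel / np.max(kernel)

    # Gaussian low-pass filter applied in the frequency domain (denoising).
    lpf = np.exp(-(r ** 2) / (2 * sigma_LPF ** 2))

    # Step 3: apply the kernel (dispersive propagation), then emulate
    # coherent detection by reading out the phase of the complex output.
    spectrum = np.fft.fft2(img) * lpf
    propagated = np.fft.ifft2(spectrum * np.exp(-1j * kernel))
    return np.angle(propagated)  # edges appear in the output phase
```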
PhyCV supports GPU acceleration. The GPU versions of PST and PAGE are built on PyTorch and accelerated by the CUDA toolkit. The acceleration is beneficial for applying the algorithms to real-time image and video processing and other deep learning tasks. The running time per frame of the PhyCV algorithms on a CPU (Intel i9-9900K) and a GPU (NVIDIA TITAN RTX) for videos at different resolutions is shown below. Note that PhyCV low-light enhancement operates in the HSV color space, so the running time also includes the RGB-to-HSV conversion. For all GPU timings, we ignore the time of moving data from the CPU to the GPU and count only the algorithm's operation time.
Per-frame running time of PST (edge detection):

| Resolution | CPU | GPU |
| --- | --- | --- |
| 1080p (1920 x 1080) | 550 ms | 4.6 ms |
| 2K (2560 x 1440) | 1000 ms | 8.2 ms |
| 4K (3840 x 2160) | 2290 ms | 18.5 ms |

Per-frame running time of PAGE (directional edge detection):

| Resolution | CPU | GPU |
| --- | --- | --- |
| 1080p (1920 x 1080) | 2800 ms | 48.5 ms |
| 2K (2560 x 1440) | 5000 ms | 87 ms |
| 4K (3840 x 2160) | 11660 ms | 197 ms |

Per-frame running time of VEViD (low-light enhancement):

| Resolution | CPU | GPU |
| --- | --- | --- |
| 1080p (1920 x 1080) | 175 ms | 4.3 ms |
| 2K (2560 x 1440) | 320 ms | 7.8 ms |
| 4K (3840 x 2160) | 730 ms | 17.9 ms |

Per-frame running time of VEViD-lite (closed-form approximation):

| Resolution | CPU | GPU |
| --- | --- | --- |
| 1080p (1920 x 1080) | 60 ms | 2.1 ms |
| 2K (2560 x 1440) | 110 ms | 3.5 ms |
| 4K (3840 x 2160) | 245 ms | 7.4 ms |
Please refer to the GitHub README file for detailed technical documentation.
When dealing with real-time video streams from cameras, frames are captured and buffered in CPU memory and must be moved to the GPU to run the GPU-accelerated PhyCV algorithms. This transfer is time-consuming, and it is a common bottleneck for real-time video processing algorithms.
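The cost of this transfer can be seen directly by timing the host-to-device copy of a single frame with CUDA events; a small sketch (the frame size and contents are arbitrary stand-ins):

```python
import numpy as np
import torch

# Stand-in for one captured 1080p RGB frame buffered in CPU memory.
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
frame_gpu = torch.from_numpy(frame).to("cuda")  # the CPU-to-GPU transfer
end.record()
torch.cuda.synchronize()

# This copy time is excluded from the GPU running times reported above.
print(f"Host-to-device copy: {start.elapsed_time(end):.2f} ms")
```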
Currently, the parameters of the PhyCV algorithms must be manually tuned for different images. Although a set of pre-selected parameters works relatively well across a wide range of images, the lack of parameter adaptivity remains a limitation for now.
Diffraction is the interference or bending of waves around the corners of an obstacle or through an aperture into the region of geometrical shadow of the obstacle/aperture. The diffracting object or aperture effectively becomes a secondary source of the propagating wave. Italian scientist Francesco Maria Grimaldi coined the word diffraction and was the first to record accurate observations of the phenomenon in 1660.
In physics, a plasmon is a quantum of plasma oscillation. Just as light consists of photons, the plasma oscillation consists of plasmons. The plasmon can be considered as a quasiparticle since it arises from the quantization of plasma oscillations, just like phonons are quantizations of mechanical vibrations. Thus, plasmons are collective oscillations of the free electron gas density. For example, at optical frequencies, plasmons can couple with a photon to create another quasiparticle called a plasmon polariton.
General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
A physics engine is computer software that provides an approximate simulation of certain physical systems, such as rigid body dynamics, soft body dynamics, and fluid dynamics, for use in the domains of computer graphics, video games, and film (CGI). Their main uses are in video games, in which case the simulations run in real time. The term is sometimes used more generally to describe any software system for simulating physical phenomena, such as high-performance scientific simulation.
A physics processing unit (PPU) is a dedicated microprocessor designed to handle the calculations of physics, especially in the physics engine of video games. It is an example of hardware acceleration.
Medical optical imaging is the use of light as an investigational imaging technique for medical applications, pioneered by American Physical Chemist Britton Chance. Examples include optical microscopy, spectroscopy, endoscopy, scanning laser ophthalmoscopy, laser Doppler imaging, and optical coherence tomography. Because light is an electromagnetic wave, similar phenomena occur in X-rays, microwaves, and radio waves.
A superlens, or super lens, is a lens which uses metamaterials to go beyond the diffraction limit. The diffraction limit is a feature of conventional lenses and microscopes that limits the fineness of their resolution depending on the illumination wavelength and the numerical aperture (NA) of the objective lens. Many lens designs have been proposed that go beyond the diffraction limit in some way, but constraints and obstacles face each of them.
Compute Unified Device Architecture (CUDA) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and also to specify GPU device-specific operations (like moving data between the CPU and the GPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels. In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries, and developer tools to help programmers accelerate their applications.
Computer-generated holography (CGH) is a technique that uses computer algorithms to generate holograms. It involves generating holographic interference patterns. A computer-generated hologram can be displayed on a dynamic holographic display, or it can be printed onto a mask or film using lithography. When a hologram is printed onto a mask or film, it is then illuminated by a coherent light source to display the holographic images.
Coherent diffractive imaging (CDI) is a "lensless" technique for 2D or 3D reconstruction of the image of nanoscale structures such as nanotubes, nanocrystals, porous nanocrystalline layers, defects, potentially proteins, and more. In CDI, a highly coherent beam of X-rays, electrons or other wavelike particle or photon is incident on an object.
Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics processing unit (GPU), northbridge, southbridge, and memory controller onto one package. Early Tegra SoCs were designed as efficient multimedia processors. The Tegra line then evolved to emphasize performance for gaming and machine learning applications without sacrificing power efficiency, before shifting toward platforms for vehicular automation under the "Nvidia Drive" brand name for reference boards and their semiconductors, and the "Nvidia Jetson" brand name for boards suited to AI applications in, for example, robots, drones, and various smart high-level automation purposes.
Ptychography is a computational method of microscopic imaging. It generates images by processing many coherent interference patterns that have been scattered from an object of interest. Its defining characteristic is translational invariance, which means that the interference patterns are generated by one constant function moving laterally by a known amount with respect to another constant function. The interference patterns occur some distance away from these two components, so that the scattered waves spread out and "fold" into one another.
The time-stretch analog-to-digital converter (TS-ADC), also known as the time-stretch enhanced recorder (TiSER), is an analog-to-digital converter (ADC) system that has the capability of digitizing very high bandwidth signals that cannot be captured by conventional electronic ADCs. Alternatively, it is also known as the photonic time-stretch (PTS) digitizer, since it uses an optical frontend. It relies on the process of time-stretch, which effectively slows down the analog signal in time before it can be digitized by a standard electronic ADC.
Phase-contrast X-ray imaging or phase-sensitive X-ray imaging is a general term for different technical methods that use information concerning changes in the phase of an X-ray beam that passes through an object in order to create its images. Standard X-ray imaging techniques like radiography or computed tomography (CT) rely on a decrease of the X-ray beam's intensity (attenuation) when traversing the sample, which can be measured directly with the assistance of an X-ray detector. However, in phase contrast X-ray imaging, the beam's phase shift caused by the sample is not measured directly, but is transformed into variations in intensity, which then can be recorded by the detector.
An electromagnetic metasurface refers to a kind of artificial sheet material with sub-wavelength thickness. Metasurfaces can be either structured or unstructured with subwavelength-scaled patterns in the horizontal dimensions.
Fourier ptychography is a computational imaging technique based on optical microscopy that consists in the synthesis of a wider numerical aperture from a set of full-field images acquired at various coherent illumination angles, resulting in increased resolution compared to a conventional microscope.
Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property. The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal data. It is a physics-based feature engineering algorithm.
Nano-FTIR is a scanning probe technique that utilizes a combination of two techniques: Fourier transform infrared spectroscopy (FTIR) and scattering-type scanning near-field optical microscopy (s-SNOM). Like s-SNOM, nano-FTIR is based on atomic-force microscopy (AFM), where a sharp tip is illuminated by an external light source and the tip-scattered light is detected as a function of tip position. A typical nano-FTIR setup thus consists of an atomic force microscope, a broadband infrared light source used for tip illumination, and a Michelson interferometer acting as a Fourier-transform spectrometer. In nano-FTIR, the sample stage is placed in one of the interferometer arms, which allows for recording both the amplitude and phase of the detected light. Scanning the tip allows for performing hyperspectral imaging with nanoscale spatial resolution determined by the tip apex size. The use of broadband infrared sources enables the acquisition of continuous spectra, which is a distinctive feature of nano-FTIR compared to s-SNOM. Nano-FTIR is capable of performing infrared (IR) spectroscopy of materials in ultrasmall quantities and with nanoscale spatial resolution. The detection of a single molecular complex and the sensitivity to a single monolayer have been shown. Recording infrared spectra as a function of position can be used for nanoscale mapping of the sample chemical composition, performing local ultrafast IR spectroscopy, and analyzing nanoscale intermolecular coupling, among others. A spatial resolution of 10 nm to 20 nm is routinely achieved.
Nvidia Jetson is a series of embedded computing boards from Nvidia. The Jetson TK1, TX1 and TX2 models all carry a Tegra processor from Nvidia that integrates an ARM architecture central processing unit (CPU). Jetson is a low-power system and is designed for accelerating machine learning applications.
John Marius Rodenburg is emeritus professor in the Department of Electronic and Electrical Engineering at the University of Sheffield. He was elected a Fellow of the Royal Society (FRS) in 2019 for "internationally recognised... work on revolutionising the imaging capability of light, X-ray and electron transmission microscopes".