Digital image processing is the use of a digital computer to process digital images through an algorithm. [1] [2] As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; [3] second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); [4] and third, the increased demand for a wide range of applications in environment, agriculture, military, industry and medical science. [5]
Many of the techniques of digital image processing, or digital picture processing as it often was called, were developed in the 1960s, at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. [6] The purpose of early image processing was to improve the quality of the image for human viewers. In this kind of processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing tasks include image enhancement, restoration, encoding, and compression. The first successful application was at the American Jet Propulsion Laboratory (JPL). JPL used image processing techniques such as geometric correction, gradation transformation, and noise removal on the thousands of lunar photos sent back by the space probe Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The computer-based mapping of the Moon's surface was a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for human landing on the Moon. [7]
The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest.
The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, [8] invented at Bell Labs between 1955 and 1960. [9] [10] [11] [12] [13] [14] This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. [8]
The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. [15] While researching MOS technology, they realized that an electric charge was the analogue of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. [8] The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. [16]
The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. [17] [18] The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. [19] The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. [20] By 2007, sales of CMOS sensors had surpassed CCD sensors. [21]
MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. [22] [23] Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. [24] [25]
An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. [26] DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. [27] JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. [28] Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, [29] with several billion JPEG images produced every day as of 2015. [30]
Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. [31] [32] JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. [33]
Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. [34] MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, [35] and then the first single-chip digital signal processor (DSP) chips in the late 1970s. [36] [37] DSP chips have since been widely used in digital image processing. [36]
The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. [38]
In 1972, engineer Godfrey Hounsfield from the British company EMI invented the X-ray computed tomography (CT) device for head diagnosis. The core CT method is based on projecting X-rays through a section of the human head; the measurements are then processed by computer to reconstruct the cross-sectional image, a step known as image reconstruction. In 1975, EMI successfully developed a CT device for the entire body, enabling the clear acquisition of tomographic images of various parts of the human body. This revolutionary diagnostic technique earned Hounsfield and physicist Allan Cormack the Nobel Prize in Physiology or Medicine in 1979. [7] Digital image processing technology for medical applications was inducted into the Space Foundation's Space Technology Hall of Fame in 1994. [39]
By 2010, over 5 billion medical imaging studies had been conducted worldwide. [40] [41] Radiation exposure from medical imaging in 2006 accounted for about 50% of total ionizing radiation exposure in the United States. [42] Medical imaging equipment is manufactured using technology from the semiconductor industry, including CMOS integrated circuit chips, power semiconductor devices, sensors such as image sensors (particularly CMOS sensors) and biosensors, as well as processors like microcontrollers, microprocessors, digital signal processors, media processors and system-on-chip devices. As of 2015, annual shipments of medical imaging chips reached 46 million units, generating a market value of $1.1 billion. [43] [44]
Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means.
In particular, digital image processing is a concrete application of, and a practical technology based on:
Some techniques which are used in digital image processing include:
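Digital filters are used to blur and sharpen digital images. Filtering can be performed either by convolution with a kernel in the spatial domain or by masking selected frequency regions in the frequency (Fourier) domain.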
The following examples show both methods: [46]
Filter type | Kernel or mask | Example |
---|---|---|
Original Image | ||
Spatial Lowpass | ||
Spatial Highpass | ||
Fourier Representation | Pseudo-code: image = checkerboard; F = Fourier transform of image; show image: log(1 + absolute value(F)) | |
Fourier Lowpass | ||
Fourier Highpass | ||
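The pseudo-code in the Fourier representation row can be sketched in plain Python. This is a minimal illustration under stated assumptions, not an optimized implementation: the helper names `checkerboard`, `dft2` and `log_magnitude` are hypothetical, the checkerboard uses one-pixel tiles, and the transform is a naive discrete Fourier transform that is only practical for very small images.

```python
import cmath
import math


def checkerboard(size):
    """Return a size x size checkerboard of 0s and 1s with one-pixel tiles."""
    return [[(r + c) % 2 for c in range(size)] for r in range(size)]


def dft2(image):
    """Naive 2D discrete Fourier transform (O(N^4), for tiny images only)."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * math.pi * (u * r / rows + v * c / cols)
                    total += image[r][c] * cmath.exp(1j * angle)
            out_row.append(total)
        spectrum.append(out_row)
    return spectrum


def log_magnitude(spectrum):
    """Compress the dynamic range for display: log(1 + |F|)."""
    return [[math.log(1.0 + abs(value)) for value in row] for row in spectrum]


image = checkerboard(4)
F = dft2(image)
display = log_magnitude(F)
```

For this 4 × 4 checkerboard the energy sits in two coefficients only: the zero-frequency term `F[0][0]`, which equals the sum of all pixels, and the highest-frequency term `F[2][2]`, which captures the pixel-by-pixel alternation.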
Images are typically padded before being transformed to the Fourier space. The highpass filtered images below illustrate the consequences of different padding techniques:
Zero padded | Repeated edge padded |
---|---|
Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding.
MATLAB example for spatial domain highpass filtering.

    img = checkerboard(20);                  % generate checkerboard
    % ************************** SPATIAL DOMAIN ***************************
    klaplace = [0 -1 0; -1 5 -1; 0 -1 0];    % Laplacian filter kernel
    X = conv2(img, klaplace);                % convolve test img with
                                             % 3x3 Laplacian kernel
    figure()
    imshow(X, [])                            % show Laplacian filtered
    title('Laplacian Edge Detection')
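The effect of padding can also be seen in the spatial domain. The following is a minimal Python sketch, assuming the kernel of the MATLAB example reads `[0 -1 0; -1 5 -1; 0 -1 0]`. The function name `filter3x3` and its `padding` parameter are hypothetical. Unlike `conv2`, which by default returns an enlarged output, this sketch returns an image of the same size as the input, and it does not flip the kernel, which makes no difference for this symmetric kernel.

```python
def filter3x3(image, kernel, padding="zero"):
    """Apply a 3x3 kernel to a 2D list and return an image of the same size.

    padding="zero" treats pixels outside the image as 0;
    padding="edge" repeats the nearest border pixel.
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        if 0 <= r < rows and 0 <= c < cols:
            return image[r][c]
        if padding == "zero":
            return 0
        # Repeated-edge padding: clamp the coordinates to the image.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += kernel[dr + 1][dc + 1] * pixel(r + dr, c + dc)
            out_row.append(total)
        out.append(out_row)
    return out


# Kernel as read from the MATLAB example (an assumption about the source).
KERNEL = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

flat = [[1] * 4 for _ in range(4)]          # constant image, no real edges
zero_padded = filter3x3(flat, KERNEL, padding="zero")
edge_padded = filter3x3(flat, KERNEL, padding="edge")
```

On a constant image, repeated-edge padding returns the image unchanged, while zero padding produces larger values along the border. These are artificial edges created by the jump from image values to the zero padding.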
Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear as is shown in the following examples: [46]
Transformation Name | Affine Matrix | Example |
---|---|---|
Identity | ||
Reflection | ||
Scale | ||
Rotate | where θ = π/6 =30° | |
Shear | ||
To apply the affine matrix to an image, the image is converted to a matrix in which each entry corresponds to the pixel intensity at that location. Each pixel's location can then be represented as a vector indicating the coordinates of that pixel in the image, [x, y], where x and y are the row and column of a pixel in the image matrix. This allows the coordinate to be multiplied by an affine-transformation matrix, which gives the position that the pixel value will be copied to in the output image.
However, transformations that involve translation require 3-dimensional homogeneous coordinates. The third dimension is set to a non-zero constant, usually 1, so that the new coordinate is [x, y, 1]. The coordinate vector can then be multiplied by a 3 by 3 matrix whose last column carries the translation offsets.
Because matrix multiplication is associative, multiple affine transformations can be combined into a single affine transformation by multiplying the matrix of each individual transformation in the order that the transformations are done. This results in a single matrix that, when applied to a point vector, gives the same result as all the individual transformations performed on the vector [x, y, 1] in sequence. Thus a sequence of affine transformation matrices can be reduced to a single affine transformation matrix.
For example, 2 dimensional coordinates only allow rotation about the origin (0, 0). But 3 dimensional homogeneous coordinates can be used to first translate any point to (0, 0), then perform the rotation, and lastly translate the origin (0, 0) back to the original point (the opposite of the first translation). These 3 affine transformations can be combined into a single matrix, thus allowing rotation around any point in the image. [47]
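The composition described above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the helper names are hypothetical, coordinates are treated as column vectors [x, y, 1], and under that convention the transformation applied first appears rightmost in the matrix product.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    """Homogeneous matrix that shifts a point by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(theta):
    """Homogeneous matrix that rotates about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_about(cx, cy, theta):
    """Translate (cx, cy) to the origin, rotate, then translate back."""
    return matmul(translation(cx, cy),
                  matmul(rotation(theta), translation(-cx, -cy)))


def apply(matrix, x, y):
    """Apply a 3x3 affine matrix to the point (x, y)."""
    vec = (x, y, 1.0)
    out = [sum(matrix[i][k] * vec[k] for k in range(3)) for i in range(3)]
    return out[0], out[1]


M = rotation_about(1.0, 1.0, math.pi / 2)   # single combined matrix
x_new, y_new = apply(M, 2.0, 1.0)           # approximately (1.0, 2.0)
center = apply(M, 1.0, 1.0)                 # the pivot point stays fixed
```

The single matrix `M` gives the same result as applying the three steps one after another, which is why a chain of affine transformations costs only one matrix-vector product per pixel.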
Mathematical morphology is suitable for denoising images. Structuring elements are central to mathematical morphology.
The following examples illustrate structuring elements. The denoising function, the image I, and the structuring element B are shown below and in the table.
e.g.
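Define Dilation(I, B)(i,j) = $\max\{I(i+m, j+n) + B(m, n)\}$, where the maximum is taken over all offsets (m, n) in the structuring element. Let Dilation(I,B) = D(I,B)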
D(I', B)(1,1) =
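Define Erosion(I, B)(i,j) = $\min\{I(i+m, j+n) - B(m, n)\}$, where the minimum is taken over all offsets (m, n) in the structuring element. Let Erosion(I,B) = E(I,B)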
E(I', B)(1,1) =
After dilation After erosion
An opening is simply erosion followed by dilation, while a closing is the reverse: dilation followed by erosion. In practice, D(I,B) and E(I,B) can be implemented in a manner similar to convolution, by sliding the structuring element over the image.
The following MATLAB listings apply each operation three times with a 3 × 3 structuring element.

Original image (no structuring element). Use MATLAB to read the original image:

    original = imread('scene.jpg');
    image = rgb2gray(original);              % work on the grayscale image
    [r, c, channel] = size(image);
    se = logical([1 1 1; 1 1 1; 1 1 1]);     % flat 3x3 structuring element
    [p, q] = size(se);
    halfH = floor(p/2);
    halfW = floor(q/2);
    time = 3;                                % apply each operation 3 times

Dilation. Use MATLAB to dilate:

    imwrite(image, "scene_dil.jpg")
    extractmax = zeros(size(image), class(image));
    for i = 1:time
        dil_image = imread('scene_dil.jpg');
        for col = (halfW+1):(c-halfW)
            for row = (halfH+1):(r-halfH)
                dpointD = row - halfH;
                dpointU = row + halfH;
                dpointL = col - halfW;
                dpointR = col + halfW;
                dneighbor = dil_image(dpointD:dpointU, dpointL:dpointR);
                filter = dneighbor(se);              % pixels under the structuring element
                extractmax(row, col) = max(filter);  % dilation keeps the maximum
            end
        end
        imwrite(extractmax, "scene_dil.jpg");
    end

Erosion. Use MATLAB to erode:

    imwrite(image, 'scene_ero.jpg');
    extractmin = zeros(size(image), class(image));
    for i = 1:time
        ero_image = imread('scene_ero.jpg');
        for col = (halfW+1):(c-halfW)
            for row = (halfH+1):(r-halfH)
                pointDown = row - halfH;
                pointUp = row + halfH;
                pointLeft = col - halfW;
                pointRight = col + halfW;
                neighbor = ero_image(pointDown:pointUp, pointLeft:pointRight);
                filter = neighbor(se);               % pixels under the structuring element
                extractmin(row, col) = min(filter);  % erosion keeps the minimum
            end
        end
        imwrite(extractmin, "scene_ero.jpg");
    end

Opening. Use MATLAB to open (dilate the eroded image):

    imwrite(extractmin, "scene_opening.jpg")
    extractopen = zeros(size(image), class(image));
    for i = 1:time
        dil_image = imread('scene_opening.jpg');
        for col = (halfW+1):(c-halfW)
            for row = (halfH+1):(r-halfH)
                dpointD = row - halfH;
                dpointU = row + halfH;
                dpointL = col - halfW;
                dpointR = col + halfW;
                dneighbor = dil_image(dpointD:dpointU, dpointL:dpointR);
                filter = dneighbor(se);
                extractopen(row, col) = max(filter);
            end
        end
        imwrite(extractopen, "scene_opening.jpg");
    end

Closing. Use MATLAB to close (erode the dilated image):

    imwrite(extractmax, "scene_closing.jpg")
    extractclose = zeros(size(image), class(image));
    for i = 1:time
        ero_image = imread('scene_closing.jpg');
        for col = (halfW+1):(c-halfW)
            for row = (halfH+1):(r-halfH)
                dpointD = row - halfH;
                dpointU = row + halfH;
                dpointL = col - halfW;
                dpointR = col + halfW;
                dneighbor = ero_image(dpointD:dpointU, dpointL:dpointR);
                filter = dneighbor(se);
                extractclose(row, col) = min(filter);
            end
        end
        imwrite(extractclose, "scene_closing.jpg");
    end
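The four operations can also be sketched compactly in Python. This is a minimal illustration under stated assumptions, not a port of the MATLAB code: it uses a flat 3 × 3 structuring element, applies each operation once, works on nested lists instead of image files, and handles the border by clipping the neighborhood to the image. All function names are hypothetical.

```python
def _neighborhood_filter(image, reducer):
    """Replace each pixel by reducer() of its 3x3 neighborhood.

    Neighborhoods are clipped at the image border.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            out_row.append(reducer(window))
        out.append(out_row)
    return out


def dilate(image):
    """Flat dilation: maximum over the neighborhood."""
    return _neighborhood_filter(image, max)


def erode(image):
    """Flat erosion: minimum over the neighborhood."""
    return _neighborhood_filter(image, min)


def opening(image):
    """Erosion followed by dilation; removes small bright specks."""
    return dilate(erode(image))


def closing(image):
    """Dilation followed by erosion; fills small dark holes."""
    return erode(dilate(image))


# A dark image with one bright noise pixel, and the inverse case.
bright_speck = [[0] * 5 for _ in range(5)]
bright_speck[2][2] = 9
dark_hole = [[9] * 5 for _ in range(5)]
dark_hole[2][2] = 0

opened = opening(bright_speck)
closed = closing(dark_hole)
```

Opening removes the isolated bright pixel and closing fills the isolated dark pixel, which is the denoising behavior described above.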
Digital cameras generally include specialized digital image processing hardware (either dedicated chips or added circuitry on other chips) to convert the raw data from their image sensor into a color-corrected image in a standard image file format. Additional post-processing techniques increase edge sharpness or color saturation to create more natural-looking images.
Westworld (1973) was the first feature film to use digital image processing to pixellate photography to simulate an android's point of view. [48] Image processing is also widely used to produce the chroma key effect that replaces the background of actors with natural or artistic scenery.
Face detection can be implemented with mathematical morphology, the discrete cosine transform (DCT), and horizontal projection.
General method with feature-based method
The feature-based method of face detection uses skin tone, edge detection, face shape, and facial features (such as the eyes and mouth) to detect faces. Skin tone, face shape, and the other elements unique to the human face can all be described as features.
Process explanation
Image quality can be degraded by camera vibration, over-exposure, a gray level distribution that is too concentrated, and noise. For example, noise can be reduced with the smoothing method, while a concentrated gray level distribution can be improved by histogram equalization.
Smoothing method
The smoothing method can be understood through an analogy with drawing: if a color looks wrong, take the colors around it and average them.
The smoothing method can be implemented with a mask and convolution. Take the small image and mask below as an example.
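image is
$\begin{bmatrix} 2 & 5 & 6 & 5 \\ 3 & 1 & 4 & 6 \\ 1 & 28 & 30 & 2 \\ 7 & 3 & 2 & 2 \end{bmatrix}$
mask is
$\frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$
After convolution and smoothing, image is
$\begin{bmatrix} 2 & 5 & 6 & 5 \\ 3 & 9 & 10 & 6 \\ 1 & 9 & 9 & 2 \\ 7 & 3 & 2 & 2 \end{bmatrix}$
Observe image[1, 1], image[1, 2], image[2, 1], and image[2, 2].
The original pixel values are 1, 4, 28, 30. After applying the smoothing mask, they become 9, 10, 9, 9 respectively. Each result is rounded to the nearest integer.
new image[1, 1] = $\tfrac{1}{9}$ * (image[0,0]+image[0,1]+image[0,2]+image[1,0]+image[1,1]+image[1,2]+image[2,0]+image[2,1]+image[2,2])
new image[1, 1] = round($\tfrac{1}{9}$ * (2+5+6+3+1+4+1+28+30)) = round(80/9) = 9
new image[1, 2] = round($\tfrac{1}{9}$ * (5+6+5+1+4+6+28+30+2)) = round(87/9) = 10
new image[2, 1] = round($\tfrac{1}{9}$ * (3+1+4+1+28+30+7+3+2)) = round(79/9) = 9
new image[2, 2] = round($\tfrac{1}{9}$ * (1+4+6+28+30+2+3+2+2)) = round(78/9) = 9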
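The arithmetic above can be checked with a short Python sketch. It assumes the 4 × 4 image implied by the pixel values in the sums, a 3 × 3 averaging mask, and border pixels that are copied unchanged. The function name `smooth_interior` is hypothetical. Since 80/9 ≈ 8.89, the quoted results 9, 10, 9, 9 correspond to rounding to the nearest integer, which is what the sketch does.

```python
def smooth_interior(image):
    """Average each interior pixel over its 3x3 neighborhood.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = sum(image[r + dr][c + dc]
                        for dr in (-1, 0, 1)
                        for dc in (-1, 0, 1))
            out[r][c] = int(total / 9 + 0.5)   # round to the nearest integer
    return out


# Pixel values taken from the sums in the worked example.
image = [
    [2, 5, 6, 5],
    [3, 1, 4, 6],
    [1, 28, 30, 2],
    [7, 3, 2, 2],
]
smoothed = smooth_interior(image)
```

The outliers 28 and 30 are pulled toward their neighbors, which is how an averaging mask suppresses isolated noise, at the cost of blurring genuine detail.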
Gray Level Histogram method
Given a gray level histogram of an image, transforming the image so that its histogram approaches a uniform distribution is called histogram equalization.
In the discrete case, the area under the gray level histogram (see figure 1) must equal the area under the uniform distribution (see figure 2), because the transformation only reassigns gray levels and does not change the number of pixels.
Under the uniform distribution, every output gray level is equally probable.
The continuous case gives the same condition, with integrals in place of sums.
In other words, the gray level histogram method amounts to finding a function f that maps each input gray level p to an output gray level q, so that f(p)=q.
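One common way to construct such a function f is through the cumulative histogram. The following Python sketch illustrates that idea under stated assumptions. It is not necessarily the exact derivation intended above. The function name `equalize`, the flat list of pixels, and the `levels` parameter are all hypothetical choices.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    if not pixels:
        return []

    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    # Cumulative histogram: cdf[p] = number of pixels with level <= p.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = min(value for value in cdf if value > 0)
    if total == cdf_min:
        return list(pixels)      # constant image: nothing to spread out

    # f(p) = q: stretch the cumulative counts over the full output range.
    mapping = [0] * levels
    for p in range(levels):
        if histogram[p] > 0:
            scaled = (cdf[p] - cdf_min) / (total - cdf_min) * (levels - 1)
            mapping[p] = int(scaled + 0.5)
    return [mapping[p] for p in pixels]


# Gray levels crowded into 3..5 on an 8-level scale.
crowded = [3, 3, 4, 4, 4, 4, 5, 5]
spread = equalize(crowded, levels=8)
```

The input occupies only levels 3 to 5, while the output spans the full range from 0 to 7. This is the sense in which equalization corrects a gray level distribution that is too concentrated.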
Improvement method | Issue | Before improvement | Process | After improvement |
---|---|---|---|---|
Smoothing method | Noise: salt & pepper noise added in MATLAB with a 0.01 parameter | | | |
Histogram Equalization | Gray level distribution too centralized | Refer to the Histogram equalization | ||