Seam carving

Last updated
Original image to be made narrower Broadway tower edit.jpg
Original image to be made narrower
Scaling is undesirable because the castle is distorted. Broadway tower edit scale.png
Scaling is undesirable because the castle is distorted.
Cropping is undesirable because part of the castle is removed. Broadway tower edit cropped.png
Cropping is undesirable because part of the castle is removed.
Seam carving Broadway tower edit Seam Carving.png
Seam carving

Seam carving (or liquid rescaling) is an algorithm for content-aware image resizing, developed by Shai Avidan, of Mitsubishi Electric Research Laboratories (MERL), and Ariel Shamir, of the Interdisciplinary Center and MERL. It functions by establishing a number of seams (paths of least importance) in an image and automatically removes seams to reduce image size or inserts seams to extend it. Seam carving also allows manually defining areas in which pixels may not be modified, and features the ability to remove whole objects from photographs.

Contents

The purpose of the algorithm is image retargeting, which is the problem of displaying images without distortion on media of various sizes (cell phones, projection screens) using document standards, like HTML, that already support dynamic changes in page layout and text but not images. [1]

Image Retargeting was invented by Vidya Setlur, Saeko Takage, Ramesh Raskar, Michael Gleicher and Bruce Gooch in 2005. [2] The work by Setlur et al. won the 10-year impact award in 2015[ where? ].

Seams

Seams can be either vertical or horizontal. A vertical seam is a path of pixels connected from top to bottom in an image with one pixel in each row. [1] A horizontal seam is similar with the exception of the connection being from left to right. The importance/energy function values a pixel by measuring its contrast with its neighbor pixels.

Process

The below example describes the process of seam carving:

StepImage
1) Start with an image.
BroadwayTowerSeamCarvingA.png
2) Calculate the weight/density/energy of each pixel. This can be done by various algorithms: gradient magnitude, entropy, visual saliency, eye-gaze movement. [1] Here we use gradient magnitude.
BroadwayTowerSeamCarvingB.png
3) From the energy, make a list of seams. Seams are ranked by energy, with low energy seams being of least importance to the content of the image. Seams can be calculated via the dynamic programming approach below.
BroadwayTowerSeamCarvingC.png
4) Remove low-energy seams as needed.
BroadwayTowerSeamCarvingF.png
5) Final image.
BroadwayTowerSeamCarvingE.png

The seams to remove depends only on the dimension (height or width) one wants to shrink. It is also possible to invert step 4 so the algorithm enlarges in one dimension by copying a low energy seam and averaging its pixels with its neighbors. [1]

Computing seams

Computing a seam consists of finding a path of minimum energy cost from one end of the image to another. This can be done via Dijkstra's algorithm, dynamic programming, greedy algorithm or graph cuts among others. [1]

Dynamic programming

Dynamic programming is a programming method that stores the results of sub-calculations in order to simplify calculating a more complex result. Dynamic programming can be used to compute seams. If attempting to compute a vertical seam (path) of lowest energy, for each pixel in a row we compute the energy of the current pixel plus the energy of one of the three possible pixels above it.

The images below depict a DP process to compute one optimal seam. [1] Each square represents a pixel, with the top-left value in red representing the energy value of that pixel. The value in black represents the cumulative sum of energies leading up to and including that pixel.

The energy calculation is trivially parallelized for simple functions. The calculation of the DP array can also be parallelized with some interprocess communication. However, the problem of making multiple seams at the same time is harder for two reasons: the energy needs to be regenerated for each removal for correctness and simply tracing back multiple seams can form overlaps. Avidan 2007 computes all seams by removing each seam iteratively and storing an "index map" to record all the seams generated. The map holds a "nth seam" number for each pixel on the image, and can be used later for size adjustment. [1]

If one ignores both issues however, a greedy approximation for parallel seam carving is possible. To do so, one starts with the minimum-energy pixel at one end, and keep choosing the minimum energy path to the other end. The used pixels are marked so that they are not picked again. [3] Local seams can also be computed for smaller parts of the image in parallel for a good approximation. [4]

Issues

  1. The algorithm may need user-provided information to reduce errors. This can consist of painting the regions which are to be preserved. With human faces it is possible to use face detection.
  2. Sometimes the algorithm, by removing a low energy seam, may end up inadvertently creating a seam of higher energy. The solution to this is to simulate a removal of a seam, and then check the energy delta to see if the energy increases (forward energy). If it does, prefer other seams instead. [5]

Implementations

Interactive SVG demonstrating seam-carving using ImageMagick's liquid-rescale function. In the SVG file, hover over the percentages to compare the original image (top), its width rescaled to the percentage using seam-carving (middle), and rescaled to the same size using interpolation (bottom). Broadway tower seam carving interactive.svg
Interactive SVG demonstrating seam-carving using ImageMagick's liquid-rescale function. In the SVG file, hover over the percentages to compare the original image (top), its width rescaled to the percentage using seam-carving (middle), and rescaled to the same size using interpolation (bottom).
Interactive SVG demonstrating seam-carving using ImageMagick's liquid-rescale function. In the SVG file, hover over the percentages as above. Note that the faces are affected less than their surroundings. Creation of Adam seam carving interactive.svg
Interactive SVG demonstrating seam-carving using ImageMagick's liquid-rescale function. In the SVG file, hover over the percentages as above. Note that the faces are affected less than their surroundings.

Adobe Systems acquired a non-exclusive license to seam carving technology from MERL, [6] and implemented it as a feature in Photoshop CS4, where it is called Content Aware Scaling. [7] As the license is non-exclusive, other popular computer graphics applications (e. g. GIMP, digiKam, and ImageMagick) as well as some stand-alone programs (e. g. iResizer) [8] also have implementations of this technique, some of which are released as free and open source software. [9] [10] [11]

Improvements and extensions

A 2010 review of eight image retargeting methods found that seam carving produced output that was ranked among the worst of the tested algorithms. It was, however, a part of one of the highest-ranking algorithms: the multi-operator extension mentioned above (combined with cropping and scaling). [15]

See also

Related Research Articles

<span class="mw-page-title-main">Rendering (computer graphics)</span> Process of generating an image from a model

Rendering or image synthesis is the process of generating a photorealistic or non-photorealistic image from a 2D or 3D model by means of a computer program. The resulting image is referred to as a rendering. Multiple models can be defined in a scene file containing objects in a strictly defined language or data structure. The scene file contains geometry, viewpoint, textures, lighting, and shading information describing the virtual scene. The data contained in the scene file is then passed to a rendering program to be processed and output to a digital image or raster graphics image file. The term "rendering" is analogous to the concept of an artist's impression of a scene. The term "rendering" is also used to describe the process of calculating effects in a video editing program to produce the final video output.

<span class="mw-page-title-main">Digital compositing</span>

Digital compositing is the process of digitally assembling multiple images to make a final image, typically for print, motion pictures or screen display. It is the digital analogue of optical film compositing. It's part of VFX processing.

<span class="mw-page-title-main">Global illumination</span> Group of rendering algorithms used in 3D computer graphics

Global illumination (GI), or indirect illumination, is a group of algorithms used in 3D computer graphics that are meant to add more realistic lighting to 3D scenes. Such algorithms take into account not only the light that comes directly from a light source, but also subsequent cases in which light rays from the same source are reflected by other surfaces in the scene, whether reflective or not.

<span class="mw-page-title-main">Radiosity (computer graphics)</span> Computer graphics rendering method using diffuse reflection

In 3D computer graphics, radiosity is an application of the finite element method to solving the rendering equation for scenes with surfaces that reflect light diffusely. Unlike rendering methods that use Monte Carlo algorithms, which handle all types of light paths, typical radiosity only account for paths which leave a light source and are reflected diffusely some number of times before hitting the eye. Radiosity is a global illumination algorithm in the sense that the illumination arriving on a surface comes not just directly from the light sources, but also from other surfaces reflecting light. Radiosity is viewpoint independent, which increases the calculations involved, but makes them useful for all viewpoints.

<span class="mw-page-title-main">Ray tracing (graphics)</span> Rendering method

In 3D computer graphics, ray tracing is a technique for modeling light transport for use in a wide variety of rendering algorithms for generating digital images.

<span class="mw-page-title-main">Ray casting</span> Methodological basis for 3D CAD/CAM solid modeling and image rendering

Ray casting is the methodological basis for 3D CAD/CAM solid modeling and image rendering. It is essentially the same as ray tracing for computer graphics where virtual light rays are "cast" or "traced" on their path from the focal point of a camera through each pixel in the camera sensor to determine what is visible along the ray in the 3D scene. The term "Ray Casting" was introduced by Scott Roth while at the General Motors Research Labs from 1978–1980. His paper, "Ray Casting for Modeling Solids", describes modeled solid objects by combining primitive solids, such as blocks and cylinders, using the set operators union (+), intersection (&), and difference (-). The general idea of using these binary operators for solid modeling is largely due to Voelcker and Requicha's geometric modelling group at the University of Rochester. See solid modeling for a broad overview of solid modeling methods. This figure on the right shows a U-Joint modeled from cylinders and blocks in a binary tree using Roth's ray casting system in 1979.

<span class="mw-page-title-main">Volume rendering</span> Representing a 3D-modeled object or dataset as a 2D projection

In scientific visualization and computer graphics, volume rendering is a set of techniques used to display a 2D projection of a 3D discretely sampled data set, typically a 3D scalar field.

A light field, or lightfield, is a vector function that describes the amount of light flowing in every direction through every point in a space. The space of all possible light rays is given by the five-dimensional plenoptic function, and the magnitude of each ray is given by its radiance. Michael Faraday was the first to propose that light should be interpreted as a field, much like the magnetic fields on which he had been working. The term light field was coined by Andrey Gershun in a classic 1936 paper on the radiometric properties of light in three-dimensional space.

General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.

In parallel computing, an embarrassingly parallel workload or problem is one where little or no effort is needed to split the problem into a number of parallel tasks. This is due to minimal or no dependency upon communication between the parallel tasks, or for results between them.

<span class="mw-page-title-main">Pixel-art scaling algorithms</span> Upscaling filters for pixel art graphics

Pixel art scaling algorithms are graphical filters that attempt to enhance the appearance of hand-drawn 2D pixel art graphics. These algorithms are a form of automatic image enhancement. Pixel art scaling algorithms employ methods significantly different than the common methods of image rescaling, which have the goal of preserving the appearance of images.

<span class="mw-page-title-main">Path tracing</span> Computer graphics method

Path tracing is a computer graphics Monte Carlo method of rendering images of three-dimensional scenes such that the global illumination is faithful to reality. Fundamentally, the algorithm is integrating over all the illuminance arriving to a single point on the surface of an object. This illuminance is then reduced by a surface reflectance function (BRDF) to determine how much of it will go towards the viewpoint camera. This integration procedure is repeated for every pixel in the output image. When combined with physically accurate models of surfaces, accurate models of real light sources, and optically correct cameras, path tracing can produce still images that are indistinguishable from photographs.

Texture synthesis is the process of algorithmically constructing a large digital image from a small digital sample image by taking advantage of its structural content. It is an object of research in computer graphics and is used in many fields, amongst others digital image editing, 3D computer graphics and post-production of films.

<span class="mw-page-title-main">Image scaling</span> Changing the resolution of a digital image

In computer graphics and digital imaging, imagescaling refers to the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement.

<span class="mw-page-title-main">Autostereoscopy</span> A method of displaying stereoscopic images without the use of special headgear or glasses

Autostereoscopy is any method of displaying stereoscopic images without the use of special headgear, glasses, something that affects vision, or anything for eyes on the part of the viewer. Because headgear is not required, it is also called "glasses-free 3D" or "glassesless 3D".

<span class="mw-page-title-main">Livewire Segmentation Technique</span>

Livewire, is a segmentation technique which allows a user to select regions of interest to be extracted quickly and accurately, using simple mouse clicks. It is based on the lowest cost path algorithm, by Edsger W. Dijkstra. Firstly convolve the image with a Sobel filter to extract edges. Each pixel of the resulting image is a vertex of the graph and has edges going to the 4 pixels around it, as up, down, left, right. The edge costs are defined based on a cost function. In 1995, Eric N. Mortensen and William A. Barrett made some extension work on livewire segmentation tool, which is known as Intelligent Scissors.

<span class="mw-page-title-main">ImageMagick</span> Free and open-source image manipulation software

ImageMagick, invoked from the command line as magick, is a free and open-source cross-platform software suite for displaying, creating, converting, modifying, and editing raster images. ImageMagick was created by John Cristy in 1987, it can read and write over 200 image file formats. It is widely used in open-source applications.

<span class="mw-page-title-main">Watershed (image processing)</span> Transformation defined on a grayscale image

In the study of image processing, a watershed is a transformation defined on a grayscale image. The name refers metaphorically to a geological watershed, or drainage divide, which separates adjacent drainage basins. The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges.

<span class="mw-page-title-main">Widest path problem</span> Path-finding using high-weight graph edges

In graph algorithms, the widest path problem is the problem of finding a path between two designated vertices in a weighted graph, maximizing the weight of the minimum-weight edge in the path. The widest path problem is also known as the maximum capacity path problem. It is possible to adapt most shortest path algorithms to compute widest paths, by modifying them to use the bottleneck distance instead of path length. However, in many cases even faster algorithms are possible.

<span class="mw-page-title-main">Saliency map</span> Type of image

In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or an otherwise opaque ML model.

References

  1. 1 2 3 4 5 6 7 Avidan, Shai; Shamir, Ariel (July 2007). "Seam carving for content-aware image resizing". ACM SIGGRAPH 2007 papers. p. 10. doi: 10.1145/1275808.1276390 . ISBN   978-1-4503-7836-9.
  2. Vidya Setlur; Saeko Takage; Ramesh Raskar; Michael Gleicher; Bruce Gooch (December 2005). "Automatic image retargeting". Proceedings of the 4th international conference on Mobile and ubiquitous multimedia - MUM '05. pp. 59–68. doi: 10.1145/1149488.1149499 . ISBN   0-473-10658-2.
  3. Bist; Palakkode (2016). "Parallel Seam Carving". www.andrew.cmu.edu.
  4. 1 2 Chen-Kuo Chiang; Shu-Fan Wang; Yi-Ling Chen; Shang-Hong Lai (November 2009). "Fast JND-Based Video Carving With GPU Acceleration for Real-Time Video Retargeting". IEEE Transactions on Circuits and Systems for Video Technology. 19 (11): 1588–1597. doi:10.1109/TCSVT.2009.2031462. S2CID   15124131.
  5. 1 2 Improved Seam Carving for Video Retargeting. Michael Rubinstein, Ariel Shamir, Shai Avidan. SIGGRAPH 2008.
  6. Mitsubishi Electric press release, Business Wire, December 16, 2008.
  7. Adobe Photoshop CS4 new feature list.
  8. iResizer Content aware image resizing software by Teorex
  9. Liquid Rescale, seam carving plug-in for GIMP
  10. Announcement of inclusion in digiKam
  11. Seam carving capability included in ImageMagick
  12. "Improved seam carving with forward energy".
  13. Multi-operator Media Retargeting. Michael Rubinstein, Ariel Shamir, Shai Avidan. SIGGRAPH 2009.
  14. Real-time content-aware image resizing Science in China Series F: Information Sciences, 2009 SCIENCE IN CHINA PRESS. Archived July 7, 2011, at the Wayback Machine
  15. Rubinstein, Michael; Gutierrez, Diego; Sorkine, Olga; Shamir, Ariel (2010). "A Comparative Study of Image Retargeting" (PDF). ACM Transactions on Graphics. 29 (5): 1–10. doi:10.1145/1882261.1866186. See also the RetargetMe benchmark.