DeepDream

The Mona Lisa with DeepDream effect using VGG16 network trained on ImageNet

DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like appearance reminiscent of a psychedelic experience in the deliberately overprocessed images. [1] [2] [3]

Google's program popularized the term (deep) "dreaming" to refer to the generation of images that produce desired activations in a trained deep network, and the term now refers to a collection of related approaches.

History

The DeepDream software originated in a deep convolutional network codenamed "Inception", after the film of the same name, [1] [2] [3] which was developed for the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014. [3] DeepDream itself was released in July 2015.

The dreaming idea and name became popular on the internet in 2015 thanks to Google's DeepDream program. The idea dates from early in the history of neural networks, [4] and similar methods have been used to synthesize visual textures. [5] Related visualization ideas were developed (prior to Google's work) by several research groups. [6] [7]

After Google published their techniques and made their code open-source, [8] a number of tools in the form of web services, mobile applications, and desktop software appeared on the market to enable users to transform their own photos. [9]

Process

The original image (top) after applying ten (middle) and fifty (bottom) iterations of DeepDream, the network having been trained to perceive dogs and then run backwards

The software is designed to detect faces and other patterns in images, with the aim of automatically classifying images. [10] However, once trained, the network can also be run in reverse, being asked to adjust the original image slightly so that a given output neuron (e.g. the one for faces or certain animals) yields a higher confidence score. This can be used for visualizations to understand the emergent structure of the neural network better, and is the basis for the DeepDream concept. This reversal procedure is never perfectly clear and unambiguous because it uses a one-to-many mapping process. [11] However, after enough iterations, even imagery initially devoid of the sought features will be adjusted enough that a form of pareidolia results, by which psychedelic and surreal images are generated algorithmically. The optimization resembles the backpropagation used in training; however, the network weights are held fixed and the input is adjusted instead.

For example, an existing image can be altered so that it is "more cat-like", and the resulting enhanced image can be again input to the procedure. [2] This usage resembles the activity of looking for animals or other patterns in clouds.
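The idea of holding the weights fixed and adjusting the input can be illustrated with a minimal NumPy sketch. It is not Google's implementation: the one-layer `tanh` network, the random weights `W`, and the names `neuron_score`, `score_gradient`, and `dream` are all assumptions made for illustration, and a flat 16-element vector stands in for an image.

```python
import numpy as np

def neuron_score(x, W, k):
    """Activation of output unit k in a toy one-layer network tanh(W @ x)."""
    return np.tanh(W @ x)[k]

def score_gradient(x, W, k):
    """Gradient of neuron_score with respect to the input x; W is a constant."""
    pre = W[k] @ x
    return (1.0 - np.tanh(pre) ** 2) * W[k]

def dream(x, W, k, steps=10, step_size=0.01):
    """Gradient ascent on the input. The weights W are read but never updated."""
    x = x.copy()
    for _ in range(steps):
        x += step_size * score_gradient(x, W, k)
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))          # hypothetical fixed "trained" weights
W_before = W.copy()
x0 = rng.normal(scale=0.1, size=16)   # stand-in for a flattened input image

x1 = dream(x0, W, k=2)                # first pass
x2 = dream(x1, W, k=2)                # feed the enhanced input back in
print(neuron_score(x0, W, 2), neuron_score(x1, W, 2), neuron_score(x2, W, 2))
```

Each pass raises the chosen unit's activation while leaving `W` untouched, which mirrors the feedback loop described above in a very reduced form.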

Applying gradient descent independently to each pixel of the input produces images in which adjacent pixels have little relation, so the image has too much high-frequency information. The generated images can be greatly improved by including a prior or regularizer that prefers inputs that have natural image statistics (without a preference for any particular image), or that are simply smooth. [7] [12] [13] For example, Mahendran et al. [12] used the total variation regularizer, which prefers images that are piecewise constant. Various regularizers are discussed further in Yosinski et al. [13] An in-depth, visual exploration of feature visualization and regularization techniques was published more recently. [14]
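As a rough illustration of why a total variation penalty favors piecewise-constant images, the sketch below computes a simple anisotropic variant (the sum of absolute differences between neighboring pixels). This is an assumed simplification, not the exact formulation used by Mahendran et al., and `regularized_objective` with its `weight` parameter is a hypothetical helper.

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation: sum of absolute neighbour differences."""
    vertical = np.abs(np.diff(img, axis=0)).sum()
    horizontal = np.abs(np.diff(img, axis=1)).sum()
    return float(vertical + horizontal)

def regularized_objective(activation, img, weight=0.1):
    """Hypothetical objective: reward activation, penalize high-frequency detail."""
    return activation - weight * total_variation(img)

# A piecewise-constant image: left half 0, right half 1.
flat = np.zeros((8, 8))
flat[:, 4:] = 1.0

# The same image with independent per-pixel noise added.
rng = np.random.default_rng(0)
noisy = flat + rng.normal(scale=0.5, size=flat.shape)

tv_flat = total_variation(flat)    # one unit jump in each of the 8 rows
tv_noisy = total_variation(noisy)
print(tv_flat, tv_noisy)
```

For the same activation value, the noisy image scores lower under `regularized_objective`, so an optimizer using such a penalty is pushed toward smoother inputs.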

The cited resemblance of the imagery to LSD- and psilocybin-induced hallucinations is suggestive of a functional resemblance between artificial neural networks and particular layers of the visual cortex. [15]

Neural networks such as DeepDream have biological analogies that provide insight into brain processing and the formation of consciousness. Hallucinogens such as DMT alter the function of the serotonergic system, which is present within the layers of the visual cortex. Neural networks are trained on input vectors and are altered by internal variations during the training process. The input and internal modifications represent the processing of exogenous and endogenous signals, respectively, in the visual cortex. As internal variations are modified in deep neural networks, the output image reflects these changes. This specific manipulation demonstrates how inner brain mechanisms are analogous to internal layers of neural networks. Internal noise level modifications represent how hallucinogens omit external sensory information, leading internal preconceived conceptions to strongly influence visual perception. [16]

Usage

A heavily DeepDream-processed photograph of three men in a pool

The dreaming idea can be applied to hidden (internal) neurons other than those in the output, which allows exploration of the roles and representations of various parts of the network. [13] It is also possible to optimize the input to satisfy either a single neuron (this usage is sometimes called activation maximization) [17] or an entire layer of neurons.
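The difference between targeting a single neuron and targeting an entire layer comes down to the choice of objective. The NumPy sketch below contrasts the two on a toy ReLU hidden layer; the random weights `W`, the unit index `j`, and the use of half the squared L2 norm as the layer objective are illustrative assumptions rather than a description of any particular implementation.

```python
import numpy as np

def hidden(x, W):
    """ReLU activations of a toy hidden layer."""
    return np.maximum(W @ x, 0.0)

def single_unit_objective(x, W, j):
    """Activation of one hidden unit."""
    return hidden(x, W)[j]

def single_unit_gradient(x, W, j):
    # Derivative of max(W[j] @ x, 0): W[j] when the unit is active, else zero.
    return W[j] * float(W[j] @ x > 0.0)

def layer_objective(x, W):
    """Half the squared L2 norm of all activations in the layer."""
    return 0.5 * np.sum(hidden(x, W) ** 2)

def layer_gradient(x, W):
    # Gradient of the layer objective with respect to the input x.
    return W.T @ hidden(x, W)

def ascend(x, grad_fn, steps=20, step_size=0.01):
    """Plain gradient ascent on the input."""
    x = x.copy()
    for _ in range(steps):
        x += step_size * grad_fn(x)
    return x

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))   # hypothetical fixed weights
j = 3
x0 = 0.1 * W[j]                # start where unit j is active

x_unit = ascend(x0, lambda x: single_unit_gradient(x, W, j))
x_layer = ascend(x0, lambda x: layer_gradient(x, W))
print(single_unit_objective(x0, W, j), single_unit_objective(x_unit, W, j))
print(layer_objective(x0, W), layer_objective(x_layer, W))
```

The single-unit objective pushes the input toward whatever one neuron responds to, whereas the layer objective amplifies every feature the layer already detects in the input.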

While dreaming is most often used for visualizing networks or producing computer art, it has been proposed that adding "dreamed" inputs to the training set can improve training times for abstractions in computer science. [18]

The DeepDream model has also been demonstrated to have application in the field of art history. [19]

DeepDream was used for Foster the People's music video for the song "Doing It for the Money". [20]

In 2017, a research group at the University of Sussex created a Hallucination Machine, applying the DeepDream algorithm to a pre-recorded panoramic video, allowing users to explore virtual reality environments that mimic the experience of psychoactive substances or psychopathological conditions. [21] They were able to demonstrate that the subjective experiences induced by the Hallucination Machine differed significantly from control (non-‘hallucinogenic’) videos, while bearing phenomenological similarities to the psychedelic state (following administration of psilocybin).

In 2021, a study published in the journal Entropy demonstrated the similarity between DeepDream and actual psychedelic experience with neuroscientific evidence. [22] The authors recorded the electroencephalography (EEG) signals of human participants during passive viewing of a movie clip and its DeepDream-generated counterpart. They found that the DeepDream video triggered a higher entropy in the EEG signal and a higher level of functional connectivity between brain areas, [22] both well-known biomarkers of actual psychedelic experience. [23]

In 2022, a research group coordinated by the University of Trento "measure[d] participants’ cognitive flexibility and creativity after the exposure to virtual reality panoramic videos and their hallucinatory-like counterparts generated by the DeepDream algorithm ... following the simulated psychedelic exposure, individuals exhibited ... an attenuated contribution of the automatic process and chaotic dynamics underlying their decision processes, presumably due to a reorganization in the cognitive dynamics that facilitates the exploration of uncommon decision strategies and inhibits automated choices." [24]

References

  1. Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). "DeepDream - a code example for visualizing Neural Networks". Google Research. Archived from the original on 2015-07-08.
  2. Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). "Inceptionism: Going Deeper into Neural Networks". Google Research. Archived from the original on 2015-07-03.
  3. Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott E.; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). "Going deeper with convolutions". IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015. IEEE Computer Society. pp. 1–9. arXiv:1409.4842. doi:10.1109/CVPR.2015.7298594.
  4. Lewis, J.P. (1988). "Creation by refinement: a creativity paradigm for gradient descent learning networks". IEEE International Conference on Neural Networks. pp. 229–233, vol. 2. doi:10.1109/ICNN.1988.23933. ISBN 0-7803-0999-5.
  5. Portilla, J; Simoncelli, Eero (2000). "A parametric texture model based on joint statistics of complex wavelet coefficients". International Journal of Computer Vision. 40: 49–70. doi:10.1023/A:1026553619983. S2CID 2475577.
  6. Erhan, Dumitru (2009). Visualizing Higher-Layer Features of a Deep Network. International Conference on Machine Learning Workshop on Learning Feature Hierarchies. S2CID 15127402.
  7. Simonyan, Karen; Vedaldi, Andrea; Zisserman, Andrew (2014). Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. International Conference on Learning Representations Workshop. arXiv:1312.6034.
  8. deepdream on GitHub
  9. Daniel Culpan (2015-07-03). "These Google "Deep Dream" Images Are Weirdly Mesmerising". Wired. Retrieved 2015-07-25.
  10. Rich McCormick (7 July 2015). "Fear and Loathing in Las Vegas is terrifying through the eyes of a computer". The Verge. Retrieved 2015-07-25.
  11. Hayes, Brian (2015). "Computer Vision and Computer Hallucinations". American Scientist. 103 (6): 380. doi:10.1511/2015.117.380. ISSN 0003-0996.
  12. Mahendran, Aravindh; Vedaldi, Andrea (2015). "Understanding Deep Image Representations by Inverting Them". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5188–5196. arXiv:1412.0035. doi:10.1109/CVPR.2015.7299155. ISBN 978-1-4673-6964-0.
  13. Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs, Thomas (2015). Understanding Neural Networks Through Deep Visualization. Deep Learning Workshop, International Conference on Machine Learning (ICML). arXiv:1506.06579.
  14. Olah, Chris; Mordvintsev, Alexander; Schubert, Ludwig (2017-11-07). "Feature Visualization". Distill. 2 (11). doi:10.23915/distill.00007. ISSN 2476-0757.
  15. LaFrance, Adrienne (2015-09-03). "When Robots Hallucinate". The Atlantic. Retrieved 24 September 2015.
  16. Timmermann, Christopher (2020-12-12). "Neural Network Models for DMT-induced Visual Hallucinations". Neuroscience of Consciousness. NIH. 2020 (1): niaa024. doi:10.1093/nc/niaa024. PMC 7734438. PMID 33343929.
  17. Nguyen, Anh; Dosovitskiy, Alexey; Yosinski, Jason; Brox, Thomas (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. arXiv:1605.09304. Bibcode:2016arXiv160509304N.
  18. Arora, Sanjeev; Liang, Yingyu; Ma, Tengyu (2016). Why are deep nets reversible: A simple theory, with implications for training. arXiv:1511.05653. Bibcode:2015arXiv151105653A.
  19. Spratt, Emily L. (2017). "Dream Formulations and Deep Neural Networks: Humanistic Themes in the Iconology of the Machine-Learned Image" (PDF). Kunsttexte. Humboldt-Universität zu Berlin. 4. arXiv:1802.01274. Bibcode:2018arXiv180201274S.
  20. fosterthepeopleVEVO (2017-08-11), Foster The People - Doing It for the Money, retrieved 2017-08-15
  21. Suzuki, Keisuke (22 November 2017). "A Deep-Dream Virtual Reality Platform for Studying Altered Perceptual Phenomenology". Sci Rep. 7 (1): 15982. Bibcode:2017NatSR...715982S. doi:10.1038/s41598-017-16316-2. PMC 5700081. PMID 29167538.
  22. Greco, Antonino; Gallitto, Giuseppe; D’Alessandro, Marco; Rastelli, Clara (July 2021). "Increased Entropic Brain Dynamics during DeepDream-Induced Altered Perceptual Phenomenology". Entropy. 23 (7): 839. Bibcode:2021Entrp..23..839G. doi:10.3390/e23070839. ISSN 1099-4300. PMC 8306862. PMID 34208923.
  23. Carhart-Harris, Robin; Leech, Robert; Hellyer, Peter; Shanahan, Murray; Feilding, Amanda; Tagliazucchi, Enzo; Chialvo, Dante; Nutt, David (2014). "The entropic brain: a theory of conscious states informed by neuroimaging research with psychedelic drugs". Frontiers in Human Neuroscience. 8: 20. doi:10.3389/fnhum.2014.00020. ISSN 1662-5161. PMC 3909994. PMID 24550805.
  24. Rastelli, Clara; Greco, Antonino; Kennett, Yoed; Finocchiaro, Chiara; De Pisapia, Nicola (7 March 2022). "Simulated visual hallucinations in virtual reality enhance cognitive flexibility". Sci Rep. 12 (1): 4027. Bibcode:2022NatSR..12.4027R. doi:10.1038/s41598-022-08047-w. PMC 8901713. PMID 35256740.

External videos

Deep Dream (Google) - Computerphile by Michael Pound