DreamBooth

Demonstration of the use of DreamBooth to fine-tune the Stable Diffusion v1.5 diffusion model, using training data obtained from Category:Jimmy Wales on Wikimedia Commons. Depicted here are algorithmically generated images of Jimmy Wales, co-founder of Wikipedia, performing bench press exercises at a fitness gym.

DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models as well, allowing a model to generate more specific and personalized outputs after training on just three to five images of a subject. [1] [2] [3]

Technology

Pretrained text-to-image diffusion models, while often capable of offering a diverse range of image output types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in different situations and contexts. [1] The DreamBooth methodology involves fine-tuning the full UNet component of the diffusion model using a few images (usually three to five) depicting a specific subject. Each image is paired with a text prompt containing a unique identifier and the name of the class the subject belongs to (for example, "a photo of a [Nissan R34 GTR] car", with "car" being the class). A class-specific prior-preservation loss is then applied, encouraging the model to continue generating diverse instances of the class based on what it was already trained on, which counteracts overfitting to the few subject images. [1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained. [1]
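A minimal sketch of this prior-preservation objective in PyTorch is given below. The function name, the prior_weight argument, and the assumption that each training batch stacks subject (instance) examples on top of class-prior examples are illustrative choices for this example, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def prior_preservation_loss(noise_pred: torch.Tensor,
                            noise_target: torch.Tensor,
                            prior_weight: float = 1.0) -> torch.Tensor:
    """Combine the subject (instance) denoising loss with the class-prior loss.

    Assumes the batch was built by concatenating instance examples and
    class-prior examples along the batch dimension, so it can be split in half.
    """
    # Split predictions and targets back into the instance and prior halves.
    pred_instance, pred_prior = torch.chunk(noise_pred, 2, dim=0)
    target_instance, target_prior = torch.chunk(noise_target, 2, dim=0)

    # Standard diffusion denoising loss on the few subject images.
    instance_loss = F.mse_loss(pred_instance.float(), target_instance.float())

    # Loss on class-prior examples, which nudges the model to keep
    # producing diverse members of the original class.
    prior_loss = F.mse_loss(pred_prior.float(), target_prior.float())

    return instance_loss + prior_weight * prior_loss
```

In the paper's setup, the class-prior examples are images the pretrained model itself generated from the bare class prompt, so the prior can be preserved without collecting any additional real data.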

Usage

DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion: its inability to adequately generate images of specific individual people. [4] Such a use case is quite VRAM-intensive, however, and thus cost-prohibitive for hobbyist users. [4] The Stable Diffusion adaptation of DreamBooth in particular was released as a free and open-source project based on the technology outlined in the original paper published by Ruiz et al. in 2022. [5] Concerns have been raised about the ability of bad actors to utilise DreamBooth to generate misleading images for malicious purposes, and about its open-source nature, which allows anyone to utilise or even improve on the technology. [6] In addition, artists have expressed apprehension about the ethics of using DreamBooth to train model checkpoints specifically aimed at imitating the art styles of particular human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent. [7] [8]
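To illustrate how a DreamBooth-fine-tuned Stable Diffusion checkpoint might be used once training is complete, the sketch below loads a hypothetical fine-tuned model with the Hugging Face diffusers library and prompts it with a rare-token identifier. The model path and the "sks" token are placeholders of the kind commonly seen in community examples, not values prescribed by DreamBooth itself.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical path to a Stable Diffusion checkpoint fine-tuned with DreamBooth.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-finetuned-model",
    torch_dtype=torch.float16,
).to("cuda")

# "sks" stands in for the unique identifier bound to the subject during fine-tuning.
image = pipe("a photo of sks person lifting weights in a gym").images[0]
image.save("dreambooth_sample.png")
```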

Related Research Articles

Artificial intelligence and music (AIM) is a common subject in the International Computer Music Conference, the Computing Society Conference and the International Joint Conference on Artificial Intelligence. The first International Computer Music Conference (ICMC) was held in 1974 at Michigan State University. Current research includes the application of AI in music composition, performance, theory and digital sound processing.

<span class="mw-page-title-main">Artificial intelligence art</span> Machine application of knowledge of human aesthetic expressions

Artificial intelligence art is any visual artwork created through the use of artificial intelligence (AI) programs.

<span class="mw-page-title-main">Yūki Yoda</span> Japanese singer, model, and actress

Yūki Yoda is a Japanese idol singer, model, and actress. She is a member of Nogizaka46 and an exclusive model for the magazines MAQUIA and bis. She has played supporting roles in the television dramas Mob Psycho 100 and Zambi.

<span class="mw-page-title-main">Asuka Saitō</span> Japanese Tarento, actress and model

Asuka Saitō is a Burmese-Japanese tarento, actress, fashion model and former idol. She is a former first generation member of the Japanese girl group Nogizaka46 and a regular model for the fashion magazine sweet. Her lead roles as an actress have included Mana Hayase in the Japanese remake of You Are the Apple of My Eye, and Midori Asakusa in both the film and TV adaptations of Keep Your Hands Off Eizouken!.

The year 2020 in Japanese music.

The year 2021 in Japanese music.

<span class="mw-page-title-main">Anna Yamada</span> Japanese actress

Anna Yamada is a Japanese actress represented by Amuse.

<i>JO1 the Movie: Unfinished - Go to the Top</i> 2022 Japanese film

JO1 the Movie: Unfinished - Go to the Top is a 2022 Japanese documentary film directed by Tetsurō Inagaki, featuring Japanese boy band JO1. It was released on March 11, 2022, and documented the first two years of the group's career. The film list eighth on the national box office ranking in its opening weekend.

<span class="mw-page-title-main">Stable Diffusion</span> Image-generating machine learning model

Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. It was developed by researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway with a compute donation by Stability AI and training data from non-profit organizations.

<span class="mw-page-title-main">Text-to-image model</span> Machine learning model

A text-to-image model is a machine learning model which takes an input natural language description and produces an image matching that description. Such models began to be developed in the mid-2010s as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models, such as OpenAI's DALL-E 2, Google Brain's Imagen, Stability AI's Stable Diffusion, and Midjourney, began to approach the quality of real photographs and human-drawn art.

<span class="mw-page-title-main">NovelAI</span> Online service for AI media creation

NovelAI is an online cloud-based, SaaS model, paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature later implemented on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.

<span class="mw-page-title-main">Riffusion</span> Music-generating machine learning model

Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. It was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms. This results in a model which uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files. While these files are only several seconds long, the model can also use latent space between outputs to interpolate different files together. This is accomplished using a functionality of the Stable Diffusion model known as img2img.

Text-to-image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data is adapted so that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects or more abstract categories.

References

  1. Ruiz, Nataniel; Li, Yuanzhen; Jampani, Varun; Pritch, Yael; Rubinstein, Michael; Aberman, Kfir (August 25, 2022). "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation". arXiv:2208.12242 [cs.CV].
  2. Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけでコスプレ 米Googleが開発". ITmedia Inc. (in Japanese). Archived from the original on August 31, 2022. 米Google Researchと米ボストン大学の研究チームが開発した...数枚の被写体画像とテキスト入力を使って、与えられた被写体が溶け込んだ新たな合成画像を作成する被写体駆動型Text-to-Imageモデルだ。[... developed by a research team from Google Research and Boston University, is a subject-driven text-to-image model that takes several images of a subject and text prompts to create newly generated images featuring the subject.]
  3. Brendan Murphy (October 13, 2022). "AI image generation is advancing at astronomical speeds. Can we still tell if a picture is fake?". The Conversation. Archived from the original on October 30, 2022. Recently, Google has released Dream Booth, an alternative, more sophisticated method for injecting specific people, objects or even art styles into text-to-image AI systems.
  4. Ryo Shimizu (October 26, 2022). "まさに「世界変革」──この2カ月で画像生成AIに何が起きたのか?". Yahoo! News Japan (in Japanese). Archived from the original on October 26, 2022. Stable Diffusionは、一般に個人の写真や特定の人物を出すのが苦手だが、自分のペットや友人の写真をわずかな枚数から学習させる「Dreambooth」という技術が開発され、これも話題を呼んだ。ただし、Dreamboothでは、巨大なGPUメモリが必要になり、個人ユーザーが趣味の範囲で買えるGPUでは事実上実行不可能なのがネックとされていた。[Stable Diffusion is generally inadequate at generating personal photographs or specific individuals, however the development of "Dreambooth" allows training from a small number of photos featuring your pets or friends, causing quite a stir. However, the drawback is that Dreambooth requires a large amount of GPU memory, making it practically unfeasible to run on GPUs that individual users can afford within their hobbyist price range.]
  5. Benj Edwards (December 9, 2022). "AI image generation tech can now create life-wrecking deepfakes with ease". Ars Technica. Archived from the original on December 12, 2022. But not long after its announcement, someone adapted the Dreambooth technique to work with Stable Diffusion and released the code freely as an open source project.
  6. Kevin Jiang (December 1, 2022). "These AI images look just like me. What does that mean for the future of deepfakes?". Toronto Star. Archived from the original on December 8, 2022. For example, DreamBooth could be used to copy signatures or official signage to fake documents, create misleading photos or videos of politicians, manufacture revenge porn of individuals and more... A specific issue with DreamBooth and Stable Diffusion is that they're open source, Gupta continued. Unlike centralized AI-generation models that can impose regulations and barriers to image creation, the decentralized models like DreamBooth mean anyone can access and improve on the technology.
  7. Isabel Berwick; Sophia Smith (December 14, 2022). "Will AI replace human workers?". Financial Times. Illustrator Hollie Mengert, whose artwork was used to train an AI model without her consent, spoke publicly against the practice of training AI models on artists' work without permission.
  8. "Генеративные нейросети и этика: появилась модель, копирующая стиль конкретного художника". DTF (in Russian). November 9, 2022. Archived from the original on November 9, 2022. Так, совсем недавно известная художница и иллюстратор Холли Менгерт стала своеобразным датасетом для новой нейросети (не давая на то согласия)... «В первую очередь мне показалось бестактным то, что моё имя фигурировало в этом инструменте. Я ничего о нём не знала и меня об этом не спрашивали. А если бы меня спросили, можно ли это сделать, я бы не согласилась».[So, quite recently, the artist and illustrator Hollie Mengert became the data source for a new neural network (without giving her consent)... "My initial reaction was that it felt invasive that my name was on this tool, I didn’t know anything about it and wasn’t asked about it. If I had been asked if they could do this, I wouldn’t have said yes."]