Automatic1111

AUTOMATIC1111 Stable Diffusion Web UI
Original author(s): AUTOMATIC1111
Developer(s): AUTOMATIC1111 and community
Initial release: August 22, 2022 [1]
Repository: github.com/AUTOMATIC1111/stable-diffusion-webui
Written in: Python
License: AGPL-3.0 [2]

AUTOMATIC1111 Stable Diffusion Web UI (SD WebUI, A1111, or Automatic1111 [3]) is an open source generative artificial intelligence program that allows users to generate images from a text prompt. [4] It uses Stable Diffusion as the base model for its image capabilities together with a large set of extensions and features to customize its output. [5]

History

SD WebUI was released on GitHub by AUTOMATIC1111 on August 22, 2022, [1] one month after the initial release of Stable Diffusion. [6] At the time, Stable Diffusion could only be run via the command line. [5] SD WebUI quickly rose in popularity and has been described as "the most popular tool for running diffusion models locally." [4] [7] A user study of six Stable Diffusion users found that all participants had used SD WebUI at least once, [3] and that they attributed its popularity to its ease of installation and its support for open source tools. [3] Together with ComfyUI, SD WebUI is one of the most popular user interfaces for Stable Diffusion. [8] In February 2024, the Japanese publisher Gijutsu Hyoronsha (技術評論社) published a Japanese-language book on using Stable Diffusion with SD WebUI. [9] [10] As of July 2024, the project had 136,000 stars on GitHub. [11]

Features

SD WebUI uses Gradio for its user interface. [12] [13] [14] Each parameter of the Stable Diffusion pipeline is exposed as a control in the SD WebUI interface. SD WebUI also exposes additional features not included in Stable Diffusion itself, such as support for low-rank adaptations (LoRA), ControlNet, and custom variational autoencoders (VAEs). [12] [13] [15] SD WebUI supports prompt weighting, image-to-image generation, inpainting, outpainting, and image upscaling. [16] It supports more than 20 samplers, including DDIM, Euler, Euler a, DPM++ 2M Karras, and UniPC. [16] [17] It is also valued for its various performance optimizations over the base Stable Diffusion release. [5]
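The way a Gradio interface maps generation parameters to browser controls can be illustrated with a short, self-contained sketch. The example below is not taken from the SD WebUI codebase: it assumes the Hugging Face diffusers library for the model side and uses a placeholder checkpoint identifier, and it only shows, under those assumptions, how parameters such as the prompt, sampling steps, CFG scale, and seed can be exposed as interactive widgets.

```python
# Minimal illustrative sketch (not SD WebUI code): expose Stable Diffusion
# generation parameters as Gradio controls via the diffusers library.
import torch
import gradio as gr
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint identifier; substitute any Stable Diffusion
# checkpoint available in your environment.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

def generate(prompt, negative_prompt, steps, cfg_scale, seed):
    """Run one text-to-image generation with the chosen parameters."""
    generator = torch.Generator(device="cuda").manual_seed(int(seed))
    result = pipe(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
        guidance_scale=cfg_scale,
        generator=generator,
    )
    return result.images[0]

# Each generation parameter becomes a widget in the web interface.
demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative prompt"),
        gr.Slider(1, 150, value=20, step=1, label="Sampling steps"),
        gr.Slider(1.0, 30.0, value=7.5, label="CFG scale"),
        gr.Number(value=42, label="Seed"),
    ],
    outputs=gr.Image(label="Generated image"),
)

demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```

SD WebUI builds a much richer interface on the same principle, adding tabs for image-to-image, inpainting, outpainting, and upscaling, plus controls contributed by extensions such as LoRA and ControlNet.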

Stable Diffusion WebUI Forge

Stable Diffusion WebUI Forge (Forge) is a notable fork of SD WebUI started by Lvmin Zhang, who is also the creator of ControlNet and Fooocus. [18] [19] The initial goal of Forge was to improve the performance and features of SD WebUI, with the intention of upstreaming the changes back into SD WebUI. [18] [19] One of Forge's optimizations allows users with low VRAM to generate images faster on some versions of Stable Diffusion, improving generation speed by 30–45% for users with 8 GB of VRAM and by 60–75% for users with 6 GB. [18] [19] Forge also includes extra features, such as support for more samplers than standard SD WebUI. [20] Some of Forge's optimizations were borrowed from ComfyUI, while others were developed by the Forge team. [19] In August 2024, Forge added support for the Flux diffusion model developed by Black Forest Labs, which SD WebUI does not yet support. [21]

References

1. AUTOMATIC1111 (August 22, 2022). "Initial commit". GitHub.
2. AUTOMATIC1111 (January 15, 2023). "add license file". GitHub. Retrieved 11 July 2024.
3. Brade, Stephen; Wang, Bryan; Sousa, Mauricio; Oore, Sageev; Grossman, Tovi (29 October 2023). "Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models". Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. Association for Computing Machinery. pp. 1–14. arXiv:2304.09337. doi:10.1145/3586183.3606725. ISBN 979-8-4007-0132-0.
4. Mann, Tobias (29 June 2024). "A friendly guide to local AI image gen with Stable Diffusion and Automatic1111". The Register.
5. Lewis, Nick (16 September 2022). "How to Run Stable Diffusion Locally With a GUI on Windows". How-To Geek. Retrieved 11 July 2024.
6. "Announcing SDXL 1.0". Stability AI. July 26, 2023.
7. Zhu, Andrew (2024). Using Stable Diffusion with Python: Leverage Python to control and automate high-quality AI image generation using Stable Diffusion. Packt Publishing. ISBN 978-1835084311. "Stable Diffusion WebUI from AUTOMATIC1111: This might be the most popular web-based application currently that allows users to generate images and text using Stable Diffusion. It provides a GUI interface that makes it easy to experiment with different settings and parameters."
8. Hu, Qihan; Xu, Zhenghui; Du, Peng; Zeng, Hao; Ma, Tongqing; Zhao, Youbing; Xie, Hao; Zhang, Peng; Liu, Shuting; Zang, Tongnian; Wang, Xuemei (2024). "CanFuUI: A Canvas-Centric Web User Interface for Iterative Image Generation with Diffusion Models and ControlNet". AI-generated Content. 1946. Springer Nature Singapore: 128–138. doi:10.1007/978-981-99-7587-7_11. "Currently, the most popular user interfaces for Stable Diffusion are Stable Diffusion WebUI and ComfyUI."
9. 大崎, 顕; 水口, 瑛介 (23 March 2024). はじめてでもここまでできる Stable Diffusion画像生成[本格]活用ガイド (in Japanese). 技術評論社. ISBN 978-4-297-14083-0.
10. あわしろいくや (12 June 2024). "第817回 参考書を片手にUbuntuでもStable Diffusion WebUIを動作させ、画像を生成する". gihyo.jp (in Japanese). 技術評論社.
11. AUTOMATIC1111 (August 2022). "Stable Diffusion Web UI". GitHub.
12. Wang, Chenghao; Chung, Jeanhun (30 June 2023). "Research on AI Painting Generation Technology Based on the [Stable Diffusion]". International Journal of Advanced Smart Convergence. 12 (2): 90–95. doi:10.7236/IJASC.2023.12.2.90. "Stable Diffusion Web UI is a browser interface based on the Gradio library."
13. Kim, Seonuk; Ko, Taeyoung; Kwon, Yousang; Lee, Kyungho (9 October 2023). "Designing interfaces for text-to-image prompt engineering using stable diffusion models: a human-AI interaction approach". IASDR Conference Series. doi:10.21606/iasdr.2023.448. ISBN 978-1-912294-59-6.
14. Hook, Steve (10 January 2024). "Stable Diffusion WebUI - Run SDXL locally with the AUTOMATIC1111 GUI". PC Guide.
15. Pocock, Kevin (16 August 2023). "Stable Diffusion: How to Use VAE". PC Guide. Retrieved 11 July 2024.
16. Phoenix, James; Taylor, Mike (2024). "AUTOMATIC1111 Web User Interface". Prompt engineering for generative AI: future-proof inputs for reliable AI outputs at scale (First ed.). Beijing; Boston: O'Reilly. ISBN 978-1098153434.
17. Zhang, Jing; Jiang, Yan (June 2023). "Style Transfer Technology of Batik Pattern Based on Deep Learning". Journal of Fiber Bioengineering and Informatics. 16 (1): 57–67. doi:10.3993/jfbim02171.
18. 西川和久 (14 February 2024). "【西川和久の不定期コラム】 VRAMが少ないGPUで画像生成AIを諦めていた人に。「Stable Diffusion WebUI Forge」登場!". PC Watch (in Japanese).
19. 新清士 (February 26, 2024). "画像生成AI、安いPCでも高速に 衝撃の「Stable Diffusion WebUI Forge」 (1/4)". ASCII.jp (in Japanese).
20. Horsey, Julian (14 February 2024). "Stable Diffusion WebUI Forge up to 75% faster than Automatic 1111 and ComfyUI". Geeky Gadgets.
21. 田口和裕 (August 18, 2024). "話題の画像生成AI「FLUX.1」をStable Diffusion用の「WebUI Forge」で動かす(高速化も試してみました) (1/6)". ASCII.jp (in Japanese).