Original author(s) | AUTOMATIC1111 |
---|---|
Developer(s) | AUTOMATIC1111 and community |
Initial release | August 22, 2022 [1] |
Repository | github |
Written in | Python |
License | AGPL-3.0 [2] |
AUTOMATIC1111 Stable Diffusion Web UI (SD WebUI, A1111, or Automatic1111 [3] ) is an open source generative artificial intelligence program that allows users to generate images from a text prompt. [4] It uses Stable Diffusion as the base model for its image capabilities together with a large set of extensions and features to customize its output. [5]
SD WebUI was released on GitHub on August 22, 2022, by AUTOMATIC1111, [1] 1 month after the initial release of Stable Diffusion. [6] At the time, Stable Diffusion could only be run via the command line. [5] SD WebUI quickly rose in popularity and has been described as "the most popular tool for running diffusion models locally." [4] [7] A user study of six StableDiffusion users showed that all participants had used SD WebUI at least once. [3] The study showed that users ascribe SD WebUI's popularity to its ease of installation and support for open source tools. [3] SD WebUI is one of the most popular user interfaces for Stable Diffusion, together with ComfyUI. [8] In February 2024, a book was published by ja:Gijutsu Hyoronsha on using Stable Diffusion with SD WebUI in Japanese. [9] [10] As of July 2024, the project had 136,000 stars on GitHub. [11]
SD WebUI uses Gradio for its user interface. [12] [13] [14] Each parameter in the Stable Diffusion program is exposed via a UI interface within SD WebUI. SD WebUI contains additional parameters not included in Stable Diffusion itself, such as support for Low-rank adaptations, ControlNet and custom variational autoencoders. [12] [13] [15] SD WebUI supports prompt weighting, image-to-image based generation, inpainting, outpainting and image scaling. [16] It supports over 20 samplers including DDIM, Euler, Euler a, DPM++ 2M Karras, and UniPC. [16] [17] It is also used for its various optimizations over the base Stable Diffusion. [5]
Stable Diffusion WebUI Forge (Forge) is a notable fork of SD WebUI started by Lvmin Zhang, who is also the creator of ControlNet and Fooocus. [18] [19] The initial goal of Forge was to improve the performance and features of SD WebUI with the intention to upstream changes back to SD WebUI. [18] [19] One of Forge's optimizations allowed users with low VRAM to generate images faster on some versions of Stable Diffusion. [18] It improved generation speed for users with 8GB and 6GB VRAM by 30-45% and 60-75%, respectively. [18] [19] Forge also includes extra features such as support for more samplers than standard SD WebUI. [20] Some of Forge's optimizations were borrowed from ComfyUI, and others were developed by the Forge team. [19] In August 2024, Forge added support for the Flux diffusion model developed by Black Forest Labs, which is not yet supported by SD WebUI. [21]
A graphical user interface, or GUI, is a form of user interface that allows users to interact with electronic devices through graphical icons and visual indicators such as secondary notation. In many applications, GUIs are used instead of text-based UIs, which are based on typed command labels or text navigation. GUIs were introduced in reaction to the perceived steep learning curve of command-line interfaces (CLIs), which require commands to be typed on a computer keyboard.
Glade Interface Designer is a graphical user interface builder for GTK, with additional components for GNOME. In its third version, Glade is programming language–independent, and does not produce code for events, but rather an XML file that is then used with an appropriate binding.
Music and artificial intelligence (AI) is the development of music software programs which use AI to generate music. As with applications in other fields, AI in music also simulates mental tasks. A prominent feature is the capability of an AI algorithm to learn based on past data, such as in computer accompaniment technology, wherein the AI is capable of listening to a human performer and performing accompaniment. Artificial intelligence also drives interactive composition technology, wherein a computer composes music in response to a live performance. There are other AI applications in music that cover not only music composition, production, and performance but also how music is marketed and consumed. Several music player programs have also been developed to use voice recognition and natural language processing technology for music voice control. Current research includes the application of AI in music composition, performance, theory and digital sound processing.
Artificial intelligence art is visual artwork created or enhanced through the use of artificial intelligence (AI) programs.
DALL-E, DALL-E 2, and DALL-E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as prompts.
Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom.
Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.
Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.
A text-to-image model is a machine learning model which takes an input natural language description and produces an image matching that description.
NovelAI is an online cloud-based, SaaS model, and a paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature being implemented later on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.
DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. It was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms. This results in a model which uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files. While these files are only several seconds long, the model can also use latent space between outputs to interpolate different files together. This is accomplished using a functionality of the Stable Diffusion model known as img2img.
Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data, is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects or more abstract categories.
Runway AI, Inc. is an American company headquartered in New York City that specializes in generative artificial intelligence research and technologies. The company is primarily focused on creating products and models for generating videos, images, and various multimedia content. It is most notable for developing the commercial text-to-video and video generative AI models Gen-1, Gen-2 and Gen-3 Alpha.
The Dog & the Boy is a 2023 animated science fiction short film directed by Ryōtarō Makihara. Distributed by Netflix and released on YouTube on 31 January 2023, the film follows the friendship between a robot dog and a boy. It became notable for its use of artificial intelligence art to create its background artwork, which was widely negatively received online.
Fooocus is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities as well as a collection of default settings and prompts to make the image generation process more streamlined.
ComfyUI is an open source, node-based program that allows users to generate images from a series of text prompts. It uses free diffusion models such as Stable Diffusion as the base model for its image capabilities combined with other tools such as ControlNet and LCM Low-rank adaptation with each tool being represented by a node in the program.
Generative AI pornography or simply AI pornography refers to digitally created explicit content produced through generative artificial intelligence (AI) technologies. Unlike traditional pornography, which involves real actors and cameras, this content is synthesized entirely by AI algorithms. These algorithms, including Generative adversarial network (GANs) and text-to-image models, generate lifelike images, videos, or animations from textual descriptions or datasets.
Flux is a text-to-image model developed by Black Forest Labs, based in Freiburg im Breisgau, Germany. Black Forest Labs were founded by former employees of Stability AI. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts.
Civitai is an online platform and marketplace for generative AI content, primarily focused on AI-generated images and models.
{{cite web}}
: CS1 maint: numeric names: authors list (link){{cite web}}
: CS1 maint: numeric names: authors list (link)Stable Diffusion WebUI from AUTO MATIC1111: This might be the most popular web-based application currently that allows users to generate images and text using Stable Diffusion. It provides a GUI interface that makes it easy to experiment with different settings and parameters
Currently, the most popular user interfaces for Stable Diffusion are Stable Diffusion WebUI and ComfyUI.
{{cite web}}
: CS1 maint: numeric names: authors list (link)Stable Diffusion Web UI is a browser interface based on the Gradio library,