Original author(s) | comfyanonymous |
---|---|
Initial release | January 16, 2023 [1] |
Repository | github |
Written in | Python |
License | GPLv3 [2] |
Website | www |
ComfyUI is an open source, node-based program that allows users to generate images from a series of text prompts. It uses free diffusion models such as Stable Diffusion as the base model for its image capabilities combined with other tools such as ControlNet and LCM Low-rank adaptation with each tool being represented by a node in the program.
ComfyUI was released on GitHub in January 2023. According to comfyanonymous, the creator, a major goal of the project was to improve on existing software designs in terms of the user interface. [3] The creator had been involved with Stability AI but by 3 June 2024 that involvement had ended and an organization called Comfy Org had been created along with the core developers. [4] In July 2024, Nvidia announced support for ComfyUI within its RTX Remix modding software. [5] In August 2024, support was added for the Flux diffusion model developed by Black Forest Labs and Comfy Org joined the Open Model Initiative created by the Linux Foundation. [6] [7] As of November 2024, the project has 58.6k stars on GitHub. [8]
ComfyUI's main feature is that it is node based. [9] [10] Each node has a function such as "load a model" or "write a prompt". [11] The nodes are connected to form a control-flow graph called a workflow. [12] When a prompt is queued, a highlighted frame appears around the currently executing node, starting from "load checkpoint" and ending with the final image and its save location. [11] Workflows commonly consist of tens of nodes, forming a complex directed acyclic graph. [12] Node types include loading a model, specifying prompts, samplers, schedulers, VAE decoders, face restoration and upscaling models, LoRAs, embeddings, and ControlNets. [13] [14] Several samplers are supported, such as Euler, Euler_a, dpmpp_2m_sde and dpmpp_3m_sde. [14] Workflows can be saved to a file, allowing users to re-use node workflows and share them with other users. [13] [15] [16] The file format for the workflows is in JSON and can be embedded in the generated images. [17] Users have also created custom extensions to the base system which are exposed as new nodes, [13] [18] such as the extension for AnimateDiff, which aims to create videos. [19] [20] ComfyUI has been described as more complex compared to other diffusion UIs such as Automatic1111. [21] [22] A default node group is also included with the program. [11] As of December 2024, 1,674 nodes were supported. [23] ComfyUI Supports multiple text-to-image models including, Stable Diffusion, Flux and Tencent's Hunyuan-DiT, as well as custom models from Civitai like Pony. [23]
In June 2024, a hacker group called "Nullbulge" compromised an extension of ComfyUI to add malicious code to it. [24] The compromised extension, called ComfyUI_LLMVISION, was used for integrating the interface with AI language models GPT-4 and Claude 3, and was hosted on GitHub. Nullbulge hosted a list of hundreds of ComfyUI users' login details across multiple services on its website, while users of the extension reported receiving numerous login notifications. vpnMentor conducted security research on the extension and claimed it could "steal crypto wallets, screenshot the user’s screen, expose device information and IP addresses, and steal files that contain certain keywords or extensions".
Nullbulge's website claims they targeted users who committed "one of our sins", which included AI-art generation, art theft, promoting cryptocurrency, and any other kind of theft from artists such as from Patreon. They claimed that they were "a collective of individuals who believe in the importance of protecting artists' rights and ensuring fair compensation for their work" and that they believed that "AI-generated artwork is detrimental to the creative industry and should be discouraged". [24]
Glade Interface Designer is a graphical user interface builder for GTK, with additional components for GNOME. In its third version, Glade is programming language–independent, and does not produce code for events, but rather an XML file that is then used with an appropriate binding.
OpenSceneGraph is an open-source 3D graphics application programming interface, used by application developers in fields such as visual simulation, computer games, virtual reality, scientific visualization and modeling.
Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.
Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.
Music and artificial intelligence (AI) is the development of music software programs which use AI to generate music. As with applications in other fields, AI in music also simulates mental tasks. A prominent feature is the capability of an AI algorithm to learn based on past data, such as in computer accompaniment technology, wherein the AI is capable of listening to a human performer and performing accompaniment. Artificial intelligence also drives interactive composition technology, wherein a computer composes music in response to a live performance. There are other AI applications in music that cover not only music composition, production, and performance but also how music is marketed and consumed. Several music player programs have also been developed to use voice recognition and natural language processing technology for music voice control. Current research includes the application of AI in music composition, performance, theory and digital sound processing.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allows assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with minimal, programming.
ilastik is a user-friendly free open source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018 ilastik is further developed and maintained by Anna Kreshuk's group at European Molecular Biology Laboratory.
Project Jupyter is a project to develop open-source software, open standards, and services for interactive computing across multiple programming languages.
Artificial intelligence art is visual artwork created or enhanced through the use of artificial intelligence (AI) programs.
Stylus is a user style manager, a browser extension for changing the look and feel of pages.
Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom.
Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.
NovelAI is an online cloud-based, SaaS model, and a paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature being implemented later on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.
DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. It was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms. This results in a model which uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files. While these files are only several seconds long, the model can also use latent space between outputs to interpolate different files together. This is accomplished using a functionality of the Stable Diffusion model known as img2img.
The Dog & the Boy is a 2023 animated science fiction short film directed by Ryōtarō Makihara. Distributed by Netflix and released on YouTube on 31 January 2023, the film follows the friendship between a robot dog and a boy. It became notable for its use of artificial intelligence art to create its background artwork, which was widely negatively received online.
Fooocus is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities as well as a collection of default settings and prompts to make the image generation process more streamlined.
AUTOMATIC1111 Stable Diffusion Web UI is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities together with a large set of extensions and features to customize its output.
Flux is a text-to-image model developed by Black Forest Labs, based in Freiburg im Breisgau, Germany. Black Forest Labs was founded by Robin Rombach, Andreas Blattmann, and Patrick Esser. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts.
Civitai is an online platform and marketplace for generative AI content, primarily focused on AI-generated images and models.
ComfyUI is a node-based UI that utilizes Stable Diffusion. It allows users to construct tailored workflows, including image post-processing and conversions. It is a potent and adaptable graphical user interface for Stable Diffusion, characterized by its node-based design.
Advanced users may also want to explore ComfyUI, as it supports more advanced workflows and increased flexibility (including image-to-video), but we deemed this too complex for the majority of use cases, which can easily be handled by AUTOMATIC1111.