ComfyUI

Original author(s): comfyanonymous
Initial release: January 16, 2023 [1]
Repository: github.com/comfyanonymous/ComfyUI
Written in: Python
License: GPLv3 [2]
Website: www.comfy.org

ComfyUI is an open-source, node-based program that allows users to generate images from a series of text prompts. It uses freely available diffusion models such as Stable Diffusion as the base model for its image capabilities, combined with other tools such as ControlNet and LCM (Latent Consistency Model) low-rank adaptation, with each tool represented by a node in the program.


History

ComfyUI was released on GitHub in January 2023. According to comfyanonymous, the creator, a major goal of the project was to improve on the user interface design of existing software. [3] The creator had been involved with Stability AI, but by 3 June 2024 that involvement had ended and the creator, together with the core developers, had formed an organization called Comfy Org. [4] In July 2024, Nvidia announced support for ComfyUI within its RTX Remix modding software. [5] In August 2024, support was added for the Flux diffusion model developed by Black Forest Labs, and Comfy Org joined the Open Model Initiative created by the Linux Foundation. [6] [7] As of November 2024, the project had around 58,600 stars on GitHub. [8]

Features

ComfyUI's main feature is that it is node based. [9] [10] Each node has a function, such as "load a model" or "write a prompt". [11] The nodes are connected to form a control-flow graph called a workflow. [12] When a prompt is queued, a highlighted frame appears around the currently executing node, starting from "load checkpoint" and ending with the final image and its save location. [11] Workflows commonly consist of tens of nodes, forming a complex directed acyclic graph. [12] Node types include loading a model, specifying prompts, samplers, schedulers, VAE decoders, face restoration and upscaling models, LoRAs, embeddings, and ControlNets. [13] [14] Several samplers are supported, such as Euler, Euler_a, dpmpp_2m_sde, and dpmpp_3m_sde. [14] Workflows can be saved to a file, allowing users to re-use them and share them with other users. [13] [15] [16] Workflows are stored in JSON format and can be embedded in the generated images. [17] Users have also created custom extensions to the base system, which are exposed as new nodes, [13] [18] such as the extension for AnimateDiff, which generates videos. [19] [20] ComfyUI has been described as more complex than other diffusion UIs such as AUTOMATIC1111. [21] [22] A default set of nodes is included with the program. [11] As of December 2024, 1,674 nodes were supported. [23] ComfyUI supports multiple text-to-image models, including Stable Diffusion, Flux, and Tencent's Hunyuan-DiT, as well as custom models from Civitai such as Pony. [23]
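
A saved workflow is essentially a small graph description: a set of nodes, each with a type and input values, plus connections that wire one node's output into another node's input. The Python sketch below illustrates the general idea by building a minimal text-to-image workflow and submitting it to a locally running instance over HTTP. The node names, input fields, checkpoint filename, prompts, and the server address and /prompt endpoint shown here are illustrative assumptions and may not match a particular ComfyUI version or installation.

    # Minimal sketch (assumptions noted above): build a text-to-image workflow
    # as a dictionary of nodes and submit it to a locally running ComfyUI server.
    import json
    import urllib.request

    # Each node is keyed by an identifier and declares its type and inputs.
    # A connection to another node's output is written as [source_id, output_index].
    workflow = {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "model.safetensors"}},              # hypothetical filename
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "a watercolor fox", "clip": ["1", 1]}},   # positive prompt
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},  # negative prompt
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "example"}},
    }

    # Queue the workflow over HTTP; the address and endpoint are commonly
    # documented defaults and may differ in a given installation.
    request = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))

Because every input either holds a literal value or references another node's output by identifier and output index, the resulting structure is a directed acyclic graph rather than a linear script, which is what allows workflows to be saved, shared, and embedded in generated images as plain JSON.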

LLMVision extension compromise

In June 2024, a hacker group called "Nullbulge" compromised a ComfyUI extension by adding malicious code to it. [24] The compromised extension, called ComfyUI_LLMVISION and hosted on GitHub, was used to integrate the interface with the AI language models GPT-4 and Claude 3. Nullbulge hosted a list of hundreds of ComfyUI users' login details across multiple services on its website, while users of the extension reported receiving numerous login notifications. vpnMentor conducted security research on the extension and claimed it could "steal crypto wallets, screenshot the user’s screen, expose device information and IP addresses, and steal files that contain certain keywords or extensions".

Nullbulge's website claims the group targeted users who committed "one of our sins", which included AI-art generation, art theft, promoting cryptocurrency, and other kinds of theft from artists, such as via Patreon. The group claimed to be "a collective of individuals who believe in the importance of protecting artists' rights and ensuring fair compensation for their work" and said that "AI-generated artwork is detrimental to the creative industry and should be discouraged". [24]

Related Research Articles

<span class="mw-page-title-main">Glade Interface Designer</span> Graphical user interface builder

Glade Interface Designer is a graphical user interface builder for GTK, with additional components for GNOME. In its third version, Glade is programming language–independent, and does not produce code for events, but rather an XML file that is then used with an appropriate binding.

<span class="mw-page-title-main">OpenSceneGraph</span>

OpenSceneGraph is an open-source 3D graphics application programming interface, used by application developers in fields such as visual simulation, computer games, virtual reality, scientific visualization and modeling.

<span class="mw-page-title-main">Git</span> Distributed version control software system

Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.

<span class="mw-page-title-main">Orange (software)</span> Open-source data analysis software

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.

Music and artificial intelligence (AI) is the development of music software programs which use AI to generate music. As with applications in other fields, AI in music also simulates mental tasks. A prominent feature is the capability of an AI algorithm to learn based on past data, such as in computer accompaniment technology, wherein the AI is capable of listening to a human performer and performing accompaniment. Artificial intelligence also drives interactive composition technology, wherein a computer composes music in response to a live performance. There are other AI applications in music that cover not only music composition, production, and performance but also how music is marketed and consumed. Several music player programs have also been developed to use voice recognition and natural language processing technology for music voice control. Current research includes the application of AI in music composition, performance, theory and digital sound processing.

KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allow assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with minimal, programming.

ilastik is user-friendly, free and open-source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018, ilastik has been further developed and maintained by Anna Kreshuk's group at the European Molecular Biology Laboratory.

<span class="mw-page-title-main">Project Jupyter</span> Open source data science software

Project Jupyter is a project to develop open-source software, open standards, and services for interactive computing across multiple programming languages.

<span class="mw-page-title-main">Artificial intelligence art</span> Machine application of knowledge of human aesthetic expressions

Artificial intelligence art is visual artwork created or enhanced through the use of artificial intelligence (AI) programs.

<span class="mw-page-title-main">Stylus (browser extension)</span> User style manager browser extension

Stylus is a user style manager, a browser extension for changing the look and feel of pages.

<span class="mw-page-title-main">Midjourney</span> Image-generating machine learning model

Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom.

<span class="mw-page-title-main">Stable Diffusion</span> Image-generating machine learning model

Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.

<span class="mw-page-title-main">NovelAI</span> Online service for AI media creation

NovelAI is an online, cloud-based, software-as-a-service (SaaS) paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature being implemented later on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.

<span class="mw-page-title-main">DreamBooth</span> Deep learning generation model

DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.

<span class="mw-page-title-main">Riffusion</span> Music-generating machine learning model

Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. It was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms. This results in a model which uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files. While these files are only several seconds long, the model can also use latent space between outputs to interpolate different files together. This is accomplished using a functionality of the Stable Diffusion model known as img2img.

The Dog & the Boy: 2023 Japanese animated science fiction short film

The Dog & the Boy is a 2023 animated science fiction short film directed by Ryōtarō Makihara. Distributed by Netflix and released on YouTube on 31 January 2023, the film follows the friendship between a robot dog and a boy. It became notable for its use of artificial intelligence art to create its background artwork, which was widely negatively received online.

<span class="mw-page-title-main">Fooocus</span> Open source generative artificial intelligence UI

Fooocus is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities as well as a collection of default settings and prompts to make the image generation process more streamlined.

<span class="mw-page-title-main">Automatic1111</span> Open source generative artificial intelligence UI

AUTOMATIC1111 Stable Diffusion Web UI is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities together with a large set of extensions and features to customize its output.

<span class="mw-page-title-main">Flux (text-to-image model)</span> Image-generating machine learning model

Flux is a text-to-image model developed by Black Forest Labs, based in Freiburg im Breisgau, Germany. Black Forest Labs was founded by Robin Rombach, Andreas Blattmann, and Patrick Esser. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts.

Civitai is an online platform and marketplace for generative AI content, primarily focused on AI-generated images and models.

References

  1. comfyanonymous. "Initial commit". GitHub. Retrieved 10 July 2024.
  2. comfyanonymous. "LICENSE". GitHub. Retrieved 10 July 2024.
  3. comfyanonymous (18 May 2023). "ComfyUI is now 4 months old!". ComfyUI blog. Retrieved 11 July 2024.
  4. "ComfyUI 作者团队成立 Comfy Org- DoNews快讯". DoNews .
  5. Harper, Christopher (4 July 2024). "Nvidia's RTX Remix goes open source — chipmaker adds Rest API to interface with ComfyUI for AI remastering or generating new graphics in real time". Tom's Hardware. Retrieved 11 July 2024.
  6. 田口和裕 (August 7, 2024). "画像生成AI「Stable Diffusion」の代替に? 話題の「FLUX.1」を試した (1/7)" [A replacement for the image-generation AI "Stable Diffusion"? Trying out the much-discussed "FLUX.1" (1/7)]. ASCII.jp (in Japanese).
  7. Wheatley, Mike (12 August 2024). "Linux Foundation's latest initiative aims to promote 'irrevocable' open-source AI models". SiliconANGLE.
  8. comfyanonymous. "ComfyUI". GitHub. Retrieved 10 July 2024.
  9. Zhu, Andrew (2024). Using Stable Diffusion with Python: Leverage Python to control and automate high-quality AI image generation using Stable Diffusion. Packt Publishing. ISBN   978-1835084311. ComfyUI is a node-based UI that utilizes Stable Diffusion. It allows users to construct tailored workflows, including image post-processing and conversions. It is a potent and adaptable graphical user interface for Stable Diffusion, characterized by its node-based design.
  10. 故渊 (November 25, 2023). "7 年老显卡 GTX 1080 能跑,图片生成视频模型 Stable Video Diffusion 更新 - IT之家" [The seven-year-old GTX 1080 graphics card can run it: image-to-video model Stable Video Diffusion updated - ITHome]. ITHome (in Chinese).
  11. 田口, 和裕. "画像生成AI「Stable Diffusion」使い倒すならコレ! 「ComfyUI」基本の使い方 (1/3)" [If you want to get the most out of the image-generation AI "Stable Diffusion", this is it! The basics of using "ComfyUI" (1/3)]. ascii.jp (in Japanese).
  12. Xue, Xiangyuan; Lu, Zeyu; Huang, Di; Ouyang, Wanli; Bai, Lei (2 Sep 2024). "GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI". arXiv: 2409.01392 [cs.CL].
  13. Gal, Rinon; Haviv, Adi; Alaluf, Yuval; Bermano, Amit H.; Cohen-Or, Daniel; Chechik, Gal (2024). "ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation". arXiv: 2410.01731 [cs.CL].
  14. Zeman, Benjamin (4 December 2024). "How to build basic workflows in ComfyUI". XDA.
  15. 白鲸出海 (23 May 2024). "一家成都游戏公司,做出了一款千万月访问量的AI图像产品-36氪" [A Chengdu game company built an AI image product with over ten million monthly visits - 36Kr]. 36氪 (in Chinese).
  16. 田口, 和裕 (27 March 2024). "Macで始める画像生成AI 「Stable Diffusion」ComfyUIの使い方 (3/5)" [Getting started with image-generation AI on a Mac: how to use "Stable Diffusion" with ComfyUI (3/5)]. ascii.jp (in Japanese).
  17. しらいはかせ (18 December 2023). "画像生成AIを使い倒す!「Stability Matrix」で使えるWebUIを紹介【生成AIストリーム】" [Making full use of image-generation AI! Introducing the web UIs available through "Stability Matrix" (Generative AI Stream)]. Impress Watch (in Japanese).
  18. 机器之心 (16 November 2023). "当韩国女团BLACKPINK进军二次元,清华叉院AI神器原来还能这么玩-36氪" [When Korean girl group BLACKPINK enters the 2D world, it turns out Tsinghua's AI tool can be used like this too - 36Kr]. 36氪 (in Chinese).
  19. 新, 清士. "アニメの常識、画像生成AIが変える可能性「AnimateDiff」のすごい進化" [Image-generation AI may change the conventions of anime: the remarkable evolution of "AnimateDiff"]. ascii.jp (in Japanese).
  20. Guo, Yuwei; Yang, Ceyuan; Rao, Anyi; Liang, Zhengyang; Wang, Yaohui; Qiao, Yu; Agrawala, Maneesh; Lin, Dahua; Dai, Bo (May 2024). "AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning". International Conference on Learning Representations. arXiv: 2307.04725 .
  21. Phoenix, James; Taylor, Mike (2024). "AUTOMATIC1111 Web User Interface". Prompt engineering for generative AI: future-proof inputs for reliable AI outputs at scale (First ed.). Beijing Boston: O'Reilly. ISBN   978-1098153434. Advanced users may also want to explore ComfyUI, as it supports more advanced workflows and increased flexibility (including image-to-video), but we deemed this too complex for the majority of use cases, which can easily be handled by AUTOMATIC1111.
  22. Pérez-Colado, Iván J.; Freire-Morán, Manuel; Calvo-Morata, Antonio; Pérez-Colado, Víctor M.; Fernández-Manjón, Baltasar (8 May 2024). "AI As yet Another Tool in Undergraduate Student Projects: Preliminary Results". 2024 IEEE Global Engineering Education Conference (EDUCON). pp. 1–7. doi:10.1109/EDUCON60312.2024.10578883. ISBN 979-8-3503-9402-3.
  23. Zeman, Benjamin (6 December 2024). "Adobe Photoshop's Firefly vs. ComfyUI and Stable Diffusion". XDA.
  24. Maiberg, Emanuel (2024-06-11). "Hackers Target AI Users With Malicious Stable Diffusion Tool on GitHub to Protest 'Art Theft'". 404 Media. Retrieved 2024-06-14.