Fooocus

Original author(s) Lvmin Zhang
Initial release August 9, 2023 [1]
Repository github.com/lllyasviel/Fooocus
Written in Python
License GPLv3 [2]

Fooocus is an open source generative artificial intelligence program that allows users to generate images from a text prompt. [3] [4] It uses Stable Diffusion as the base model for its image capabilities as well as a collection of default settings and prompts to make the image generation process more streamlined. [3] [4] [5]

History

Fooocus was created by Lvmin Zhang, a doctoral student at Stanford University who previously studied at the Chinese University of Hong Kong and Soochow University. [6] [7] He is also the main author of ControlNet, [8] [6] [9] which has been adopted by many other Stable Diffusion interfaces such as AUTOMATIC1111 Stable Diffusion Web UI and ComfyUI. As of 9 July 2024, the project had 38.1k stars on GitHub. [10]

Features

Fooocus' main feature is that it is easy to set up and does not require users to manually configure model parameters to achieve desirable results. [6] [3] [11] [12] According to the project, it uses GPT-2 to automatically add more detail to the user's prompts. [13] It includes common extensions such as LCM Low-rank adaptation by default, which allows for faster image generation. [4] [14] Fooocus prefers a photographic style by default and offers a list of predefined styles to choose from. [3] [15] While Fooocus aims to provide good results out of the box, it also includes an "advanced" tab that allows for user customization. [16] The user interface is based on Gradio. [17]
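
The combination of a default style and automatic prompt expansion can be illustrated with a minimal Python sketch. This is not Fooocus's actual code: the preset names, the template text, and the functions `expand_prompt` and `build_prompt` are hypothetical, and the expansion step is a fixed-suffix stand-in for the GPT-2 based expansion the project describes.

```python
# Illustrative sketch only: preset names and templates are hypothetical,
# not taken from the Fooocus source code.
STYLE_PRESETS = {
    "photographic": "photograph of {prompt}, sharp focus, natural lighting",
    "watercolor": "watercolor painting of {prompt}, soft washes",
}


def expand_prompt(prompt: str) -> str:
    """Stand-in for a learned prompt expander.

    The project says it uses GPT-2 for this step; here a fixed suffix
    is appended so the example stays self-contained.
    """
    return f"{prompt}, highly detailed"


def build_prompt(user_prompt: str, style: str = "photographic") -> str:
    """Wrap the user's text in a style template, then expand it."""
    template = STYLE_PRESETS[style]
    styled = template.format(prompt=user_prompt.strip())
    return expand_prompt(styled)


if __name__ == "__main__":
    # With no style given, the photographic preset is applied.
    print(build_prompt("a red fox in snow"))
    print(build_prompt("a red fox in snow", style="watercolor"))
```

In this sketch the user supplies only a short description; the style template and the added detail are applied without any further configuration, which mirrors the "good results out of the box" goal described above.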


References

  1. Zhang, Lvmin. "i". github.com. Retrieved 9 July 2024.
  2. Zhang, Lvmin. "Create LICENSE". github.com. Retrieved 9 July 2024.
  3. Hachman, Mark. "Fooocus is the easiest way to create AI art on your PC". PC World. Retrieved 9 July 2024.
  4. 西川, 和久 (20 November 2023). "【西川和久の不定期コラム】 Stable Diffusion高速化の決定版登場!?品質落とさず制限もほぼなしで2~3倍速に". Impress Watch (in Japanese). Retrieved 9 July 2024.
  5. Pande, Ayush (17 September 2024). "I turned my old PC into an AI image generation server - here's how I did it". XDA. "Finally, I went with the Fooocus model (yup, that's the name) by developer lllyasviel, as it had the simplest installation procedure out of everything else I tried on my PC."
  6. 克雷西; 萧箫 (26 August 2023). "Fooocus登顶GitHub热榜:4G显存低配畅玩AIGC". 虎嗅 (in Simplified Chinese). Retrieved 9 July 2024.
  7. Zhang, Lvmin. "Lvmin Zhang (Lyumin Zhang)". lllyasviel.github.io. Retrieved 9 July 2024.
  8. Zhang, Lvmin; Rao, Anyi; Agrawala, Maneesh (2023). "Adding Conditional Control to Text-to-Image Diffusion Models". arXiv: 2302.05543 [cs.CV].
  9. 新, 清士. "画像生成AIに2度目の革命を起こした「ControlNet」 (1/4)". ascii.jp (in Japanese). Retrieved 9 July 2024.
  10. "lllyasviel/Fooocus". github.com. 9 July 2024. Retrieved 9 July 2024.
  11. Morton, Brad (21 April 2024). "Want Powerful Local AI Image Generation on Windows? Use This Tool". How-To Geek.
  12. Lanz, Jose Antonio (9 December 2023). "Conoce Las Mejores Herramientas de IA Para Generar Imágenes: Guía Detallada". Decrypt (in Spanish).
  13. Zhang, Lvmin (10 July 2024). "List of "Hidden" Tricks". GitHub.com. Retrieved 10 July 2024.
  14. Luo, Simian (2023). "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module". arXiv: 2311.05556 [cs.CV].
  15. Esteban, Félix (13 February 2024). "Las cinco mejores IA gratuitas para crear y editar imágenes". Business Insider España (in Spanish).
  16. Horsey, Julian (16 August 2023). "Amazing Fooocus SDXL user interface for AI art generation". Geeky Gadgets. Retrieved 9 July 2024.
  17. Çıtak, Emre (30 November 2023). "Fooocus: Features And How To Install - Dataconomy". dataconomy.com. Retrieved 9 July 2024.