Original author(s) | Lvmin Zhang |
---|---|
Initial release | August 9, 2023 [1] |
Repository | github |
Written in | Python |
License | GPLv3 [2] |
Fooocus is an open source generative artificial intelligence program that allows users to generate images from a text prompt. [3] [4] It uses Stable Diffusion as the base model for its image capabilities as well as a collection of default settings and prompts to make the image generation process more streamlined. [3] [4] [5]
Fooocus was created by Lvmin Zhang, a doctoral student at Stanford University who previously studied at the Chinese University of Hong Kong and Soochow University. [6] [7] He is also the main author of ControlNet, [8] [6] [9] which has been adopted by many other Stable Diffusion interfaces such as AUTOMATIC1111 Stable Diffusion Web UI and ComfyUI. As of 9 July 2024, the project had 38.1k stars on GitHub. [10]
Fooocus' main feature is that it is easy to set up and does not require users to manually configure model parameters to achieve desirable results. [6] [3] [11] [12] According to the project, it uses GPT-2 to automatically add more detail to the user's prompts. [13] It includes common extensions such LCM Low-rank adaptation by default which allows for faster generation speed. [4] [14] Fooocus prefers a photographic style by default, with a list of predefined styles to chose from. [3] [15] While Fooocus aims to provide good results out of the box, it also includes an "advanced" tab that allows for user customization. [16] The user interface is based on Gradio. [17]
Pentadactyl was a Firefox extension forked from the Vimperator and designed to provide a more efficient user interface for keyboard-fluent users. The design is heavily inspired by the Vim text editor, and the authors try to maintain consistency with it wherever possible. It is now maintained as a Pale Moon extension.
Transmission is a BitTorrent client which features a variety of user interfaces on top of a cross-platform back-end. Transmission is free software licensed under the terms of the GNU General Public License, with parts under the MIT License.
GitHub is a developer platform that allows developers to create, store, manage and share their code. It uses Git software, which provides distributed version control of access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. Headquartered in California, it has been a subsidiary of Microsoft since 2018.
Music and artificial intelligence (AI) is the development of music software programs which use AI to generate music. As with applications in other fields, AI in music also simulates mental tasks. A prominent feature is the capability of an AI algorithm to learn based on past data, such as in computer accompaniment technology, wherein the AI is capable of listening to a human performer and performing accompaniment. Artificial intelligence also drives interactive composition technology, wherein a computer composes music in response to a live performance. There are other AI applications in music that cover not only music composition, production, and performance but also how music is marketed and consumed. Several music player programs have also been developed to use voice recognition and natural language processing technology for music voice control. Current research includes the application of AI in music composition, performance, theory and digital sound processing.
Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.
ilastik is a user-friendly free open source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018 ilastik is further developed and maintained by Anna Kreshuk's group at European Molecular Biology Laboratory.
Julia is a high-level, general-purpose dynamic programming language, still designed to be fast and productive, for e.g. data science, artificial intelligence, machine learning, modeling and simulation, most commonly used for numerical analysis and computational science.
GraphQL is a data query and manipulation language for APIs that allows a client to specify what data it needs. A GraphQL server can fetch data from separate sources for a single client query and present the results in a unified graph. It is not tied to any specific database or storage engine.
Project Jupyter is a project to develop open-source software, open standards, and services for interactive computing across multiple programming languages.
Lean is a proof assistant and a functional programming language. It is based on the calculus of constructions with inductive types. It is an open-source project hosted on GitHub. It was developed primarily by Leonardo de Moura while employed by Microsoft Research and now Amazon Web Services, and has had significant contributions from other coauthors and collaborators during its history. Development is currently supported by the non-profit Lean Focused Research Organization (FRO).
Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, aesthetic ranking, and image captioning.
Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom.
Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.
Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.
NovelAI is an online cloud-based, SaaS model, and a paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature being implemented later on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.
DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. It was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms. This results in a model which uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files. While these files are only several seconds long, the model can also use latent space between outputs to interpolate different files together. This is accomplished using a functionality of the Stable Diffusion model known as img2img.
ComfyUI is an open source, node-based program that allows users to generate images from a series of text prompts. It uses free diffusion models such as Stable Diffusion as the base model for its image capabilities combined with other tools such as ControlNet and LCM Low-rank adaptation with each tool being represented by a node in the program.
AUTOMATIC1111 Stable Diffusion Web UI is an open source generative artificial intelligence program that allows users to generate images from a text prompt. It uses Stable Diffusion as the base model for its image capabilities together with a large set of extensions and features to customize its output.
The Latent Diffusion Model (LDM) is a diffusion model architecture developed by the CompVis group at LMU Munich.
Finally, I went with the Fooocus model (yup, that's the name) by developer lllyasviel, as it had the simplest installation procedure out of everything else I tried on my PC.