Flux (text-to-image model)

Flux
Original author(s) Black Forest Labs
Developer(s) Black Forest Labs
Initial release August 2024
Stable release Flux 1.1 Pro (model) [1] / 2 October 2024
Type Text-to-image model
Website blackforestlabs.ai

Flux (also known as FLUX.1) is a text-to-image model developed by Black Forest Labs, a company based in Freiburg im Breisgau, Germany, and founded by former employees of Stability AI. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts.

History

Black Forest Labs was founded in 2024 by Robin Rombach, Andreas Blattmann, and Patrick Esser, former employees of Stability AI. [2] [3] All three founders had previously researched artificial intelligence image generation at Ludwig Maximilian University of Munich as research assistants under Björn Ommer. [4] [5] [6] Their research on image generation, published in 2022, led to Stable Diffusion. [6] [7] Investors in Black Forest Labs included venture capital firm Andreessen Horowitz, Brendan Iribe, Michael Ovitz, Garry Tan, and Vladlen Koltun. [8] The company received an initial investment of US$31 million. [9] [10]

In August 2024, Flux was integrated into the Grok chatbot developed by xAI and made available as a premium feature on X (formerly Twitter). [11] [12] [13] [14] In December 2024, Grok switched to its own text-to-image model, Aurora. [15]

On 18 November 2024, Mistral AI announced that its Le Chat chatbot had integrated Flux Pro as its image generation model. [16] [17]

On 21 November 2024, Black Forest Labs announced the release of Flux.1 Tools, a suite of editing tools designed to be used on top of existing Flux models. The suite consists of Flux.1 Fill for inpainting and outpainting, Flux.1 Depth for control based on prompts and depth maps extracted from input images, Flux.1 Canny for control based on prompts and Canny edges extracted from input images, and Flux.1 Redux for mixing existing input images and prompts. Each tool is available in both Dev and Pro variants. [18] [19]

Models

Flux is a series of text-to-image models. The models are based on a hybrid architecture that combines multimodal and parallel diffusion transformer blocks, scaled to 12 billion parameters. [8] The models are released under different licences: Schnell (German for "fast" or "quick") is released as open-source software under the Apache License, Dev is released as source-available software under a non-commercial licence, and Pro is proprietary software available only through an API that can be licensed by third-party users. [20] [21] Users retain ownership of the resulting output regardless of the model used. [22] [23]
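
As a rough illustration of what the 12-billion-parameter figure implies for running a model locally, the following sketch estimates the memory needed to hold the weights alone. The parameter count comes from the text above; the numeric formats and their byte sizes are assumptions chosen for illustration and are not taken from Black Forest Labs' documentation. Activations, text encoders, and other components would add to these figures.

```python
# Rough estimate of the memory needed to store 12 billion weights.
PARAMS = 12_000_000_000  # parameter count reported for the Flux models

# Assumed storage formats (illustrative only, not stated in the source).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
# float32: 44.7 GiB
# bfloat16: 22.4 GiB
# int8: 11.2 GiB
```

Under these assumptions, halving the bytes per parameter halves the footprint, which is one reason reduced-precision formats are commonly used when running large models on consumer hardware.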

The models can be used either online or locally by using generative AI user interfaces such as ComfyUI and Stable Diffusion WebUI Forge (a fork of Automatic1111 WebUI). [8] [24]

An improved flagship model, Flux 1.1 Pro, was released on 2 October 2024. [25] [26] Two additional modes were added on 6 November: Ultra, which can generate images at four times higher resolution, up to 4 megapixels, without affecting generation speed; and Raw, which can generate hyper-realistic images in the style of candid photography. [27] [28] [29]
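
To make the resolution figures concrete, the sketch below computes the largest image dimensions that fit a given pixel budget at a given aspect ratio. It assumes that one megapixel means 1,000,000 pixels and that "four times higher resolution" refers to total pixel count; the function name `dims_for_budget` is hypothetical and is not part of any Flux API. The actual output sizes supported by the service may differ.

```python
import math


def dims_for_budget(aspect_w: int, aspect_h: int, megapixels: float) -> tuple[int, int]:
    """Largest (width, height) at the given aspect ratio within the pixel budget.

    Assumes 1 megapixel = 1,000,000 pixels; both sides are rounded down.
    """
    budget = megapixels * 1_000_000
    height = math.sqrt(budget * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h
    return math.floor(width), math.floor(height)


print(dims_for_budget(1, 1, 1))   # (1000, 1000)
print(dims_for_budget(1, 1, 4))   # (2000, 2000)
print(dims_for_budget(16, 9, 4))  # (2666, 1500)
```

On this reading, quadrupling the pixel count from 1 to 4 megapixels doubles the length of each side of the image.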

Related to Flux is a text-to-video model, SOTA, which was under development as of December 2024. [8]

Reception

According to a test performed by Ars Technica, the outputs of Flux.1 Dev and Flux.1 Pro are comparable to DALL-E 3 in prompt fidelity, closely match Midjourney 6 in photorealism, and render human hands more consistently than previous models such as Stable Diffusion XL. [30]

Flux has been criticised for how realistic its generated images are. According to media reports, depictions ranged from an image of Donald Trump posing with guns to disturbing scenes, which triggered discussion of the ethical implications of the technology developed by Black Forest Labs. [4] [13]

After the release of the model, the social media platform X was flooded with Flux-generated images. [31] [32] Black Forest Labs has not provided exact details of the data used to train the model. [27] Ars Technica suspected that Flux was trained on a large, unauthorised collection of images scraped from the internet, a controversial practice with potential legal consequences. [30] [33]

References

  1. "Announcing FLUX1.1 [pro] and the BFL API". Black Forest Labs. 2 October 2024. Retrieved 17 November 2024.
  2. Killian, Nicolas (27 August 2024). "Black Forest Labs: Sie sind ein Teil von jener Kraft". Die Zeit (in German). ISSN 0044-2070. Archived from the original on 4 October 2024. Retrieved 17 November 2024.
  3. Growcoot, Matt (5 August 2024). "AI Image Generator Made by Stable Diffusion Inventors on Par With Midjourney and DALL-E". PetaPixel. Retrieved 17 November 2024.
  4. "Black Forest Labs unter Beschuss: Schockierende KI-Bilder sorgen für…". AlleAktien (in German). 22 August 2024. Retrieved 17 November 2024.
  5. Hermes, Ann Kathrin (8 August 2024). "Black Forest Labs: KI-Tools aus dem Schwarzwald". trend.at (in German). Retrieved 17 November 2024.
  6. Schwär, Hannah (15 August 2024). "Black Forest Labs: Die Schwarzwald-KI, auf die Elon Musk setzt". Capital.de (in German). Retrieved 17 November 2024.
  7. "High-Resolution Image Synthesis with Latent Diffusion Models". Computer Vision & Learning Group. Archived from the original on 16 November 2024. Retrieved 17 November 2024.
  8. "Announcing Black Forest Labs". Black Forest Labs. 1 August 2024. Archived from the original on 17 November 2024. Retrieved 17 November 2024.
  9. Steinschaden, Jakob (12 August 2024). "Black Forest Labs: 31 Mio. Dollar für Herausforderer von OpenAI und Midjourney". Trending Topics (in German). Archived from the original on 28 August 2024. Retrieved 17 November 2024.
  10. Nuñez, Michael (1 August 2024). "Stable Diffusion creators launch Black Forest Labs, secure $31M for FLUX.1 AI image generator". VentureBeat. Archived from the original on 8 October 2024. Retrieved 17 November 2024.
  11. Puscher, Frank. "Generative AI. Black Forest Labs und Flux.1: Vom Superstar zum Buhmann in fünf Tagen". MEEDIA (in German). Archived from the original on 27 September 2024. Retrieved 17 November 2024.
  12. Bomke, Luisa. "Flux.1 – ein deutscher KI-Bildgenerator dreht mit Grok frei". Handelsblatt (in German). Archived from the original on 30 August 2024. Retrieved 17 November 2024.
  13. Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". The Verge. Archived from the original on 17 November 2024. Retrieved 17 November 2024.
  14. Metz, Rachel (21 August 2024). "This Tiny Startup Is Helping Musk's Grok With Image Generation". Bloomberg News. Retrieved 19 November 2024.
  15. Davis, Wes (7 December 2024). "X gives Grok a new photorealistic AI image generator". The Verge. Archived from the original on 12 December 2024. Retrieved 10 December 2024.
  16. "Mistral has entered the chat". Mistral AI. 18 November 2024. Retrieved 11 December 2024.
  17. Franzen, Carl (18 November 2024). "Mistral unleashes Pixtral Large and upgrades Le Chat into full-on ChatGPT competitor". VentureBeat. Retrieved 11 December 2024.
  18. "Introducing FLUX.1 Tools". Black Forest Labs. 21 November 2024. Archived from the original on 26 November 2024. Retrieved 13 December 2024.
  19. Bastian, Matthias (22 November 2024). "Black Forest Labs expands FLUX.1 with four new AI tools for image editing". The Decoder. Retrieved 15 December 2024.
  20. "Get Flux". Black Forest Labs. Archived from the original on 16 November 2024. Retrieved 17 November 2024.
  21. Wiggers, Kyle (3 October 2024). "Black Forest Labs, the startup behind Grok's image generator, releases an API". TechCrunch. Archived from the original on 4 October 2024. Retrieved 17 November 2024.
  22. "flux/model_licenses/LICENSE-FLUX1-dev at main · black-forest-labs/flux". GitHub. Archived from the original on 15 September 2024. Retrieved 18 November 2024. Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model.
  23. "API Agreement - BFL Docs (Pro)". Black Forest Labs. 1 August 2024. Archived from the original on 3 October 2024. Retrieved 18 November 2024. Output. Company claims no ownership rights in and to the Outputs, and Developer and Users may use the Output for their own personal or commercial purposes, subject to any restrictions set forth herein or in the Flux Service Terms. For the avoidance of doubt, Outputs do not include any components of the Flux API or the Flux AI model, such as its weights or parameters.
  24. 田口和裕 (18 August 2024). "話題の画像生成AI「FLUX.1」をStable Diffusion用の「WebUI Forge」で動かす(高速化も試してみました) (1/6)". ASCII.jp (in Japanese). ASCII Media Works. Retrieved 21 November 2024.
  25. "Announcing FLUX1.1 [pro] and the BFL API". Black Forest Labs. 2 October 2024. Retrieved 17 November 2024.
  26. Franzen, Carl (3 October 2024). "Black Forest Labs releases Flux 1.1 Pro and an API". VentureBeat. Retrieved 17 November 2024.
  27. Growcoot, Matt (7 November 2024). "Flux AI Introduces Raw Mode That 'Captures the Genuine Feel of Candid Photography'". PetaPixel. Retrieved 19 November 2024.
  28. Bastian, Matthias (6 November 2024). "Flux 1.1 Pro AI image model adds "amateur" RAW photo mode and 4K image generation". The Decoder. Retrieved 17 November 2024.
  29. "Introducing FLUX1.1 [pro] Ultra and Raw Modes". Black Forest Labs. 6 November 2024. Archived from the original on 12 November 2024. Retrieved 17 November 2024.
  30. Edwards, Benj (2 August 2024). "FLUX: This new AI image generator is eerily good at creating human hands". Ars Technica. Retrieved 17 November 2024.
  31. Zeff, Maxwell (14 August 2024). "Meet Black Forest Labs, the startup powering Elon Musk's unhinged AI image generator". TechCrunch. Archived from the original on 17 November 2024. Retrieved 17 November 2024.
  32. Schwarzer, Matthias (16 August 2024). "Drogen, Bomben und Gewalt: KI-Bildgenerator von Elon Musk zeigt alles – mit deutscher Technik". RND.de (in German). Retrieved 17 November 2024.
  33. Künne, Christoph (7 August 2024). "FLUX.1: Neuer KI-Bildgenerator". DOCMA (in German). Archived from the original on 31 August 2024. Retrieved 17 November 2024.