| Developer(s) | Midjourney, Inc. |
| --- | --- |
| Initial release | July 12, 2022 (open beta) |
| Stable release | V6.1 / July 31, 2024 |
| Website | midjourney.com |
Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. [1] [2] It is one of the technologies of the AI boom.
As of August 2024, the tool remains in the open beta it entered on July 12, 2022. [3] The Midjourney team is led by David Holz, who co-founded Leap Motion. [4] Holz told The Register in August 2022 that the company was already profitable. [5] Users create artwork with Midjourney using Discord bot commands or the official website. [6] [7]
Midjourney, Inc. was founded in San Francisco, California, by David Holz, [8] previously a co-founder of Leap Motion. [9] The Midjourney image generation platform entered open beta on July 12, 2022. [3] On March 14, 2022, the Midjourney Discord server launched, asking users to post high-quality photographs to Twitter and Reddit to train the system.[citation needed]
The company has been working on improving its algorithms, releasing new model versions every few months. Version 2 of their algorithm was launched in April 2022, [10] and version 3 on July 25, 2022. [11] On November 5, 2022, the alpha iteration of version 4 was released to users. [12] [13] Starting with version 4, Midjourney's models have been trained on Google TPUs. [14]
On March 15, 2023, the alpha iteration of version 5 was released. [15] The 5.1 model is more opinionated than version 5, applying more of its own stylization to images, while the 5.1 RAW variant works better with more literal prompts. Version 5.2 introduced a new "aesthetics system" and the ability to "zoom out" by generating surroundings for an existing image. [16] On December 21, 2023, the alpha iteration of version 6 was released. The model was trained from scratch over a nine-month period and added support for better text rendering and a more literal interpretation of prompts.
Midjourney is accessible through a Discord bot or through its website. On Discord, users can interact with the bot on the official Midjourney server, by messaging it directly, or by inviting it to a third-party server. To generate images, users enter the /imagine command followed by a prompt; [23] the bot then returns a set of four images, which users can choose to upscale. To generate images on the website, users must first have generated at least 1,000 images through the bot.
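As a minimal illustration of this workflow, the sketch below only assembles the text a user would type into Discord after invoking /imagine; Midjourney exposes no public API, and the --ar (aspect ratio) and --v (version) flags are commonly documented Midjourney parameters included here as illustrative assumptions.

```python
# Illustrative sketch only: Midjourney has no public API, so this simply
# assembles the text a user would type into Discord after /imagine.
def imagine_command(prompt: str, aspect_ratio: str = "1:1", version: str = "6") -> str:
    # --ar and --v are commonly documented Midjourney flags (assumed here).
    return f"/imagine prompt: {prompt} --ar {aspect_ratio} --v {version}"

print(imagine_command("a lighthouse at dusk, oil painting", aspect_ratio="16:9"))
# -> /imagine prompt: a lighthouse at dusk, oil painting --ar 16:9 --v 6
```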
Midjourney released a Vary (Region) feature on September 5, 2023, as part of Midjourney version 5.2. This feature allows users to select a specific area of an image and apply variations only to that region while keeping the rest of the image unchanged. [24]
Midjourney introduced its web interface to make its tools more accessible, moving beyond its initial reliance on Discord. This web-based platform was launched in August 2024 alongside the release of Midjourney version 6.1. The web editor consolidates tools such as image editing, panning, zooming, region variation, and inpainting into a single interface.
The introduction of the web interface also syncs conversations between Midjourney's Discord channels and web rooms, further enhancing collaboration across both platforms. This shift was in response to growing competition from other AI image generation platforms like Adobe Firefly and Google’s Imagen, which had already launched as native web apps with integration into popular design tools. [25]
The image weight feature lets users control how much influence an uploaded reference image has on the final output. By adjusting the "image weight" parameter, users can prioritize either the content of the text prompt or the characteristics of the image: a higher weight makes the generated result follow the image's structure and details more closely, while a lower weight gives the text prompt more influence over the final output. [26]
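A brief sketch of how this looks in practice, assuming the --iw flag Midjourney documents for image weight; the image URL precedes the text prompt, and the exact value range varies by model version, so the values below are examples only.

```python
# Illustrative only: an image prompt followed by text, with --iw (image weight,
# a documented Midjourney flag) biasing the result toward the uploaded image.
high_iw = "/imagine prompt: https://example.com/sketch.png watercolor seaside village --iw 2"
low_iw = "/imagine prompt: https://example.com/sketch.png watercolor seaside village --iw 0.5"
print(high_iw)  # follows the reference image's structure more closely
print(low_iw)   # lets the text prompt dominate
```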
With Style Reference, users can upload an image to use as a stylistic guide for their creation. This tool enables Midjourney to extract the style of the reference image, such as its color palette, texture, or overall atmosphere, and apply it to a newly generated image. The feature allows users to fine-tune the aesthetics of their creations by integrating specific artistic styles or moods. [27]
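A brief illustrative sketch, assuming the --sref (style reference) and --sw (style weight) flags that Midjourney documents for this feature; the URL and the weight value are placeholders.

```python
# Illustrative only: --sref points at a style-reference image and --sw
# (style weight) is documented as controlling how strongly it is applied.
cmd = "/imagine prompt: portrait of an astronaut --sref https://example.com/ukiyo-e.jpg --sw 200"
print(cmd)
```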
The Character Reference feature allows for a more targeted approach in defining characters. Users can upload an image of a character, and the system uses that image as a reference to generate similar characters in the output. This feature is particularly useful in maintaining consistency in appearance for characters across different images. [28]
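An illustrative sketch along the same lines, assuming the --cref (character reference) and --cw (character weight) flags Midjourney documents for this feature; lower character weights are described as preserving only the face while higher values also preserve clothing, so the URL and value below are examples only.

```python
# Illustrative only: --cref supplies a character-reference image and --cw
# (character weight) is documented as controlling how strictly it is followed.
cmd = "/imagine prompt: the same heroine exploring a desert city --cref https://example.com/heroine.png --cw 80"
print(cmd)
```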
Midjourney's founder, David Holz, told The Register that artists use Midjourney for rapid prototyping of artistic concepts to show to clients before starting work themselves. [5]
The advertising industry has been quick to embrace AI tools such as Midjourney, DALL-E, and Stable Diffusion. These tools enable advertisers to create original content and brainstorm ideas quickly, providing new opportunities such as "custom ads created for individuals, a new way to create special effects, or even making e-commerce advertising more efficient", according to Ad Age. [29][promotion?]
Architects have described using the software to generate mood boards for the early stages of projects, as an alternative to searching Google Images. [30]
The program was used by the British magazine The Economist to create the front cover for an issue in June 2022. [32] [33] In Italy, the leading newspaper Corriere della Sera published a comic created with Midjourney by writer Vanni Santoni in August 2022. [34] Charlie Warzel used Midjourney to generate two images of Alex Jones for Warzel's newsletter in The Atlantic. The use of AI-generated images was criticised by people who felt it was taking jobs from artists, and Warzel called his action a mistake in an article about his decision to use generated images. [35] Last Week Tonight with John Oliver included a 10-minute segment on Midjourney in an episode broadcast in August 2022. [36] [37]
A Midjourney image called Théâtre D'opéra Spatial won first place in the digital art competition at the 2022 Colorado State Fair. Jason Allen, who wrote the prompt that led Midjourney to generate the image, printed it onto a canvas and entered it into the competition under the name "Jason M. Allen via Midjourney". Other digital artists were upset by the news. [38] Allen was unapologetic, insisting that he had followed the competition's rules. The two category judges were unaware that Midjourney used AI to generate images, although they later said that had they known this, they would have awarded Allen the top prize anyway. [39]
In December 2022, Midjourney was used to generate the images for an AI-generated children's book that was created over a weekend. Titled Alice and Sparkle, the book features a young girl who builds a robot that becomes self-aware. The creator, Ammaar Reeshi, used Midjourney to generate a large number of images, from which he chose 13 for the book. [40] Both the product and process drew criticism. One artist wrote that "the main problem... is that it was trained off of artists' work. It's our creations, our distinct styles that we created, that we did not consent to being used." [31]
In 2023, the realism of AI-based text-to-image generators such as Midjourney, DALL-E, and Stable Diffusion [41] [42] reached such a high level that it led to a significant wave of viral AI-generated photos. Widespread attention was gained by a Midjourney-generated photo of Pope Francis wearing a white puffer coat, [43] [44] the fictional arrest of Donald Trump, [45] and a hoax of an attack on the Pentagon, [46] as well as their use in professional creative arts. [47] [48]
Research has suggested that the images Midjourney generates can be biased. For example, even neutral prompts in one study returned unequal results on the aspects of gender, skin color, and location. [49] A study by researchers at the nonprofit group Center for Countering Digital Hate found the tool to be easy to use to generate racist and conspiratorial images. [50] In October 2023, Rest of World reported that Midjourney tends to generate images based on national stereotypes. [51]
In 2024, a Frontiers journal published a paper [53] that contained gibberish figures generated with Midjourney, one of which was a diagram of a rat whose grossly oversized genitals towered over its own body. The paper was retracted a day after the images went viral on Twitter. [52]
Prior to May 2023, Midjourney used a moderation mechanism based on a banned-word list. This method prohibited language associated with explicit content, such as sexual or pornographic themes, as well as extreme violence. The system also banned certain individual words, including the names of religious and political figures, such as Allah or Xi Jinping. This practice occasionally stirred controversy due to perceived censorship on the Midjourney platform. [54] [55]
Beginning in May 2023, with subsequent updates after version 5, Midjourney transitioned to an AI-powered content moderation system. This mechanism allows for a more nuanced interpretation of user prompts by analyzing them in their entirety, enabling the context-dependent use of words that had previously been prohibited. For instance, users can now prompt the AI to generate a portrait of Xi Jinping, while the system still prevents the generation of contentious images, such as depictions of global leaders, including Xi Jinping, in situations of arrest. [56]
On January 13, 2023, three artists—Sarah Andersen, Kelly McKernan, and Karla Ortiz—filed a copyright infringement lawsuit against Stability AI, Midjourney, and DeviantArt, claiming that these companies have infringed on the rights of millions of artists by training AI tools on five billion images scraped from the web, without the consent of the original artists. [57]
The legal action was initiated in San Francisco by attorney Matthew Butterick in partnership with the Joseph Saveri Law Firm, the same team challenging Microsoft, GitHub, and OpenAI (developers of ChatGPT and DALL-E) in court. In July 2023, U.S. District Judge William Orrick indicated he was inclined to dismiss most of the lawsuit filed by Andersen, McKernan, and Ortiz but allowed them to file a new complaint. [58] Another lawsuit was filed in November 2023 against Midjourney, Stability AI, DeviantArt, and Runway AI for using the copyrighted work of over 4,700 artists. [59]