Developer(s) | Luma Labs |
---|---|
Initial release | June 12, 2024 |
Type | Text-to-video model |
Website | lumalabs.ai/dream-machine |
Dream Machine is a text-to-video model created by Luma Labs and launched in June 2024. It generates video output based on user prompts or still images. Dream Machine has been noted for its ability to realistically capture motion, while some critics have remarked upon the lack of transparency about its training data. Upon the program's release, users on social media created moving versions of various Internet memes.
Dream Machine is a text-to-video model created by the San Francisco-based generative artificial intelligence company Luma Labs, which had previously created Genie, a 3D model generator. It was released to the public on June 12, 2024; the company announced the launch in a post on X alongside examples of videos the model had created. [1] Soon after its release, users on social media posted video versions of images generated with Midjourney, as well as moving recreations of artworks such as Girl with a Pearl Earring and memes such as Doge, Picard facepalm, Success Kid, and distracted boyfriend. [2] [3] [4] [5] One video, a trailer for a fictional animated movie titled Monster Camp, was reposted by Luma Labs on their X account. Users on the platform criticized the video for stealing the aesthetic of the Monsters, Inc. franchise, also pointing out that Mike Wazowski, a character from the franchise, appears in the trailer. [6] Another video created with Dream Machine, a Pixar-style animation of a girl in ancient Egypt posted by director Ellenor Argyropoulos, went viral online. [7]
As of June 2024, users can create videos with Dream Machine, which are five seconds long and 1360 × 752 pixels, by signing up with their Google account and typing in a prompt or using a still image. [8] Dream Machine alters the prompt using its own large language model. Users can create 10 videos a day and 30 videos a month for free. The program also offers Standard, Pro, and Premier subscription plans, which allow users to create 120, 400, and 2,000 videos, respectively. Dream Machine's website states that its videos have difficulty depicting text and motion. [9] Luma Labs has stated that it plans to release a developer-friendly API for Dream Machine. [2] The week after its release, Luma Labs announced that it would add the ability to extend videos, a discovery feature, and in-video editing. [10]
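No public API existed at the time of the announcement above, so the following is only an illustrative sketch of how a prompt-based video-generation request of the kind described in this section might be issued programmatically. The endpoint URL, authentication scheme, field names, and response shape are assumptions for illustration, not a documented Luma Labs interface.

```python
# Hypothetical sketch only: Dream Machine had no public API as of June 2024.
# The endpoint, payload fields, and response fields below are assumptions
# chosen to illustrate a typical prompt-based video-generation workflow.
import time
import requests

API_BASE = "https://api.example.com/dream-machine/v1"  # placeholder URL
API_KEY = "YOUR_API_KEY"                               # placeholder credential


def generate_video(prompt: str, image_url: str | None = None) -> str:
    """Submit a text (or image) prompt and return a URL to the finished clip."""
    payload = {"prompt": prompt}
    if image_url:
        payload["image_url"] = image_url  # image-to-video, as described above

    # Submit the generation job.
    resp = requests.post(
        f"{API_BASE}/generations",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    job_id = resp.json()["id"]

    # Poll until the roughly five-second, 1360 x 752 clip is ready.
    while True:
        status = requests.get(
            f"{API_BASE}/generations/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        ).json()
        if status["state"] == "completed":
            return status["video_url"]
        if status["state"] == "failed":
            raise RuntimeError("generation failed")
        time.sleep(5)


# Example usage:
# print(generate_video("a golden retriever surfing a wave at sunset"))
```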
Upon its release, critics heavily compared Dream Machine to Sora, a text-to-video model created by OpenAI, and to Kling, another text-to-video model. [2] [11] Charles Pulliam-Moore of The Verge wrote that "bullish fans" of generative AI "were quick to call [Dream Machine] a novel innovation", but remarked upon its training data not being available to the public. [6] Mark Wilson of TechRadar also noted that it was unclear what Dream Machine's training data was, which he said "means that its potential outside of personal use or improving your GIF game could be limited", but wrote that it was "certainly a fun tool to test drive" as "a taster of the more advanced (and no doubt more expensive) AI video generators to come". [9] For Tom's Guide, Ryan Morrison called Dream Machine "one of the best prompt following and motion understanding AI video models yet" and "an impressive next step in generative AI video", but said that it "is still falling short of what is needed". [11] Mashable's Chase DiBenedetto described user-created Dream Machine videos circulating on social media as "eerily-moving" and "Harry Potter-esque". [2]