LMArena


LMArena
Screenshot of the Chatbot Arena main interface as of February 20, 2025, built with the Gradio library
Type of site: Artificial intelligence
Country of origin: United States
Owner: LMSYS Org
Founder(s):
  • Wei-Lin Chiang
  • Anastasios N. Angelopoulos
  • Ion Stoica
URL: lmarena.ai
Registration: Optional
Launched: May 3, 2023

LMArena (formerly Chatbot Arena) is a public, web-based platform that evaluates large language models (LLMs) through anonymous, crowd-sourced pairwise comparisons. A user enters a prompt, two anonymous models each respond, and the user votes for the better response. The models' identities are revealed only after the vote. Users can also select specific models to compare directly. [1] [2]
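Pairwise votes like these only become a leaderboard once they are aggregated into one score per model. The sketch below illustrates one standard way to do that: it fits a Bradley-Terry model to (winner, loser) vote pairs and maps the fitted strengths onto an Elo-like scale. It is not LMArena's actual code. The function names, the minorization-maximization fitting loop, the 1000-point base, and the choice to ignore ties are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative sketch using the minorization-maximization update.
    Assumes every model has at least one win and that the comparison
    graph is connected. Ties are not modeled.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # battles per unordered pair of models
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # Sum n_ij / (p_i + p_j) over every opponent j that model i has faced.
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = games[frozenset((i, j))]
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Rescale so the geometric mean strength is 1 (the model is scale-free).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0):
    """Map strengths to an Elo-like scale where a 400-point gap means 10:1 odds."""
    return {m: base + 400.0 * math.log10(s) for m, s in strength.items()}


if __name__ == "__main__":
    # Hypothetical votes: model-a beats model-b 7-3, model-b beats model-c 6-4,
    # and model-a beats model-c 8-2.
    votes = (
        [("model-a", "model-b")] * 7 + [("model-b", "model-a")] * 3
        + [("model-b", "model-c")] * 6 + [("model-c", "model-b")] * 4
        + [("model-a", "model-c")] * 8 + [("model-c", "model-a")] * 2
    )
    for name, rating in sorted(to_elo_scale(bradley_terry(votes)).items(),
                               key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

With these hypothetical votes, the fitted ratings rank model-a above model-b and model-b above model-c. A Bradley-Terry fit looks at every vote at once, so unlike a sequential Elo update, the result does not depend on the order in which votes arrive.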

LMArena is widely used in the artificial intelligence industry. Major companies submit their large language models to the platform, including OpenAI's GPT-4o and o1, Google DeepMind's Gemini [3] and Anthropic's Claude, [4] and then cite the resulting rankings to promote those models.

The platform has also been used to preview models before their release. Notably, the Chinese company DeepSeek tested prototype models on LMArena months before its R1 model gained attention in Western media. [5] Other notable pre-release models include OpenAI's GPT-5, tested under the codename "summit", and Google DeepMind's Gemini 2.5 Flash Image, an image generation and editing model, tested under the codename "nano-banana". [6] [7]

LMArena's evaluation methodology for large language models has been examined in academic analyses. These analyses have identified specific limitations, such as the finding that hundreds of rigged votes can skew model rankings, and have suggested areas for improvement. The platform is an active contributor to the AI research ecosystem and has since revised its methodology through policy updates informed by ongoing research. [8] [9]

References

  1. Hart, Robert (July 18, 2024). "What AI Is The Best? Chatbot Arena Relies On Millions Of Human Votes". Forbes. Retrieved April 21, 2025.
  2. Kruppa, Miles (December 5, 2024). "The UC Berkeley Project That Is the AI Industry's Obsession". The Wall Street Journal. Retrieved April 21, 2025.
  3. Nuñez, Michael (November 15, 2024). "Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don't tell the whole story". VentureBeat. Retrieved April 21, 2025.
  4. Edwards, Benj (March 27, 2024). "'The king is dead'—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time". Ars Technica. Retrieved April 21, 2025.
  5. Metz, Rachel (February 18, 2025). "Before DeepSeek Blew Up, Chatbot Arena Announced Its Arrival". Bloomberg News. Retrieved April 21, 2025.
  6. Ziff, Maxwell (August 26, 2025). "Google Gemini's AI image model gets a 'bananas' upgrade". TechCrunch. Retrieved August 27, 2025.
  7. Langley, Hugh (August 19, 2025). "Is Google behind a mysterious new AI image generator? These bananas might confirm it". Business Insider. Retrieved August 27, 2025.
  8. Stokel-Walker, Chris (February 6, 2025). "Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds". Fast Company. Retrieved April 21, 2025.
  9. Wiggers, Kyle (September 5, 2024). "The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark". TechCrunch. Retrieved April 21, 2025.