LMArena

Type of site	Artificial intelligence
Country of origin	United States
Founders	Wei-Lin Chiang; Anastasios N. Angelopoulos; Ion Stoica
URL	lmarena.ai
Registration	Optional
Launched	May 3, 2023

Last updated January 28, 2026

LMArena (formerly Chatbot Arena) is a public, web-based platform that evaluates large language models (LLMs) through anonymous, crowd-sourced pairwise comparisons. Users enter a prompt, two anonymous models respond, and the user votes for the better response, after which the models' identities are revealed. Users can also choose which models to test.^[1]^[2]

The website has also been used for pre-release testing of upcoming models. Notably, the Chinese company DeepSeek tested its prototype models on LMArena months before its R1 model gained attention in Western media.^[5] Other notable pre-release models include OpenAI's GPT-5, under the codename "summit", and Google DeepMind's Gemini 2.5 Flash Image (an image-generation and editing model), under the codename "Nano Banana".^[6]^[7]

LMArena's evaluation methodology for large language models has been examined in academic analyses, which have identified specific limitations and suggested areas for improvement. The platform is an active contributor to the AI research ecosystem and has since implemented methodological updates, published as policy updates, in coordination with ongoing research.^[8]^[9]

In January 2026, LMArena announced the closing of a $150 million Series A funding round, bringing the company’s post-money valuation to approximately $1.7 billion. The round was led by Felicis and UC Investments (University of California), with participation from Andreessen Horowitz, The House Fund, LDVP, Kleiner Perkins, Lightspeed Venture Partners, and Laude Ventures. LMArena stated that the funding would be used to scale its AI evaluation platform, expand technical and research teams, and support product development following rapid community growth and adoption.^[10]

History

LMArena was released on April 24, 2023. During the first week, Vicuna (vicuna-13b), an LLM fine-tuned from LLaMA by LMArena (then LMSYS), ranked first with an Elo rating of 1169, followed by Koala (koala-13b), a dialogue model by BAIR, in second place with an Elo rating of 1082, and Oasst Pythia (oasst-pythia-12b), an LLM by LAION, in third place with an Elo rating of 1065.^[11] In the second week, GPT-4, Claude-v1, and GPT-3.5 were added to the arena alongside RWKV-4-Raven-14B.^[12]
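
The ratings quoted above are Elo scores derived from pairwise votes. As a rough illustration of how a single vote could shift two ratings, the following is a minimal sketch of the textbook Elo update. The K-factor of 32, the 400-point scale, the 1000-point starting ratings, and the function names `expected_score` and `update_elo` are illustrative assumptions; they are not taken from the cited sources and may differ from what the platform uses.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (A, B) ratings after one pairwise vote.

    score_a is 1.0 if the voter preferred A, 0.0 if B, 0.5 for a tie.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level (assumed 1000 each); the voter prefers model A.
    new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
    print(new_a, new_b)  # 1016.0 984.0
```

Because the update is zero-sum, the total of the two ratings is unchanged by a vote, and a win over a higher-rated model moves the ratings more than a win over a lower-rated one.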

References

[1] Hart, Robert (July 18, 2024). "What AI Is The Best? Chatbot Arena Relies On Millions Of Human Votes". Forbes. Retrieved April 21, 2025.
[2] Kruppa, Miles (December 5, 2024). "The UC Berkeley Project That Is the AI Industry's Obsession". The Wall Street Journal. Retrieved April 21, 2025.
[3] Nuñez, Michael (November 15, 2024). "Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don't tell the whole story". VentureBeat. Retrieved April 21, 2025.
[4] Edwards, Benj (March 27, 2024). ""The king is dead"—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time". Ars Technica. Retrieved April 21, 2025.
[5] Metz, Rachel (February 18, 2025). "Before DeepSeek Blew Up, Chatbot Arena Announced Its Arrival". Bloomberg News. Retrieved April 21, 2025.
[6] Ziff, Maxwell (August 26, 2025). "Google Gemini's AI image model gets a 'bananas' upgrade". TechCrunch. Retrieved August 27, 2025.
[7] Langley, Hugh (August 19, 2025). "Is Google behind a mysterious new AI image generator? These bananas might confirm it". Business Insider. Retrieved August 27, 2025.
[8] Stokel-Walker, Chris (February 6, 2025). "Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds". Fast Company. Retrieved April 21, 2025.
[9] Wiggers, Kyle (September 5, 2024). "The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark". TechCrunch. Retrieved April 21, 2025.
[10] "Fueling the World's Most Trusted AI Evaluation Platform". LMArena Blog. January 6, 2026. Retrieved January 8, 2026.
[11] Zheng, Lianmin; Sheng, Ying; Chiang, Wei-Lin; Zhang, Hao; Gonzalez, Joseph E.; Stoica, Ion (May 3, 2023). "Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings". LMArena Blog. Retrieved January 28, 2026.
[12] Zheng, Lianmin; Sheng, Ying; Zhang, Hao; Gonzalez, Joseph E.; Stoica, Ion (May 10, 2023). "Chatbot Arena Leaderboard Updates (Week 2)". LMArena Blog. Retrieved January 28, 2026.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
