Hutter Prize

Last updated March 24, 2025

The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 1 GB English text file, with the goal of encouraging research in artificial intelligence (AI).

Launched in 2006, the prize awards 5000 euros for each one percent improvement (with 500,000 euros total funding)^[1] in the compressed size of the file enwik9, which is the larger of two files used in the Large Text Compression Benchmark (LTCB);^[2] enwik9 consists of the first 10⁹ bytes of a specific version of English Wikipedia.^[3] The ongoing^[4] competition is organized by Hutter, Matt Mahoney, and Jim Bowery.^[1]

The prize was announced on August 6, 2006^[1] with a smaller text file: enwik8 consisting of 100MB. On February 21, 2020 it was expanded by a factor of 10, to enwik9 of 1GB, the prize went from 50,000 to 500,000 euros.

Goals

The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. Hutter proved that the optimal behavior of a goal-seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest programs consistent with all interaction so far.^[5] However, there is no general solution because Kolmogorov complexity is not computable. Hutter proved that in the restricted case (called AIXI ^tl) where the environment is restricted to time t and space l, a solution can be computed in time O(t2^l), which is still intractable.

The organizers further believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test. Thus, progress toward one goal represents progress toward the other. They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge. A text compressor must solve the same problem in order to assign the shortest codes to the most likely text sequences.^[6]

Models like ChatGPT are not ideal for the Hutter Prize for a variety of reasons, they might take more computational resources than those allowed by the competition (computational and storage space).

Rules

The contest is open-ended. It is open to everyone. To enter, a competitor must submit a compression program and a decompressor that decompresses to the file enwik9 (formerly enwik8 up to 2017).^[3] It is also possible to submit a compressed file instead of the compression program. The total size of the compressed file and decompressor (as a Win32 or Linux executable) must be less than or equal 99% of the previous prize winning entry. For each one percent improvement, the competitor wins 5,000 euros. The decompression program must also meet execution time and memory constraints.

Submissions must be published in order to allow independent verification. There is a 30-day waiting period for public comment before awarding a prize. In 2017, the rules were changed to require the release of the source code under a free software license, out of concern that "past submissions [which did not disclose their source code] had been useless to others and the ideas in them may be lost forever."^[4]

Winners

Author (enwik9)	Date	Program	Total size	Award
Kaido Orav and Byron Knoll	September 3, 2024	fx2-cmix	110,793,128	7,950€
Kaido Orav	February 2, 2024	fx-cmix	112,578,322	6,911€
Saurabh Kumar	July 16, 2023	fast cmix	114,156,155	5,187€
Artemiy Margaritov	May 31, 2021	starlit	115,352,938	9,000€
Alexander Rhatushnyak	July 4, 2019	phda9v1.8	116,673,681	No prize

Author (enwik8)	Date	Program	Total size	Award
Alexander Rhatushnyak	November 4, 2017	phda9	15,284,944	2,085€
Alexander Rhatushnyak	May 23, 2009	decomp8	15,949,688	1,614€
Alexander Rhatushnyak	May 14, 2007	paq8hp12	16,481,655	1,732€
Alexander Rhatushnyak	September 25, 2006	paq8hp5	17,073,018	3,416€
Matt Mahoney	March 24, 2006	paq8f	18,324,887	No prize

References

1 2 3 "500'000€ Prize for Compressing Human Knowledge". Hutter Prize. Retrieved 2023-01-08.
↑ Mahoney, Matt (2022-12-02). "Large Text Compression Benchmark" . Retrieved 2023-01-08.
1 2 Mahoney, Matt (2011-09-01). "About the Test Data" . Retrieved 2022-11-16.
1 2 "Human Knowledge Compression Contest Frequently Asked Questions & Answers". Hutter Prize. Retrieved 14 Oct 2022.
↑ Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Texts in Theoretical Computer Science an EATCS Series. Springer. doi:10.1007/b138233. ISBN 3-540-22139-5.
↑ Mahoney, Matt (2009-07-23). "Rationale for a Large Text Compression Benchmark" . Retrieved 2022-11-16.

External links

Official website

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[website-1] 1 2 3 "500'000€ Prize for Compressing Human Knowledge". Hutter Prize. Retrieved 2023-01-08.

[Benchmark-2] Mahoney, Matt (2022-12-02). "Large Text Compression Benchmark" . Retrieved 2023-01-08.

[textdata-3] 1 2 Mahoney, Matt (2011-09-01). "About the Test Data" . Retrieved 2022-11-16.

[faq-4] 1 2 "Human Knowledge Compression Contest Frequently Asked Questions & Answers". Hutter Prize. Retrieved 14 Oct 2022.

[5] Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Texts in Theoretical Computer Science an EATCS Series. Springer. doi:10.1007/b138233. ISBN 3-540-22139-5.

[6] Mahoney, Matt (2009-07-23). "Rationale for a Large Text Compression Benchmark" . Retrieved 2022-11-16.

[1]

[2]

[3]

[4]

[5]

[6]

v t e Standard test items
Pangram Reference implementation Sanity check Standard test image
Artificial intelligence	Chinese room Turing test
Television (test card)	SMPTE color bars EBU colour bars Indian-head test pattern EIA 1956 resolution chart BBC Test Card A, B, C, D, E, F, G, H, J, W, X ETP-1 Philips circle pattern (PM 5538, PM 5540, PM 5544, PM 5644) Snell & Wilcox SW2/SW4 Telefunken FuBK TVE test card UEIT
Computer languages	"Hello, World!" program Quine Trabb Pardo–Knuth algorithm Man or boy test Just another Perl hacker
Data compression	Calgary corpus Canterbury corpus Silesia corpus enwik8, enwik9
3D computer graphics	Cornell box Stanford bunny Stanford dragon Utah teapot List
Machine learning	ImageNet MNIST database List
Typography (filler text)	Etaoin shrdlu Hamburgevons Lorem ipsum The quick brown fox jumps over the lazy dog
Other	3DBenchy Acid 1 2 3 "Bad Apple!!" EICAR test file functions for optimization GTUBE Harvard sentences Lenna "The North Wind and the Sun" "Tom's Diner" SMPTE universal leader EURion constellation Shakedown Webdriver Torso 1951 USAF resolution test chart