Toloka

Type of site: Platform
Founded: 2014
Headquarters: Amsterdam, Netherlands
Owner: Nebius Group
Founder: Olga Megorskaya [1] [2]
Industry: Artificial intelligence, information technology
URL: www.toloka.ai

Toloka is a Dutch multinational data services company based in Amsterdam, Netherlands. It delivers human-in-the-loop annotation and evaluation work that supports the development of generative AI and large language models.

Toloka is a unit of Nasdaq-listed AI infrastructure provider Nebius Group NV. [3] In May 2025, Jeff Bezos's investment firm, Bezos Expeditions, led a US$72 million funding round in the company. [4] Mikhail Parakhin, CTO of Shopify, also participated in the round. [5]

Toloka’s clients include Amazon, Microsoft, Anthropic, Shopify, and poolside.

History

Toloka was founded in 2014 by Olga Megorskaya as a crowdsourcing and microtasking platform for data markup that improved machine learning and search algorithms. As generative AI advanced, it evolved into a provider of specialized data annotation and model refinement for frontier AI developers.

In 2024, the company's Russian operations were sold to Russian investors. [6] [7]

Services

Generative AI

Toloka provides end-to-end data solutions for machine learning and generative AI, combining model fine-tuning and reinforcement learning from human feedback with expert-led evaluation that tests model performance.

Agentic AI

Toloka develops data and evaluation systems that help train and assess AI agents. Its work includes RL-Gyms that let models learn through simulated tasks, along with human-in-the-loop methods that combine people and AI in shared workflows. According to the company, these virtual environments act as “sandboxed enterprises” where agents can safely test complex reasoning and operational tasks before being deployed in production systems.

Machine learning

On Toloka, trainers identify the presence or absence of objects in content, as specified by algorithms. [8] [9] They also assess chatbot responses within given dialogues for relevance and engagement. [10] Translation verification tasks involve evaluating the accuracy of translations produced by multiple annotators. For the fine-tuning of large language models (LLMs), experts write context-based prompts, either single-turn or multi-turn, that cover various domains and purposes.

Natural language processing

In the natural language processing (NLP) domain, Toloka facilitates optical character recognition and classification, sentiment analysis, named-entity recognition, and search relevance evaluation. It also provides transcription and classification of audio data. [8]

Annotators

Toloka mainly works with domain experts, such as physicists, scientists, lawyers, and software engineers, to develop specialized data for models targeting niche tasks. [1] Contributors are recruited and trained through Mindrift, Toloka’s platform for sourcing skilled participants in AI projects.

The platform supports work on data annotation and model evaluation, particularly for large language models and other generative AI systems. Toloka also collaborates with freelancers, referred to as “Tolokers,” who annotate and create data for a wide range of applications, including labelling personally identifiable information, translating content, summarising text, and transcribing audio. [1]

Research

In May 2019, Toloka's research team began publishing datasets for non-commercial and academic purposes to support the scientific community and attract researchers to Toloka. The datasets are intended for researchers in areas such as linguistics, computer vision, the testing of result aggregation models, and chatbot training. [11]

Toloka research has been showcased at a range of conferences, including the Conference on Neural Information Processing Systems (NeurIPS), [11] the International Conference on Machine Learning (ICML) [12] and the International Conference on Very Large Data Bases (VLDB). [13]

In 2024 and 2025, the company expanded its research portfolio with several open benchmarks. Beemo introduced a dataset for evaluating AI-text-detection systems on human- and model-edited text, while U-MATH and μ-MATH provided large-scale benchmarks for assessing mathematical reasoning and judgment in large language models.

Toloka researchers also co-led a tutorial at the AAAI 2024 conference on aligning large language models to low-resource languages and a COLING 2025 tutorial on hybrid human-AI data annotation. The company continues to collaborate with partners such as Hugging Face and universities including Penn State and MIT, through initiatives such as BigCode, a joint project led by Hugging Face and ServiceNow in which Toloka served as the primary data partner.

Controversies

Enabling arrests of protesters via facial recognition software (March 2024)

In March 2024, Toloka's Russian division was criticized for helping develop the facial recognition software used by Russia to track and arrest protesters after the death of Alexei Navalny. [14] The company's Russian operations were sold in July 2024.

References

  1. Shrivastava, Rashi (July 24, 2024). "The Internet Isn't Big Enough To Train AI. One Fix? Fake Data". Forbes.
  2. Sacolick, Isaac (April 8, 2024). "How to test large language models". InfoWorld.
  3. "About Nebius". nebius.com. Retrieved 2025-10-22.
  4. "Amazon's Bezos leads new investment in AI data company Toloka". Reuters. Archived from the original on 2025-05-15. Retrieved 2025-10-22.
  5. "Amsterdam-based AI data firm Toloka raises €64M round led by Jeff Bezos". Silicon Canals. 2025-05-10. Retrieved 2025-10-22.
  6. Sawers, Paul (July 21, 2024). "From Yandex's ashes comes Nebius, a 'startup' with plans to be a European AI compute leader". TechCrunch.
  7. "Yandex founder to build AI business in Europe after Russia exit". Financial Times. July 16, 2024.
  8. Woodie, Alex (April 27, 2021). "Toloka Expands Data Labeling Service". Datanami.
  9. Bussler, Frederik (December 7, 2021). "Data labeling will fuel the AI revolution". VentureBeat.
  10. Gandharv, Kumar (April 29, 2021). "Why Are Data Labelling Firms Eyeing Indian Market?". Analytics India Magazine.
  11. "Toloka to present new dataset at prestigious Data-Centric AI workshop launched by Andrew Ng". FE News. November 18, 2021.
  12. "Toloka". icml.cc.
  13. "VLDB 2021 Challenge". crowdscience.ai.
  14. "Dutch Yandex subsidiary helping Russia with facial recognition software". NL Times. 27 March 2024.