| | |
|---|---|
| Type of site | Platform |
| Founded | 2014 |
| Headquarters | Amsterdam, Netherlands |
| Owner | Nebius Group |
| Founder | Olga Megorskaya [1] [2] |
| Industry | Artificial intelligence, Information technology |
| URL | www |
Toloka is a Dutch multinational data services company based in Amsterdam, Netherlands. It provides human-in-the-loop annotation and evaluation work that supports the development of generative AI and large language models.
Toloka is a unit of Nebius Group NV, a Nasdaq-listed AI infrastructure provider. [3] In May 2025, Bezos Expeditions, Jeff Bezos's investment firm, led a US$72 million funding round in the company. [4] Shopify CTO Mikhail Parakhin also participated in the round. [5]
Toloka’s clients include Amazon, Microsoft, Anthropic, Shopify, and poolside.
Toloka was founded in 2014 by Olga Megorskaya as a crowdsourcing and microtasking platform for data labeling, used to improve machine learning and search algorithms. As generative AI advanced, it evolved into a provider of specialized data annotation and model refinement services for frontier AI developers.
In 2024, the company's Russian operations were sold to Russian investors. [6] [7]
Toloka develops data and evaluation systems that help train and assess AI agents. Its work includes RL-Gyms, reinforcement learning environments in which models learn through simulated tasks, as well as human-in-the-loop methods that combine people and AI in shared workflows.
On Toloka, trainers identify whether objects specified by algorithms are present in or absent from content. [8] [9] They also rate chatbot responses within given dialogues for relevance and engagement. [10] Translation verification tasks involve assessing the accuracy of translations produced by multiple annotators. For fine-tuning large language models (LLMs), experts write context-based prompts, either single-turn or multi-turn, for a range of domains and purposes.
Toloka also supports optical character recognition and natural language processing (NLP) tasks, including text classification, sentiment analysis, named-entity recognition, and search relevance evaluation. It also provides transcription and classification of audio data. [8]
Toloka mainly works with domain experts, such as physicists and other scientists, lawyers, and software engineers, to develop specialized data for models targeting niche tasks. [1] Contributors are recruited and trained through Mindrift, Toloka's platform for sourcing skilled participants for AI projects.
The platform supports work on data annotation and model evaluation, particularly for large language models and other generative AI systems. Toloka also collaborates with freelancers, referred to as “Tolokers,” who annotate and create data for a wide range of applications, including labelling personally identifiable information, translating content, summarising text, and transcribing audio. [1]
In May 2019, Toloka's research team began publishing datasets for non-commercial and academic use, both to support the scientific community and to attract researchers to Toloka. The datasets target researchers in fields such as linguistics, computer vision, the testing of result-aggregation models, and chatbot training. [11]
Toloka's research has been presented at conferences including the Conference on Neural Information Processing Systems (NeurIPS), [11] the International Conference on Machine Learning (ICML), [12] and the International Conference on Very Large Data Bases (VLDB). [13]
In March 2024, Toloka's Russian division was criticized for helping develop facial recognition software that Russian authorities used to track and arrest protesters after the death of Alexei Navalny. [14] The company's Russian operations were sold in July 2024.