Type of site | Platform |
---|---|
Founded | 2014 |
Owner | Nebius Group |
Founder(s) | Olga Megorskaya [1] [2] |
Industry | Artificial intelligence Information technology |
URL | toloka |
Toloka, based in Amsterdam, is a crowdsourcing and generative AI services provider. [1]
The company supports the development of artificial intelligence, from training to evaluation, and provides services related to generative artificial intelligence and large language models. [3] [4]
Toloka was founded in 2014 by Olga Megorskaya, a member of the board of directors of Yandex, as a crowdsourcing and microtasking platform. [5] It was founded primarily for data labeling to improve machine learning and search algorithms.
As generative AI evolved, the platform adapted to provide expert data labeling to developers of generative AI applications. [6]
In 2024, the company's Russian operations were sold to Russian investors. [4] [7]
In the generative AI domain, Toloka provides services such as model fine-tuning, reinforcement learning from human feedback, evaluation, and ad hoc datasets, all of which require large volumes of annotation by highly skilled experts.
On Toloka, trainers are tasked with identifying the presence or absence of objects in content, as specified by algorithms. [5] [8] They also assess chatbot responses within given dialogues for relevance and engagement. [9] Additionally, translation verification tasks involve evaluating the accuracy of translations from multiple annotators. For the fine-tuning of large language models (LLMs), experts are required to generate and provide context-based prompts that can be single-turn or multi-turn, serving various domains and purposes.
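When several annotators judge the same item, their answers have to be combined into a single label. The source does not say which aggregation method Toloka uses; the sketch below assumes simple majority voting, a common baseline in crowdsourcing. The function name `majority_vote`, the task and worker identifiers, and the label strings are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (task_id, worker_id, label) triples by majority vote.

    Returns a dict mapping task_id to (winning_label, share_of_votes).
    Ties are broken alphabetically by label so the result is deterministic.
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, label in judgments:
        votes[task_id][label] += 1

    result = {}
    for task_id, counter in votes.items():
        # Highest count first; on equal counts, the alphabetically first label.
        label, count = min(counter.items(), key=lambda kv: (-kv[1], str(kv[0])))
        result[task_id] = (label, count / sum(counter.values()))
    return result


# Hypothetical judgments: three workers label img1, two label img2.
judgments = [
    ("img1", "w1", "cat_present"),
    ("img1", "w2", "cat_present"),
    ("img1", "w3", "cat_absent"),
    ("img2", "w1", "cat_absent"),
    ("img2", "w2", "cat_absent"),
]
aggregated = majority_vote(judgments)
```

The share of votes gives a rough confidence signal: items with a low share can be sent to additional annotators. Real pipelines often weight workers by estimated skill instead of counting every vote equally.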
In the natural language processing (NLP) domain, Toloka facilitates optical character recognition and classification, sentiment analysis, named-entity recognition, and search relevance evaluation. It also provides transcription and classification of audio data. [5]
Toloka mainly works with domain experts, such as physicists, scientists, lawyers, and software engineers, to develop specialized data for models targeting niche tasks. [1] Toloka also works with freelancers, referred to as "Tolokers," who annotate and create data for diverse applications. [1] They perform tasks such as labeling personally identifiable information for AI projects, translating content, summarizing information, and transcribing audio to text. [1]
Upon completion of each task, the performer receives a reward based on the volume of images, videos, and unstructured text processed. [5]
In May 2019, Toloka's research team began publishing datasets for non-commercial and academic purposes to support the scientific community and attract researchers to Toloka. These datasets are aimed at researchers in fields such as linguistics, computer vision, testing of result aggregation models, and chatbot training. [10]
Toloka research has been showcased at a range of conferences, including the Conference on Neural Information Processing Systems (NeurIPS), [10] the International Conference on Machine Learning (ICML) [11] and the International Conference on Very Large Data Bases (VLDB). [12]
In February 2024, Toloka conducted a tutorial at the AAAI Conference on Artificial Intelligence, focusing on aligning Large Language Models to Low-Resource Languages. [13]
The company participated in BigCode, a joint scientific initiative led by Hugging Face and ServiceNow, where it served as the primary data partner. [14]
In March 2024, Toloka's Russian division was criticized for helping develop the facial recognition software used by Russia to track and arrest protesters after the death of Alexei Navalny. [15] The company's Russian operations were sold in July 2024.