Language creation in artificial intelligence

Last updated February 24, 2025

In artificial intelligence, researchers teach AI systems to develop their own ways of communicating by having them cooperate on tasks and use symbols as the building blocks of a new language. These languages may grow out of human languages or be built entirely from scratch. When AI is used to translate between languages, it can even create a new shared language to make the process easier. Natural language processing (NLP) helps these systems understand and generate human-like language, making it possible for AI to interact and communicate more naturally with people.

Evolution from English

In 2017 Facebook Artificial Intelligence Research (FAIR) trained chatbots on a corpus of English text conversations between humans playing a simple trading game involving balls, hats, and books.^[1] When programmed to experiment with English and tasked with optimizing trades, the chatbots seemed to evolve a reworked version of English to better solve their task. In some cases the exchanges seemed nonsensical:^[2]^[3]^[4]

Bob: "I can can I I everything else"
Alice: "Balls have zero to me to me to me to me to me to me to me to me to"

Facebook's Dhruv Batra said: "There was no reward to sticking to English language. Agents will drift off understandable language and invent codewords for themselves. Like if I say 'the' five times, you interpret that to mean I want five copies of this item."^[4] It's often unclear exactly why a neural network decided to produce the output that it did.^[2] Because the agents' evolved language was opaque to humans, Facebook modified the algorithm to explicitly provide an incentive to mimic humans. This modified algorithm is preferable in many contexts, even though it scores lower in effectiveness than the opaque algorithm, because clarity to humans is important in many use cases.^[1]
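Batra's example of repeating a word to signal a quantity can be read as a simple unary code. The sketch below is purely illustrative: the function names encode_request and decode_request are hypothetical, and it is not a reconstruction of what the FAIR agents actually learned.

```python
from typing import Tuple


def encode_request(item_token: str, quantity: int) -> str:
    """Ask for `quantity` copies of an item by repeating its token (unary code)."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return " ".join([item_token] * quantity)


def decode_request(utterance: str) -> Tuple[str, int]:
    """Recover (token, quantity) from an utterance made of one repeated token."""
    tokens = utterance.split()
    if not tokens or any(t != tokens[0] for t in tokens):
        raise ValueError("expected one token repeated one or more times")
    return tokens[0], len(tokens)


message = encode_request("the", 5)
token, count = decode_request(message)
```

Such a code is compact for the agents but opaque to a human reader, which is the trade-off the paragraph above describes.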

In The Atlantic, Adrienne LaFrance analogized the wondrous and "terrifying" evolved chatbot language to cryptophasia, the phenomenon of some twins developing a language that only the two children can understand.^[5]

Beginning of AI language creation

In 2017 researchers at OpenAI demonstrated a multi-agent environment and learning methods that bring about emergence of a basic language ab initio without starting from a pre-existing language. The language consists of a stream of "ungrounded" (initially meaningless) abstract discrete symbols uttered by agents over time, which comes to evolve a defined vocabulary and syntactical constraints. One of the tokens might evolve to mean "blue-agent", another "red-landmark", and a third "goto", in which case an agent will say "goto red-landmark blue-agent" to ask the blue agent to go to the red landmark. In addition, when visible to one another, the agents could spontaneously learn nonverbal communication such as pointing, guiding, and pushing. The researchers speculated that the emergence of AI language might be analogous to the evolution of human communication.^[2]^[6]^[7]

Similarly, a 2017 study by Abhishek Das and colleagues demonstrated the emergence of language and communication in a visual question-answering context, showing that a pair of chatbots can invent a communication protocol that associates ungrounded tokens with colors and shapes.^[5]^[8]
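How initially meaningless tokens can acquire meaning through cooperation can be sketched with a much simpler stand-in than the deep learning systems used in the studies above: a two-agent signaling game with urn-style (Roth-Erev) reinforcement. Everything here is an assumption made for illustration, including the two colors, the token names, and the number of rounds; it is not the method of either cited paper.

```python
import random

MEANINGS = ["red", "blue"]      # what the sender sees (hypothetical colors)
SYMBOLS = ["tok_a", "tok_b"]    # ungrounded tokens with no preset meaning


def train(rounds=20000, seed=0):
    """Play a cooperative signaling game with Roth-Erev (urn) reinforcement."""
    rng = random.Random(seed)
    # Every choice starts with weight 1, so the tokens are initially meaningless.
    sender = {m: {s: 1.0 for s in SYMBOLS} for m in MEANINGS}
    receiver = {s: {m: 1.0 for m in MEANINGS} for s in SYMBOLS}
    for _ in range(rounds):
        meaning = rng.choice(MEANINGS)
        symbol = rng.choices(
            SYMBOLS, weights=[sender[meaning][s] for s in SYMBOLS])[0]
        guess = rng.choices(
            MEANINGS, weights=[receiver[symbol][m] for m in MEANINGS])[0]
        if guess == meaning:
            # Shared reward: both agents reinforce the choices that worked.
            sender[meaning][symbol] += 1.0
            receiver[symbol][guess] += 1.0
    return sender, receiver


def expected_success(sender, receiver):
    """Probability that the receiver recovers a uniformly drawn meaning."""
    total = 0.0
    for m in MEANINGS:
        s_norm = sum(sender[m].values())
        for s in SYMBOLS:
            r_norm = sum(receiver[s].values())
            total += (sender[m][s] / s_norm) * (receiver[s][m] / r_norm)
    return total / len(MEANINGS)


def vocabulary(sender):
    """Read off the token each meaning is most strongly associated with."""
    return {m: max(sender[m], key=sender[m].get) for m in MEANINGS}


if __name__ == "__main__":
    sender, receiver = train()
    print(vocabulary(sender))
    print(round(expected_success(sender, receiver), 3))
```

Before training, the receiver recovers the meaning only at chance level; after many rewarded rounds the two agents typically settle on a shared mapping from tokens to colors, although which token ends up meaning which color depends on the random seed.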

These studies show that models trained from scratch can generate a language of their own, which the AI can then build on for communication with, and understanding of, humans.^{[ citation needed ]}

Interlingua

In 2016, Google deployed to Google Translate an AI designed to directly translate between any of 103 different natural languages, including pairs of languages that it had never before seen translated between. Researchers examined whether the machine learning algorithms were choosing to translate human-language sentences into a kind of "interlingua", and found that the AI was indeed encoding semantics within its structures. The researchers cited this as evidence that a new interlingua, evolved from the natural languages, exists within the network.^[2]^[9]
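The cited paper by Johnson et al. describes steering a single shared model by prepending an artificial token that names the target language to the source sentence. A minimal sketch of that data-preparation step follows; the helper name add_target_token and the example sentences are hypothetical, and the translation model itself is not shown.

```python
def add_target_token(source_sentence, target_lang):
    """Prefix the source text with a token naming the desired target language."""
    return f"<2{target_lang}> {source_sentence}"


# Inputs for language pairs seen in training (illustrative sentences only).
training_inputs = [
    add_target_token("How are you?", "pt"),      # English -> Portuguese
    add_target_token("Como você está?", "en"),   # Portuguese -> English
    add_target_token("How are you?", "es"),      # English -> Spanish
    add_target_token("¿Cómo estás?", "en"),      # Spanish -> English
]

# Zero-shot request: Portuguese -> Spanish never appears as a training pair,
# but the same input format can still ask for it.
zero_shot_input = add_target_token("Como você está?", "es")
```

Because every language pair shares one model and one input format, a request for an unseen pair is well-formed, which is what makes the question of a shared internal "interlingua" testable.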

Current standpoint of language generation in AI

At the time of writing^{[ when? ]}, AI language generation is developing at a slow pace. The development of natural language processing (NLP) has transformed language generation, which is now used in generative AI chatbots such as ChatGPT, Microsoft Copilot, and Google Gemini.^{[ citation needed ]} Language generation rests on training computer models and algorithms that learn from large datasets. For example, mixed sentence models tend to perform better because they draw on a larger sample of sentence-level data rather than individual words alone.^[10] These models continue to develop as more data is integrated, which allows for better communication over time as the AI has more information to learn from.
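As a toy illustration of a model learning from data, the sketch below counts adjacent word pairs (a bigram model) and predicts the most frequent follower of a word. It is a deliberately simple stand-in chosen for illustration, not how the chatbots named above work; the corpus sentences and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows another across the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = ["i want the book", "i want the hat", "you want the book"]
small_model = train_bigram(corpus)

# Adding more data can change what the model predicts.
larger_model = train_bigram(corpus + ["she took the hat", "he sold the hat"])
```

With the three-sentence corpus, "book" follows "the" most often; once the two extra sentences are added, "hat" overtakes it, a small-scale picture of a model's behavior shifting as more data is integrated.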

The image on the right portrays how these models are implemented to communicate with users trying to learn about information and things around the world.

Applications of generative AI

Generative AI for language has been applied in industries and markets across the world, including customer service, games, translation, and other technical tasks such as making sense of large amounts of data. In customer service, AI chatbots such as ChatGPT and Google Gemini use natural language processing (NLP) to understand and communicate with users live, offering responses and opinions that depend on the questions asked. They not only mimic human interaction but also present themselves as distinct entities, which allows for one-on-one interaction with users through their own developed style of language. In gaming, non-playable characters (NPCs) are used to improve the in-game experience by providing insights from the bots and other characters that appear in many story-mode and first-person shooter (FPS) games. In translation, generative AI systems can understand many languages and translate them to help the user understand information, which broadens the potential audience. These applications are evolving over time and illustrate the various uses of language through AI in industries, markets, and daily situations.

Challenges and limitations of AI language creation

Although AI seems to be evolving rapidly, it faces many technical challenges. For example, the language used by AI is often vague and therefore confusing for users. There is also a "black-box problem",^[11]^[10] a lack of transparency and interpretability in AI language outputs. Moreover, as premium versions of AI chatbots emerge that can scrape data from the web, the information they present may become biased. AI models could inadvertently form opinions based on the language (words and sentences) on which they are trained, which is undesirable for an AI meant to be neutral.

These limitations and challenges are expected to be overcome in the future as models learn more language from the conversations and information they receive. This should strengthen language creation and improve the conversational skills and understanding of the AI to an acceptable standard.

Ethical risks in AI language development

Many ethical risks arise from the challenges of AI language development and conversation, such as the misuse of chatbots to create fake information or manipulate others. There is also a strong privacy concern: many users worry that chatbots may save and sell their information. Guidelines from bodies such as the IEEE and the EU set out the measures needed "to ensure privacy preservation ... involving sensitive information".^[11] The cited article calls for responsible AI use, especially with sensitive medical data.

As these technologies advance, it is critical that ethical standards are met in order to protect the privacy of information and to maintain a neutral standpoint in communicating with users.^[10]^[12]^[13]^[14]

Future of AI language creation

As AI technology continues to evolve, the goal is to develop refined systems in which the AI takes a neutral but informative standpoint. Upcoming deep learning and neural network models are expected to add multiple layers of checking, which should improve NLP and enhance interactions with users. These stronger models are intended to make communication safer by preventing biases and irrational claims, and to improve games, customer service, VR/AR systems, and translation across many languages. Possible future uses include medical scribing and communication with doctors during live surgeries. The outlook for generative AI language is promising, as models continue to be trained on millions of new words, sentences, and dialects through intricate computational models.^[14]

Figure: Deep Learning in Natural Language Processing (this image portrays the intricate modeling of NLP and how it ensures its accuracy during communication)

References

[1] "Chatbots learn how to negotiate and drive a hard bargain". New Scientist. 14 June 2017. Retrieved 24 January 2018.
[2] Baraniuk, Chris (1 August 2017). "'Creepy Facebook AI' story sweeps media". BBC News. Retrieved 24 January 2018.
[3] "Facebook robots shut down after they talk to each other in language only they understand". The Independent. 31 July 2017. Retrieved 24 January 2018.
[4] Field, Matthew (1 August 2017). "Facebook shuts down robots after they invent their own language". The Telegraph. Retrieved 24 January 2018.
[5] LaFrance, Adrienne (20 June 2017). "What an AI's Non-Human Language Actually Looks Like". The Atlantic. Retrieved 24 January 2018.
[6] "It Begins: Bots Are Learning to Chat in Their Own Language". WIRED. 16 March 2017. Retrieved 24 January 2018.
[7] Mordatch, I., & Abbeel, P. (2017). Emergence of Grounded Compositional Language in Multi-Agent Populations. arXiv: 1703.04908.
[8] Das, A., Kottur, S., Moura, J. M., Lee, S., & Batra, D. (2017). Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. arXiv: 1703.06585.
[9] Johnson, Melvin; Schuster, Mike; Le, Quoc V.; Krikun, Maxim; Wu, Yonghui; Chen, Zhifeng; Thorat, Nikhil; Viégas, Fernanda; Wattenberg, Martin; Corrado, Greg; Hughes, Macduff; Dean, Jeffrey (2017). "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation". Transactions of the Association for Computational Linguistics. 5: 339–351. arXiv: 1611.04558. doi:10.1162/tacl_a_00065.
[10] Khan, Bangul; Fatima, Hajira; Qureshi, Ayatullah; Kumar, Sanjay; Hanan, Abdul; Hussain, Jawad; Abdullah, Saad (2023-02-08). "Drawbacks of Artificial Intelligence and Their Potential Solutions in the Healthcare Sector". Biomedical Materials & Devices (New York, N.Y.). 1 (2): 731–738. doi:10.1007/s44174-023-00063-2. ISSN 2731-4812. PMC 9908503. PMID 36785697.
[11] Martinelli, Fabio (28 September 2020). "Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence". 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. doi:10.1109/IJCNN48605.2020.9206801. ISBN 978-1-7281-6926-2.
[12] Goodman, Joshua (2001-08-09). "A Bit of Progress in Language Modeling". arXiv: cs/0108005.
[13] Martinelli, Fabio (28 September 2020). "Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence". 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. doi:10.1109/IJCNN48605.2020.9206801. ISBN 978-1-7281-6926-2.
[14] Rita, Mathieu; Michel, Paul; Chaabouni, Rahma; Pietquin, Olivier; Dupoux, Emmanuel; Strub, Florian (2024-03-18). "Language Evolution with Deep Learning". arXiv: 2403.11958 [cs.CL].

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
