NECA Project

The NECA Project (Net Environment for Embodied Emotional Conversational Agents) was a research project that focused on multimodal communication with animated agents in a virtual world. NECA was funded by the European Commission from 1998 to 2002, and the research results were published up to 2005. [1] [2] [3]

Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data. For example, a multimodal question answering system employs multiple modalities at both question (input) and answer (output) level.
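
As an illustration of this idea (not code from the NECA Project), a multimodal interface can be sketched as a dispatcher that normalizes each input modality and answers on more than one output channel. The minimal Python sketch below uses invented names throughout.

    # Minimal sketch of a multimodal question answering interface.
    # All names are illustrative; this is not NECA code.
    from typing import Callable, Dict

    def answer(question: str) -> str:
        # Placeholder question answering; a real system would query a backend.
        return f"Answer to: {question}"

    def transcribe(audio: bytes) -> str:
        # Placeholder speech recognition; a real system would call an ASR engine.
        return "transcribed question"

    # Input handlers normalize every supported modality to text.
    handlers: Dict[str, Callable] = {"text": lambda p: p, "speech": transcribe}

    def multimodal_qa(modality: str, payload) -> dict:
        question = handlers[modality](payload)
        text = answer(question)
        # The same answer is offered on two output modalities.
        return {"text": text, "speech": f"tts({text!r})"}

    print(multimodal_qa("text", "What is NECA?"))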

Computer animation is the process used for digitally generating animated images. The more general term computer-generated imagery (CGI) encompasses both static scenes and dynamic images, while computer animation refers only to moving images. Modern computer animation usually uses 3D computer graphics, although 2D computer graphics are still used for stylistic, low-bandwidth, and faster real-time renderings. Sometimes, the target of the animation is the computer itself, while at other times the target is another medium, such as film.

A virtual world is a computer-based simulated environment which may be populated by many users who can create a personal avatar, and simultaneously and independently explore the virtual world, participate in its activities and communicate with others. These avatars can be textual, two- or three-dimensional graphical representations, or live video avatars with auditory and touch sensations. In general, virtual worlds allow for multiple users, but single-player computer games, such as Skyrim, can also be considered a type of virtual world.

The project focused on communication between animated agents in a virtual world, using characters that exhibit realistic personality traits and natural-looking behavior that reflects the emotional features of conversations. The project goal was to combine different research efforts such as situation-based natural language and speech generation, representation of non-verbal expression, and the modeling of emotions and personality. [1] [4] [5]

Affect displays are the verbal and non-verbal displays of emotion. These displays can occur through facial expressions, gestures and body language, volume and tone of voice, laughing, crying, etc. Affect displays can be altered or faked, so one may appear to feel one way while actually feeling another. Affect can be conscious or non-conscious, and can be discreet or obvious. The display of positive emotions, such as smiling and laughing, is termed "positive affect", while the display of more negative emotions, such as crying and tense gestures, is termed "negative affect".

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
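
As a concrete illustration of the text-to-speech idea (NECA used its own synthesis components, so this is only a sketch), the open-source pyttsx3 Python library can render a string as speech, assuming the library is installed:

    # Minimal text-to-speech sketch using pyttsx3 (pip install pyttsx3).
    # Illustrative only; not the synthesizer used by the NECA Project.
    import pyttsx3

    engine = pyttsx3.init()          # pick the platform's default speech engine
    engine.setProperty("rate", 150)  # speaking rate in words per minute
    engine.say("Hello from an embodied conversational agent.")
    engine.runAndWait()              # block until the utterance has been spoken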

Goals and milestones

The underlying research direction of the NECA Project was the development of a computing platform in which animated characters within a virtual world could be capable of realistic behavior. For character interactions to look natural, various factors had to be considered, from the proxemics of the distance between bodies as characters interact, to the kinesics of body language at the individual level and the degree of eye contact between individuals, to the paralinguistics of tone and intonation.
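
One way to picture these factors is as a small per-character record of behavior parameters. The sketch below is purely illustrative; the field names are hypothetical and do not reflect NECA's actual data structures.

    # Illustrative grouping of the nonverbal factors named above;
    # field names are hypothetical, not taken from NECA.
    from dataclasses import dataclass

    @dataclass
    class NonverbalState:
        interpersonal_distance_m: float  # proxemics: distance to the interlocutor
        gesture_intensity: float         # kinesics: 0.0 (still) to 1.0 (animated)
        eye_contact_ratio: float         # fraction of time gazing at the partner
        pitch_variation: float           # paralinguistics: intonation range

    calm_speaker = NonverbalState(1.2, 0.3, 0.6, 0.4)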

Proxemics is the study of human use of space and the effects that population density has on behaviour, communication, and social interaction.

Kinesics is the interpretation of body motion communication such as facial expressions and gestures, nonverbal behavior related to movement of any part of the body or the body as a whole. The equivalent popular culture term is body language, a term Ray Birdwhistell, considered the founder of this area of study, neither used nor liked.

Body language is a type of nonverbal communication in which physical behaviors, as opposed to words, are used to express or convey information. Such behavior includes facial expressions, body posture, gestures, eye movement, touch and the use of space. Body language exists in both animals and humans. It is also known as kinesics.

Based on this research direction, there were three main goals for NECA. [2] The first goal was the general development of a platform that allowed the simulation of conversational characters and their interactions.

The second goal was the design of a multi-user web application called Socialite, which allowed social "face-to-face", emotion-based interactions between animated agents on the internet. [1] [3] A Socialite user could select a set of avatars to interact with; after learning about the user's personal preferences, the avatars helped the user navigate a virtual world and get in touch with other agents and users. [1]

In computing, an avatar is the graphical representation of a user or the user's alter ego or character: an icon or figure representing a particular person in a video game, Internet forum, or similar setting. It may take either a three-dimensional form, as in games or virtual worlds, or a two-dimensional form, as an icon in Internet forums and other online communities. Avatar images have also been referred to as "picons" in the past, though the usage of this term is now uncommon. The term can also refer to a text construct found on early systems such as MUDs, or to the personality connected with the screen name, or handle, of an Internet user.

The third goal was eShowRoom, a demonstration e-commerce platform that allowed products in the commercial domain to be displayed. In the eShowRoom application, two or three virtual agents could be seen discussing various features of a product among themselves in a natural setting. [5]

E-commerce is the activity of buying or selling products on online services or over the Internet. Electronic commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems.

Examples of NECA research

One of NECA's designs was the Rich Representation Language (RRL), specifically created to facilitate the interaction of two or more animated agents. [6] [7] RRL influenced the design of other languages, such as the Player Markup Language, which extended parts of RRL's design. [8]

The design of RRL aimed to automatically generate much of the facial animation, as well as the skeletal animation, from the content of the conversations. Because nonverbal components of communication, such as facial expressions, depend on the spoken words, no animation is possible in the language without considering the context of the scene in which the animation takes place, e.g. anger versus joy. [9]
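
RRL is an XML-based representation, so a scene script in its spirit can be sketched with Python's standard library. The element and attribute names below are invented for illustration and are not the actual RRL schema.

    # Build a toy scene script in the spirit of RRL with the standard library.
    # Tag and attribute names are invented; the real RRL schema differs.
    import xml.etree.ElementTree as ET

    scene = ET.Element("scene", id="showroom-demo")
    turn = ET.SubElement(scene, "turn", speaker="agent1", emotion="joy")
    ET.SubElement(turn, "utterance").text = "This car has excellent mileage."
    # Nonverbal behavior is tied to the emotional context of the utterance.
    ET.SubElement(turn, "gesture", type="smile")

    print(ET.tostring(scene, encoding="unicode"))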

Sources

  1. Brigitte Krenn, et al. "Lifelike Agents for the Internet", in Agent Culture: Human-Agent Interaction in a Multicultural World, edited by Sabine Payr and Robert Trappl, 2004, ISBN 0-8058-4808-8, pages 197-228
  2. NECA Project description
  3. Brigitte Krenn and Barbara Neumayr. "Incorporating Animated Conversation into a Web-based Community Building Tool", in Intelligent Virtual Agents: 4th International Workshop, IVA 2003, edited by Thomas Rist, ISBN 3-540-20003-7, pages 18-22
  4. Multimodal Intelligent Information Presentation by Oliviero Stock, 2005, ISBN 1-4020-3049-5, page 64
  5. Patrick Gebhard, et al. "Coloring Multi-character Conversations through the Expression of Emotions", in Affective Dialogue Systems, edited by Elisabeth André, 2004, pages 125-139
  6. Intelligent Virtual Agents: 6th International Working Conference by Jonathan Matthew Gratch, 2006, ISBN 3-540-37593-7, page 221
  7. Data-driven 3D Facial Animation by Zhigang Deng and Ulrich Neumann, 2007, ISBN 1-84628-906-8, page 54
  8. Technologies for Interactive Digital Storytelling and Entertainment by Stefan Göbel, 2004, ISBN 3-540-22283-9, page 83
  9. Interactive Storytelling: First Joint International Conference, edited by Ulrike Spierling and Nicolas Szilas, 2008, ISBN 3-540-89424-1, page 93

Related Research Articles

The ELIZA effect, in computer science, is the tendency to unconsciously assume that computer behaviors are analogous to human behaviors; that is, anthropomorphisation.

A digital pet is a type of artificial human companion. They are usually kept for companionship or enjoyment. People may keep a digital pet in lieu of a real pet. Cyberpet and Tamagotchi were some of the first popular digital pets.

Artificial human companions may be any kind of hardware or software creation designed to give companionship to a person. These can include digital pets, such as the popular Tamagotchi, or robots, such as the Sony AIBO. Virtual companions can be used as a form of entertainment, or they can be medical or functional, to assist the elderly in maintaining an acceptable standard of life.

Interactive storytelling is a form of digital entertainment in which the storyline is not predetermined. The author creates the setting, characters, and situation which the narrative must address, but the user experiences a unique story based on their interactions with the story world. The architecture of an interactive storytelling program includes a drama manager, user model, and agent model to control, respectively, aspects of narrative production, player uniqueness, and character knowledge and behavior. Together, these systems generate characters that act "human," alter the world in real-time reactions to the player, and ensure that new narrative events unfold comprehensibly.

In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do.

Justine Cassell is an American professor and researcher interested in human-human conversation, human-computer interaction, and storytelling. Since August 2010 she has been on the faculty of the Carnegie Mellon Human Computer Interaction Institute (HCII).

AutoTutor is an intelligent tutoring system developed by researchers at the Institute for Intelligent Systems at the University of Memphis, including Arthur C. Graesser, that helps students learn Newtonian physics, computer literacy, and critical thinking topics through tutorial dialogue in natural language. AutoTutor differs from other popular intelligent tutoring systems, such as the Cognitive Tutor, in that it focuses on natural language dialog. This means that the tutoring occurs in the form of an ongoing conversation, with human input presented using either voice or free text input. To handle this input, AutoTutor uses computational linguistics algorithms including latent semantic analysis, regular expression matching, and speech act classifiers. These complementary techniques focus on the general meaning of the input, precise phrasing or keywords, and functional purpose of the expression, respectively. In addition to natural language input, AutoTutor can also accept ad-hoc events such as mouse clicks, learner emotions inferred from emotion sensors, and estimates of prior knowledge from a student model. Based on these inputs, the computer tutor determines when to reply and which speech acts to reply with. This process is driven by a "script" that includes a set of dialog-specific production rules.
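
The regular-expression stage of such a pipeline can be sketched in a few lines; the patterns and speech-act labels below are invented for illustration and are not AutoTutor's actual rules.

    # Toy speech act classifier in the style described above. Regular
    # expressions catch precise phrasings; all patterns here are invented.
    import re

    PATTERNS = [
        (re.compile(r"^\s*(what|why|how)\b", re.I), "question"),
        (re.compile(r"\b(i think|maybe|probably)\b", re.I), "hedged answer"),
        (re.compile(r"\b(don't know|no idea)\b", re.I), "metacognitive"),
    ]

    def classify(utterance: str) -> str:
        for pattern, act in PATTERNS:
            if pattern.search(utterance):
                return act
        return "contribution"  # default: treat as a domain answer

    print(classify("I think the net force is zero"))  # -> hedged answer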

Human–computer interaction (HCI) researches the design and use of computer technology, focused on the interfaces between people (users) and computers. Researchers in the field of HCI both observe the ways in which humans interact with computers and design technologies that let humans interact with computers in novel ways. As a field of research, human–computer interaction is situated at the intersection of computer science, behavioral sciences, design, media studies, and several other fields of study. The term was popularized by Stuart K. Card, Allen Newell, and Thomas P. Moran in their seminal 1983 book, The Psychology of Human–Computer Interaction, although the authors first used the term in 1980 and the first known use was in 1975. The term connotes that, unlike other tools with only limited uses, a computer has many uses and this takes place as an open-ended dialog between the user and the computer. The notion of dialog likens human–computer interaction to human-to-human interaction, an analogy which is crucial to theoretical considerations in the field.

iClone is a real-time 3D animation and rendering software program that enables users to make 3D animated films. Real-time playback is enabled by using a 3D videogame engine for instant on-screen rendering.

Affective haptics is an emerging area of research that focuses on the study and design of devices and systems that can elicit, enhance, or influence the emotional state of a human by means of the sense of touch. The research field originated with the papers of Dzmitry Tsetserukou and Alena Neviarouskaya on affective haptics and a real-time communication system with rich emotional and haptic channels. Driven by the motivation to enhance the social interactivity and emotional immersion of users of real-time messaging and virtual and augmented realities, the idea of reinforcing (intensifying) one's own feelings and reproducing (simulating) the emotions felt by a partner was proposed. Four basic haptic (tactile) channels governing our emotions can be distinguished: (1) physiological changes, (2) physical stimulation, (3) social touch, and (4) emotional haptic design.

An animation database is a database that stores fragments of animations or human movements and that can be accessed, analyzed and queried to develop and assemble new animations. Given that the manual generation of a large amount of animation can be time-consuming and expensive, an animation database can assist users in building animations by using existing components and sharing animation fragments.
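
In its simplest form, such a database can be modelled as a queryable store of tagged clips; the sketch below is illustrative only and not tied to any particular system.

    # Toy animation database: clips are stored with tags and can be queried
    # and assembled into a new sequence. Purely illustrative.
    from typing import List

    clips = {
        "wave": {"tags": {"greeting", "arm"}, "frames": 24},
        "nod": {"tags": {"agreement", "head"}, "frames": 12},
        "point": {"tags": {"deictic", "arm"}, "frames": 18},
    }

    def query(tag: str) -> List[str]:
        return [name for name, clip in clips.items() if tag in clip["tags"]]

    def total_frames(names: List[str]) -> int:
        # "Assembling" here just totals the frame counts of the chosen fragments.
        return sum(clips[n]["frames"] for n in names)

    print(query("arm"), total_frames(query("arm")))  # ['wave', 'point'] 42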

The Rich Representation Language, often abbreviated as RRL, is a computer animation language specifically designed to facilitate the interaction of two or more animated characters. The research effort was funded by the European Commission as part of the NECA Project. The NECA framework within which RRL was developed was not oriented towards the animation of movies, but the creation of intelligent "virtual characters" that interact within a virtual world and hold conversations with emotional content, coupled with suitable facial expressions.

The PAD emotional state model is a psychological model developed by Albert Mehrabian and James A. Russell to describe and measure emotional states. PAD uses three numerical dimensions, Pleasure, Arousal and Dominance, to represent all emotions. Its initial use was in a theory of environmental psychology, the core idea being that physical environments influence people through their emotional impact. It was subsequently used by Peter Lang and colleagues to propose a physiological theory of emotion. It was also used by James A. Russell to develop a theory of emotional episodes. The PA part of PAD was developed into a circumplex model of emotion experience, and those two dimensions were termed "core affect". The D part of PAD was re-conceptualized as part of the appraisal process in an emotional episode. A more fully developed version of this approach is termed the psychological construction theory of emotion.
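
Because PAD is purely numerical, an emotional state can be handled as a three-component vector. The sketch below uses illustrative coordinates, not Mehrabian's published calibrations.

    # PAD as a 3-vector (pleasure, arousal, dominance), each in [-1, 1].
    # The reference points below are illustrative, not published values.
    import math

    REFERENCE = {
        "joy": (0.8, 0.5, 0.4),
        "anger": (-0.6, 0.6, 0.3),
        "boredom": (-0.4, -0.6, -0.3),
    }

    def nearest_emotion(p: float, a: float, d: float) -> str:
        # Label a state by Euclidean distance to the nearest reference point.
        return min(REFERENCE, key=lambda e: math.dist((p, a, d), REFERENCE[e]))

    print(nearest_emotion(0.7, 0.4, 0.2))  # -> joy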

Computer-generated imagery (CGI) is the application of computer graphics to create or contribute to images in art, printed media, video games, films, television programs, shorts, commercials, videos, and simulators. The visual scenes may be dynamic or static and may be two-dimensional (2D), though the term "CGI" is most commonly used to refer to 3D computer graphics used for creating scenes or special effects in films and television. Additionally, the use of 2D CGI is often mistakenly referred to as "traditional animation", most often in the case when dedicated animation software such as Adobe Flash or Toon Boom is not used or the CGI is hand drawn using a tablet and mouse.

A pedagogical agent is a concept borrowed from computer science and artificial intelligence and applied to education, usually as part of an intelligent tutoring system (ITS). It is a simulated human-like interface between the learner and the content, in an educational environment. A pedagogical agent is designed to model the type of interactions between a student and another person. Mabanza and de Wet define it as "as a character enacted by a computer that interacts with the user in a socially engaging manner". A pedagogical agent can be assigned different roles in the learning environment, such as tutor or co-learner, depending on the desired purpose of the agent. "A tutor agent plays the role of a teacher, while a co-learner agent plays the role of a learning companion".

Nadine is a female humanoid social robot modelled on Professor Nadia Magnenat Thalmann. The robot has a strong human likeness, with natural-looking skin and hair and realistic hands. Nadine is a socially intelligent robot which returns a greeting, makes eye contact, and can remember all the conversations held with it. It is able to answer questions autonomously in several languages and simulate emotions both in gestures and facial expressions, depending on the content of the interaction with the user. Nadine can recognise persons it has previously seen and engage in flowing conversation. Nadine has been programmed with a "personality", in that its demeanour can change according to what is said to it. Nadine has a total of 27 degrees of freedom for facial expressions and upper-body movements. With persons it has previously encountered, it remembers facts and events related to each person. It can assist people with special needs by reading stories, showing images, starting Skype sessions, sending emails, and communicating with other members of the family. It can play the role of a receptionist in an office or serve as a personal coach.

Joëlle Coutaz is a French computer scientist, specializing in human-computer interaction (HCI). Her career includes research in the fields of operating systems and HCI, as well as being a professor at the University of Grenoble. Coutaz is considered a pioneer in HCI in France, and in 2007, she was awarded membership to SIGCHI. She was also involved in organizing CHI conferences and was a member on the editorial board of ACM Transactions on Computer-Human Interaction. She has authored over 130 publications, including two books, in the domain of human-computer interaction.

Sharon Oviatt is an internationally recognized computer scientist, professor and researcher known for her work in human–computer interaction, particularly on human-centered multimodal interface design and evaluation.