Face detection

Automatic face detection with OpenCV

Face detection is a computer technology used in a variety of applications to identify human faces in digital images. [1] Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. [2]

Face detection can be regarded as a specific case of object-class detection. In object-class detection, the task is to find the locations and sizes of all objects in an image that belong to a given class. Examples include upper torsos, pedestrians, and cars. Face detection answers two questions: (1) are there any human faces in the image or video, and (2) where are they located?
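
As a concrete illustration of these two questions, below is a minimal sketch of frontal-face detection with OpenCV's bundled Haar-cascade classifier, the kind of approach shown in the image above; the input path "photo.jpg" is a placeholder, not taken from the cited sources.

```python
import cv2

# Load the pretrained frontal-face Haar cascade that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("photo.jpg")  # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale answers both questions at once: an empty result means
# no face was found, and each entry gives one face's location and size.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", image)
```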

Face-detection algorithms focus on the detection of frontal human faces. This is analogous to image matching, in which an image of a person is compared bit by bit against images stored in a database; any change to the facial features stored in the database invalidates the matching process. [3]

A reliable face-detection approach based on the genetic algorithm and the eigenface [4] technique works as follows:

First, possible human-eye regions are detected by testing all the valley regions in the gray-level image. A genetic algorithm is then used to generate all the possible face regions, which include the eyebrows, the iris, the nostrils, and the mouth corners. [3]

Each possible face candidate is normalized to reduce both the lighting effect, which is caused by uneven illumination, and the shirring effect, which is due to head movement. The fitness value of each candidate is measured based on its projection onto the eigenfaces. After a number of iterations, all the face candidates with a high fitness value are selected for further verification. At this stage, the face symmetry is measured and the existence of the different facial features is verified for each face candidate. [citation needed]
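
One common way to realize the fitness measure described above is the eigenface "distance from face space": a candidate that projects onto the eigenface basis with little reconstruction error is likely a face. The sketch below assumes a mean face and an orthonormal eigenface basis have already been computed from training data; the variable names are illustrative, not from the cited papers.

```python
import numpy as np

def fitness(candidate, mean_face, eigenfaces):
    """Score a normalized face candidate by its projection onto the eigenfaces.

    candidate, mean_face: flattened grayscale images, shape (d,)
    eigenfaces: orthonormal eigenface basis, shape (k, d)
    """
    centered = candidate - mean_face
    weights = eigenfaces @ centered          # project onto face space
    reconstruction = eigenfaces.T @ weights  # map back to image space
    # A candidate close to the face space reconstructs well, so a small
    # reconstruction error corresponds to a high fitness value.
    return -np.linalg.norm(centered - reconstruction)
```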

Applications

Facial motion capture

Facial recognition

Face detection is used in biometrics, often as part of (or together with) a facial recognition system. It is also used in video surveillance, human-computer interfaces, and image database management.

Photography

Some recent digital cameras use face detection for autofocus. [5] Face detection is also useful for selecting regions of interest in photo slideshows that use a pan-and-scale Ken Burns effect.

Modern cameras also use smile detection to take a photograph at an appropriate time.
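
A minimal sketch of such smile-triggered capture, assuming a webcam at index 0 and using OpenCV's bundled face and smile cascades (a simplification of what camera firmware actually does):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

camera = cv2.VideoCapture(0)  # assumed webcam index
captured = False
while not captured:
    ok, frame = camera.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        # Look for a smile only inside the detected face region.
        roi = gray[y:y + h, x:x + w]
        if len(smile_cascade.detectMultiScale(roi, 1.7, 20)) > 0:
            cv2.imwrite("smile.jpg", frame)  # take the photo now
            captured = True
            break
camera.release()
```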

Marketing

Face detection is gaining the interest of marketers. A webcam can be integrated into a television to detect any face that passes by. The system then estimates the race, gender, and age range of the face. Once this information is collected, a series of advertisements targeted at the detected race, gender, and age range can be played.

An example of such a system is OptimEyes, which is integrated into the Amscreen digital signage system. [6] [7]
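
The sources do not describe OptimEyes' internals, but the pipeline sketched in this section can be illustrated as follows; estimate_demographics and pick_advert are hypothetical stand-ins for the trained models and ad-selection logic a real system would use.

```python
import cv2

def estimate_demographics(face_img):
    # Hypothetical stand-in: a real system would run trained
    # age/gender classifiers on the face crop here.
    return {"gender": "female", "age_range": "25-34"}

def pick_advert(profile):
    # Hypothetical mapping from the estimated profile to an ad.
    return f"ad_{profile['gender']}_{profile['age_range']}.mp4"

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
frame = cv2.imread("webcam_frame.jpg")  # placeholder camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
    profile = estimate_demographics(frame[y:y + h, x:x + w])
    print("Play:", pick_advert(profile))
```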

Emotional Inference

Face detection can be used as part of a software implementation of emotional inference. Emotional inference can be used to help people with autism understand the feelings of people around them. [8]

AI-assisted emotion detection in faces has gained significant traction in recent years, employing various models to interpret human emotional states. OpenAI's CLIP model [9] exemplifies the use of deep learning to associate images and text, facilitating nuanced understanding of emotional content. For instance, combined with a network psychometrics approach, the model has been used to analyze political speeches based on changes in politicians' facial expressions. [10] Research generally highlights the effectiveness of these technologies, noting that AI can analyze facial expressions (with or without vocal intonations and written language) to infer emotions, although challenges remain in accurately distinguishing between closely related emotions and understanding cultural nuances. [11]
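
A hedged sketch of CLIP-based zero-shot emotion inference on a detected face crop, using the Hugging Face transformers interface to the openai/clip-vit-base-patch32 checkpoint; the label set and the "face.jpg" input are illustrative assumptions, not taken from the cited studies.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative emotion labels, phrased as CLIP text prompts.
emotions = ["happy", "sad", "angry", "surprised", "neutral"]
prompts = [f"a photo of a {e} face" for e in emotions]

inputs = processor(text=prompts, images=Image.open("face.jpg"),
                   return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores, normalized into one probability per label.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for emotion, p in zip(emotions, probs):
    print(f"{emotion}: {p:.2f}")
```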

Lip Reading

Face detection is an essential first step in inferring language from visual cues. Automated lip reading can, for example, help computers determine who is speaking, which is needed in security-sensitive applications.
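
A minimal sketch of the detection step such pipelines rely on: locate the face, then crop a rough mouth region for a downstream lip-reading model. Taking the lower third of the face box is a simplifying assumption; practical systems localize the mouth with facial landmarks.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("speaker.jpg")  # placeholder video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
    # Crude mouth region: lower third of the detected face box.
    mouth_roi = frame[y + 2 * h // 3: y + h, x: x + w]
    cv2.imwrite("mouth.jpg", mouth_roi)
```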

References

  1. "Face Detection: Facial recognition and finding Homepage".
  2. Lewis, Michael B; Ellis, Hadyn D (2003), "How we detect a face: A survey of psychological evidence", International Journal of Imaging Systems and Technology, 13: 3–7, doi:10.1002/ima.10040, S2CID   14976176
  3. 1 2 Sheu, Jia-Shing; Hsieh, Tsu-Shien; Shou, Ho-Nien (1 December 2014). "Automatic Generation of Facial Expression Using Triangular Geometric Deformation". Journal of Applied Research and Technology. 12 (6): 1115–1130. doi: 10.1016/S1665-6423(14)71671-2 . ISSN   2448-6736.
  4. Jun Zhang; Yong Yan; Lades, M. (1997). "Face recognition: Eigenface, elastic matching, and neural nets". Proceedings of the IEEE. 85 (9): 1423–1435. doi:10.1109/5.628712.
  5. "DCRP Review: Canon PowerShot S5 IS". Dcresource.com. Archived from the original on 2009-02-21. Retrieved 2011-02-15.
  6. Tesco face detection sparks needless surveillance panic, Facebook fails with teens, doubts over Google+ | Technology | theguardian.com
  7. IBM has to deal with the privacy issue of facial recognition | Technology | amarvelfox.com [ permanent dead link ]
  8. Bathelt, Joe; Geurts, Hilde M.; Borsboom, Denny (2022-06-01). "More than the sum of its parts: Merging network psychometrics and network neuroscience with application in autism". Network Neuroscience. 6 (2): 445–466. doi:10.1162/netn_a_00222. ISSN   2472-1751. PMC   9207995 . PMID   35733421.
  9. openai/CLIP, OpenAI, 2024-08-16, retrieved 2024-08-16
  10. Tomašević, Aleksandar; Major, Sara (2024-08-01). "Dynamic exploratory graph analysis of emotions in politics". Advances.in/Psychology. 2: e312144. doi:10.56296/aip00021. ISSN   2976-937X.
  11. Khare, Smith K.; Blanes-Vidal, Victoria; Nadimi, Esmaeil S.; Acharya, U. Rajendra (2024-02-01). "Emotion recognition and artificial intelligence: A systematic review (2014–2023) and research recommendations". Information Fusion. 102: 102019. doi:10.1016/j.inffus.2023.102019. ISSN   1566-2535.