
Helping Computers Perceive Human Emotions

Jun 24, 2020 | Artificial Intelligence (AI), Digitalization Trends

MIT Media Lab researchers have developed a machine-learning model that takes computers a step closer to interpreting our emotions as naturally as humans do.

In the growing field of “affective computing,” robots and computers are being developed to analyze facial expressions, interpret our emotions, and respond accordingly. Applications include, for instance, monitoring an individual’s health and well-being, gauging student interest in classrooms, helping diagnose signs of certain diseases, and developing helpful robot companions.

A challenge, however, is that people express emotions quite differently, depending on many factors. General differences can be seen among cultures, genders, and age groups. But other differences are even more fine-grained: The time of day, how much you slept, or even your level of familiarity with a conversation partner leads to subtle variations in the way you express, say, happiness or sadness in a given moment.

Human brains instinctively catch these deviations, but machines struggle. Deep-learning techniques were developed in recent years to help catch the subtleties, but they’re still not as accurate or as adaptable across different populations as they could be.

The Media Lab researchers have developed a machine-learning model that outperforms traditional systems in capturing these small facial expression variations, to better gauge mood while training on thousands of images of faces. Moreover, by using a little extra training data, the model can be adapted to an entirely new group of people with the same efficacy. The aim is to improve existing affective-computing technologies.

“This is an unobtrusive way to monitor our moods,” says Oggi Rudovic, a Media Lab researcher and co-author on a paper describing the model, which was presented last week at the Conference on Machine Learning and Data Mining. “If you want robots with social intelligence, you have to make them intelligently and naturally respond to our moods and emotions, more like humans.”

Co-authors on the paper are first author Michael Feffer, an undergraduate student in electrical engineering and computer science; and Rosalind Picard, a professor of media arts and sciences and founding director of the Affective Computing research group.

Personalized experts

Traditional affective-computing models use a “one-size-fits-all” concept. They train on one set of images depicting various facial expressions, optimizing features — such as how a lip curls when smiling — and mapping those general feature optimizations across an entire set of new images.

The researchers instead combined a technique called “mixture of experts” (MoE) with model personalization techniques, which helped mine more fine-grained facial-expression data from individuals. This is the first time these two techniques have been combined for affective computing, Rudovic says.

In MoEs, a number of neural network models, called “experts,” are each trained to specialize in a separate processing task and produce one output. The researchers also incorporated a “gating network,” which calculates probabilities of which expert will best detect moods of unseen subjects. “Basically the network can discern between individuals and say, ‘This is the right expert for the given image,’” Feffer says.
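
To make the general idea concrete, here is a minimal sketch of a mixture of experts with a gating network, written in PyTorch. It illustrates the technique only: the layer sizes, number of experts, and framework are assumptions for this post, not details taken from the MIT paper.

```python
# Minimal mixture-of-experts sketch (illustrative; not the paper's exact architecture).
import torch
import torch.nn as nn

class Expert(nn.Module):
    """One 'expert': a small network mapping face features to an emotion score."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

class MixtureOfExperts(nn.Module):
    """A gating network weighs each expert's output for every input sample."""
    def __init__(self, in_dim: int, out_dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([Expert(in_dim, out_dim) for _ in range(num_experts)])
        self.gate = nn.Linear(in_dim, num_experts)  # scores turned into expert probabilities

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, out_dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # probability-weighted combination
```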

For their model, the researchers personalized the MoEs by matching each expert to one of 18 individual video recordings in the RECOLA database, a public database of people conversing on a video-chat platform designed for affective-computing applications. They trained the model using nine subjects and evaluated it on the other nine, with all videos broken down into individual frames.

Each expert, and the gating network, tracked facial expressions of each individual, with the help of a residual network (“ResNet”), a neural network used for object classification. In doing so, the model scored each frame based on level of valence (pleasant or unpleasant) and arousal (excitement) — commonly used metrics to encode different emotional states. Separately, six human experts labeled each frame for valence and arousal on a scale of -1 (low levels) to 1 (high levels), and these labels were also used to train the model.
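
As a rough sketch of this scoring step, the snippet below uses an off-the-shelf ResNet as a per-frame feature extractor with a two-output regression head for valence and arousal, squashed into the [-1, 1] label range. The specific backbone (torchvision’s ResNet-18), head, and loss here are assumptions for illustration; the paper’s exact setup may differ.

```python
# Hedged sketch: ResNet-backed valence/arousal scoring per video frame.
import torch
import torch.nn as nn
from torchvision import models

class ValenceArousalScorer(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18()   # illustrative backbone; the paper's ResNet details may differ
        backbone.fc = nn.Identity()    # drop the classifier, keep the 512-d frame features
        self.backbone = backbone
        self.head = nn.Linear(512, 2)  # two outputs: valence and arousal

    def forward(self, frames):         # frames: (batch, 3, H, W) RGB tensors
        features = self.backbone(frames)
        return torch.tanh(self.head(features))  # squash into [-1, 1], matching the human label scale

# Training would regress these outputs against the human-annotated
# valence/arousal labels, e.g. with nn.MSELoss().
```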

The researchers then performed further model personalization, where they fed the trained model data from some frames of the remaining videos of subjects, and then tested the model on all unseen frames from those videos. Results showed that, with just 5 to 10 percent of data from the new population, the model outperformed traditional models by a large margin — meaning it scored valence and arousal on unseen images much closer to the interpretations of human experts.
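
A minimal sketch of what such a personalization step could look like is shown below: fine-tune an already-trained model on a small labelled slice (roughly 5 to 10 percent) of frames from the new subjects, then evaluate on the remaining frames. The function name, optimizer, and hyperparameters are assumptions, not the paper’s protocol.

```python
# Illustrative personalization step: adapt a pre-trained model with a few labelled frames.
import torch
import torch.nn as nn

def personalize(model, frames, labels, fraction=0.1, epochs=5, lr=1e-4):
    """Fine-tune `model` on a small fraction of a new subject's labelled frames."""
    n = max(1, int(fraction * len(frames)))
    adapt_x, adapt_y = frames[:n], labels[:n]       # the 5-10% of frames used for adaptation
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                          # regression on valence/arousal scores
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(adapt_x), adapt_y)
        loss.backward()
        optimizer.step()
    return model  # evaluated afterwards on the unseen frames from the same subjects
```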

This shows the potential of the models to adapt from population to population, or individual to individual, with very few data, Rudovic says. “That’s key,” he says. “When you have a new population, you have to have a way to account for shifting of data distribution [subtle facial variations]. Imagine a model set to analyze facial expressions in one culture that needs to be adapted for a different culture. Without accounting for this data shift, those models will underperform. But if you just sample a bit from a new culture to adapt our model, these models can do much better, especially on the individual level. This is where the importance of the model personalization can best be seen.”

Currently available data for such affective-computing research isn’t very diverse in skin colors, so the researchers’ training data were limited. But when such data become available, the model can be trained for use on more diverse populations. The next step, Feffer says, is to train the model on “a much bigger dataset with more diverse cultures.”

Better machine-human interactions

Another goal is to train the model to help computers and robots automatically learn from small amounts of changing data to more naturally detect how we feel and better serve human needs, the researchers say.

It could, for example, run in the background of a computer or mobile device to track a user’s video-based conversations and learn subtle facial expression changes under different contexts. “You can have things like smartphone apps or websites be able to tell how people are feeling and recommend ways to cope with stress or pain, and other things that are impacting their lives negatively,” Feffer says.

This could also be helpful in monitoring, say, depression or dementia, as people’s facial expressions tend to subtly change due to those conditions. “Being able to passively monitor our facial expressions,” Rudovic says, “we could over time be able to personalize these models to users and monitor how much deviation they have on a daily basis — deviating from the average level of facial expressiveness — and use it for indicators of well-being and health.”

A promising application, Rudovic says, is human-robotic interactions, such as for personal robotics or robots used for educational purposes, where the robots need to adapt to assess the emotional states of many different people. One version, for instance, has been used in helping robots better interpret the moods of children with autism.

Roddy Cowie, professor emeritus of psychology at Queen’s University Belfast and an affective computing scholar, says the MIT work “illustrates where we really are” in the field. “We are edging toward systems that can roughly place, from pictures of people’s faces, where they lie on scales from very positive to very negative, and very active to very passive,” he says. “It seems intuitive that the emotional signs one person gives are not the same as the signs another gives, and so it makes a lot of sense that emotion recognition works better when it is personalized. The method of personalizing reflects another intriguing point, that it is more effective to train multiple ‘experts,’ and aggregate their judgments, than to train a single super-expert. The two together make a satisfying package.”

Original article published by MIT News / Rob Matheson / 2018
