Learning Emotion Representations from Verbal and Nonverbal Communication
Sitao Zhang, Yimu Pan, James Z. Wang
The Pennsylvania State University, USA
Abstract:
Emotion understanding is an essential but
highly challenging component of artificial general intelligence. The
absence of extensive annotated datasets has significantly impeded
advancements in this field. We present EmotionCLIP, the first
pre-training paradigm to extract visual emotion representations from
verbal and nonverbal communication using only uncurated data. Compared
to numerical labels or descriptions used in previous methods,
communication naturally contains emotion information. Furthermore,
acquiring emotion representations from communication is more congruent
with the human learning process. We guide EmotionCLIP to attend to
nonverbal emotion cues through subject-aware context encoding and
verbal emotion cues using sentiment-guided contrastive
learning. Extensive experiments validate the effectiveness and
transferability of EmotionCLIP. Using merely a linear-probe evaluation
protocol, EmotionCLIP outperforms the state-of-the-art supervised
visual emotion recognition methods and rivals many multimodal
approaches across various benchmarks. We anticipate that the advent of
EmotionCLIP will address the prevailing issue of data scarcity in
emotion understanding, thereby fostering progress in related
domains. The code and pre-trained models are available at
https://github.com/Xeaver/EmotionCLIP.
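
The linear-probe evaluation protocol mentioned in the abstract can be illustrated with a minimal sketch: the pre-trained encoder is kept frozen, and only a linear classifier is trained on the features it produces. The code below is not taken from the EmotionCLIP repository. The function names (`train_linear_probe`, `predict`), the hyperparameters, and the synthetic features that stand in for frozen encoder outputs are all assumptions made for illustration.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes,
                       lr=0.01, epochs=300, weight_decay=1e-4):
    """Fit a softmax classifier on fixed features with gradient descent.

    Hypothetical helper: the features are treated as constants, so only
    the linear layer (W, b) is updated, as in a linear probe.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(cross-entropy)/d(logits)
        W -= lr * (features.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the predicted class index for each feature row."""
    return (features @ W + b).argmax(axis=1)


# Synthetic stand-in for frozen encoder outputs (assumption: 3 classes,
# 16-dimensional features). A real probe would use encoder embeddings.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 3.0
labels = rng.integers(0, 3, size=300)
features = centers[labels] + rng.normal(size=(300, 16))

# Train on the first 200 samples, evaluate on the remaining 100.
W, b = train_linear_probe(features[:200], labels[:200], num_classes=3)
accuracy = (predict(features[200:], W, b) == labels[200:]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the encoder is never updated, accuracy under this kind of protocol reflects the quality of the learned representation rather than the capacity of the classifier.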
Citation:
Sitao Zhang, Yimu Pan and James Z. Wang, "Learning Emotion
Representations from Verbal and Nonverbal Communication," Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 18993-19004, Vancouver, Canada, June 2023.
© 2023 IEEE/CVF. Personal use of this material is permitted. However,
permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works must be obtained from the IEEE/CVF.
Last Modified:
October 18, 2023