Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
Currently, there are several widely used commercial cloud-based services that attempt to recognize an individual’s emotions based on their facial expressions. Most research into facial emotion recognition has used high-resolution, front-oriented, full-face images. However, when images are collected in naturalistic settings (e.g., using a smartphone’s front-facing camera), they are likely to be far from ideal due to camera positioning, lighting conditions, and camera shake. The impact these conditions have on the accuracy of commercial emotion recognition services has not been studied in detail. To fill this gap, we selected five prominent commercial emotion recognition systems—Amazon Rekognition, Baidu Research, Face++, Microsoft Azure, and Affectiva—and evaluated their performance via two experiments. In Experiment 1, we compared the systems’ accuracy at classifying images drawn from three standardized facial expression databases. In Experiment 2, we first identified several common scenarios (e.g., a partially visible face) that can lead to poor-quality pictures during smartphone use, then manipulated the same set of images used in Experiment 1 to simulate these scenarios. We used the manipulated images to again compare the systems’ classification performance, finding that the systems varied considerably in how well they handled these realistic image distortions. Based on our findings, we offer recommendations for developers and researchers who would like to use commercial facial emotion recognition technologies in their applications.
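The distortion scenarios described in Experiment 2 (e.g., underexposure, camera shake, partial occlusion of the face) can be simulated programmatically. The sketch below illustrates one plausible way to do so on a grayscale image array using NumPy; the function names, parameter values, and specific transforms are illustrative assumptions, not the paper's actual manipulation pipeline.

```python
import numpy as np

def simulate_low_light(img, factor=0.4):
    # Illustrative assumption: underexposure modeled as a uniform
    # intensity scale-down of all pixel values.
    return np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)

def simulate_motion_blur(img, k=5):
    # Illustrative assumption: camera shake approximated by a
    # horizontal box blur (each pixel averaged with k neighbors in its row).
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img.astype(float)
    )
    return np.clip(blurred, 0, 255).astype(np.uint8)

def simulate_occlusion(img, frac=0.4):
    # Illustrative assumption: a partially visible face modeled by
    # blacking out the bottom `frac` of the frame.
    out = img.copy()
    h = img.shape[0]
    out[int(h * (1 - frac)):, :] = 0
    return out

# Example on a synthetic 10x10 mid-gray "image"
img = np.full((10, 10), 200, dtype=np.uint8)
dark = simulate_low_light(img)
blurred = simulate_motion_blur(img)
occluded = simulate_occlusion(img)
```

In a study like this, each distorted variant would then be submitted to every service under test, and per-distortion accuracy compared against the clean-image baseline.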