|Bing Zhou||Stony Brook University|
|Jay Lohokare||Stony Brook University|
|Ruipeng Gao||Beijing Jiaotong University|
|Fan Ye||Stony Brook University|
User authentication on smartphones must satisfy both security and convenience, an inherently difficult balancing art. Apple's FaceID is arguably the latest of such efforts, at the cost of additional hardware (e.g., dot projector, flood illuminator and infrared camera). We propose a novel user authentication system, EchoPrint, which leverages acoustics and vision for secure and convenient user authentication without requiring any special hardware. EchoPrint actively emits almost inaudible acoustic signals from the earpiece speaker to “illuminate” the user's face and authenticates the user by the unique features extracted from the echoes bouncing off the 3D facial contour. To combat changes in phone-holding poses, and thus in the echoes, a Convolutional Neural Network (CNN) is trained to extract reliable acoustic features, which are further combined with visual facial landmark locations to feed a binary Support Vector Machine (SVM) classifier for final authentication. Because the echo features depend on 3D facial geometries, EchoPrint is not easily spoofed by images or videos, unlike 2D visual face recognition systems. It needs only commodity hardware, thus avoiding the extra costs of special sensors in solutions like FaceID. Experiments with 62 volunteers and non-human objects such as images, photos, and sculptures show that EchoPrint achieves 93.75% balanced accuracy and 93.50% F-score, while the average precision is 98.05%, and no image/video based attack is observed to succeed in spoofing.
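For readers unfamiliar with the reported metrics, the following minimal sketch shows how balanced accuracy, precision, and F-score are commonly computed from the confusion counts of a binary classifier. It assumes the standard F1 definition of F-score, which the abstract does not state explicitly. The function names and the example counts are hypothetical and chosen only for illustration; they are not the experimental data behind the figures above.

```python
def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn)  # fraction of legitimate attempts accepted
    tnr = tn / (tn + fp)  # fraction of attacker attempts rejected
    return (tpr + tnr) / 2


def precision(tp: int, fp: int) -> float:
    """Fraction of accepted attempts that came from the legitimate user."""
    return tp / (tp + fp)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1; assumed definition)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)


# Hypothetical counts for illustration only (not from the experiments).
tp, fn = 90, 10  # legitimate user: accepted / wrongly rejected
tn, fp = 95, 5   # attackers: rejected / wrongly accepted

print(f"balanced accuracy: {balanced_accuracy(tp, fp, tn, fn):.4f}")
print(f"precision:         {precision(tp, fp):.4f}")
print(f"F-score:           {f_score(tp, fp, fn):.4f}")
```

Balanced accuracy is informative here because the legitimate-user and attacker classes may differ in size, in which case plain accuracy would be dominated by the larger class.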