Evaluating the Effectiveness of Protection Jamming Devices in Mitigating Smart Speaker Eavesdropping Attacks Using Gaussian White Noise
Protection Jamming Devices (PJDs) are specialized tools designed to sit on top of virtual assistant (VA) smart speakers (e.g., Amazon Echo Dot, Google Home) and hinder them from "hearing" nearby user speech. These devices aim to protect users from smart speaker eavesdropping attacks by injecting a jamming signal directly into the microphones of the smart speaker. However, current signal processing routines can reduce noise and enhance speech in noisy audio samples. We therefore identify a potential vulnerability to speech eavesdropping via smart speaker recordings, even when a PJD is in use: if an attacker can gain access to smart speaker recordings, or force the device to make recordings, they may be able to recover the speech contained in a recording through successful noise cancellation. Specifically, we are interested in whether Gaussian white noise (GWN) is an effective jamming signal for a PJD. To our knowledge, the effectiveness of white noise and PJDs in protecting against eavesdropping attacks has received some attention in academia, but a systematic evaluation with physical experiments that use an actual PJD implementation has yet to be performed.
In this work, we construct our own implementation of a PJD, specialized for consistent experimentation, to simulate an attack scenario in which recordings made by a smart speaker in the presence of normal speech and the PJD's jamming signal are recovered. We perform substantial data collection under different settings to build a repository of 1500 recovered audio samples. We then apply post-processing to our dataset and conduct an extensive signal and speech quality analysis, including time- and frequency-domain inspection and evaluation of metrics such as cross-correlation, STNR, and PESQ. Lastly, we perform feature extraction (MFCC) and build machine learning classifiers for tasks including speech (digit) recognition, speaker identification, and gender recognition. We also attempt song recognition using the Shazam app. For all speech recognition tasks that we attempted, we achieve classification accuracies above that of random guessing (46% for digit recognition, 51% for speaker identification, 80% for gender identification), and we demonstrate successful song recognition. These results highlight the real potential for attackers to compromise user speech, to some extent, using smart speaker recordings, even if the smart speaker is protected by a PJD. With improved signal processing techniques for more sophisticated attacks, greater accuracies could certainly be achieved.
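As a rough illustration of the kind of signal-level measurement described above, the following sketch mixes a signal with Gaussian white noise at a chosen signal-to-noise ratio and then computes a peak normalized cross-correlation between the clean and jammed versions. It is not the authors' pipeline: a synthetic 440 Hz tone stands in for speech, the 0 dB noise level is arbitrary, and the function names `add_gwn`, `snr_db`, and `peak_normalized_xcorr` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def add_gwn(signal, target_snr_db, rng):
    """Add Gaussian white noise so the mix has roughly the target SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def snr_db(clean, noisy):
    """Measure SNR in dB, treating (noisy - clean) as the noise component."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


def peak_normalized_xcorr(a, b):
    """Peak of the normalized cross-correlation over all lags (0 to 1)."""
    a = a - a.mean()
    b = b - b.mean()
    xcorr = np.correlate(a, b, mode="full")
    return np.max(np.abs(xcorr)) / (np.linalg.norm(a) * np.linalg.norm(b))


# Assumption: a 1-second synthetic tone stands in for recorded speech.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t)

# Assumption: 0 dB means the jamming noise has the same power as the signal.
jammed = add_gwn(clean, target_snr_db=0.0, rng=rng)

measured_snr = snr_db(clean, jammed)
similarity = peak_normalized_xcorr(clean, jammed)
self_similarity = peak_normalized_xcorr(clean, clean)

print(f"measured SNR: {measured_snr:.2f} dB")
print(f"peak normalized cross-correlation: {similarity:.3f}")
```

A similarity well above zero at equal signal and noise power indicates that structure from the original signal survives the added noise, which is the intuition behind testing whether denoising can recover jammed speech.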