Many people have grown accustomed to talking to their smart devices, asking them to read a text, play a song or set an alarm. But someone else might be secretly talking to them, too.
Over the past two years, researchers in China and the United States have begun demonstrating that they can send hidden commands that are undetectable to the human ear to Apple’s Siri, Amazon’s Alexa and Google’s Assistant.
Inside university labs, the researchers have been able to secretly activate the artificial intelligence systems on smartphones and smart speakers, making them dial phone numbers or open websites. In the wrong hands, the technology could be used to unlock doors, wire money or buy stuff online — simply with music playing over the radio.
A group of students from University of California, Berkeley, and Georgetown University showed in 2016 that they could hide commands in white noise played over loudspeakers and through YouTube videos to get smart devices to turn on aeroplane mode or open a website.
This month, some of those Berkeley researchers published a research paper that went further, saying they could embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.
“We wanted to see if we could make it even more stealthy,” said Nicholas Carlini, a fifth-year PhD student in computer security at UC Berkeley and one of the paper’s authors.
Speech-recognition systems typically translate each sound to a letter, eventually compiling those into words and phrases. By making slight changes to audio files, researchers were able to cancel out the sound that the speech-recognition system was supposed to hear and replace it with a sound that would be transcribed differently by machines while being nearly undetectable to the human ear.
Amazon said that it doesn’t disclose specific security measures, but it has taken steps to ensure its Echo smart speaker is secure. Google said that security is an ongoing focus and that its Assistant has features to mitigate undetectable audio commands. Both companies’ assistants employ voice-recognition technology to prevent devices from acting on certain commands unless they recognise the user’s voice.
Apple said its smart speaker, HomePod, is designed to prevent commands from doing things like unlocking doors, and it noted that iPhones and iPads must be unlocked before Siri will act on commands that access sensitive data or open apps and websites, among other measures.
Carlini and his colleagues at Berkeley incorporated commands into audio recognised by Mozilla’s DeepSpeech voice-to-text translation software, an open-source platform. They were able to hide the command, “OK, Google, browse to evil.com” in a recording of the spoken phrase, “Without the data set, the article is useless.” Humans cannot discern the command.
The Berkeley group also embedded the command in music files, including a four-second clip from Verdi’s “Requiem.”