Computer scientists Nicholas Carlini and David Wagner have succeeded in fooling DeepSpeech, Mozilla's popular open source speech-to-text engine. Given any audio waveform, the researchers can produce a second waveform that is over 99.9% similar to the first but is transcribed as any phrase they choose, at a rate of up to 50 characters per second. The attack succeeds 100% of the time, regardless of the desired transcription or the phrase originally spoken. The starting waveform does not even have to be speech: given music, for example, the researchers can embed speech in the audio that a listener will not recognize as speech. They can also hide audio from the speech-to-text system by making it transcribe as silence.
In other words, the researchers can in theory manipulate an arbitrary audio file so that a speech-to-text system believes it says something else entirely. In a world now full of smart speakers and voice assistants, the new attack technique is undoubtedly a bombshell.