From dim-witted to intelligent | Speech recognition alone took 100 years

Not long ago, the Chinese version of Bixby entered open beta. Although this is not Bixby's first release, it means that a new voice giant has begun to enter the Chinese market. In our tests, Bixby showed a good recognition rate. It supports voice unlock and speech-to-text conversion, and users can make calls, send text messages, control apps, and access third-party content by voice. In a sense, the phone has come to understand the human voice.

From dim-witted to intelligent: speech recognition alone took 100 years

When we watch humans talk with artificial intelligence in science fiction movies, it still feels very high-tech. In fact, voice assistants have quietly entered our lives, to the point that an excellent voice product no longer seems unfamiliar. Yet in most people's impressions, the voice assistant is still a rather "dim-witted" thing. That is hard to deny: the machine is not intelligent and cannot think like a human being, or rather, it does not yet have a system large enough to let it think. But it is equally undeniable that artificial intelligence in speech recognition has already reached considerable scale.

Speech recognition has brought great convenience

To take a very common example, most current Internet TVs support voice search, which makes the television much easier to use. The TV's input device is still the remote control, whose input efficiency cannot match a keyboard, but with voice support you only need to speak a command to the television to complete the interaction. In addition, voice interaction has been a blessing for blind users of such devices. Its role should not be underestimated.

Phone voice assistant (image from the Internet)

I believe many readers who have not paid much attention to speech recognition will ask: come on, are you kidding? Siri and the like are everywhere, there are smart speakers at home, and a voice assistant is nothing special, so why make it sound so mysterious? Smart speakers are not expensive either; how could high technology be so cheap? Yet developing a voice assistant is really no small project, and voice assistants are hard to make money from: at the very least, an IME vendor does not charge you when you dictate by voice. Without strong funding, it is hard to hold out until the day revenue arrives.

Speech recognition can also feel somewhat underwhelming (image from the Internet)

Voice assistant makers now all advertise that their accuracy has reached some percentage. For now, an accuracy rate above 90% is quite good. Even with such high accuracy, voice assistants still feel somewhat underwhelming, which has a lot to do with the complexity of language and the completeness of third-party interfaces. In this installment of Fever School, we will talk about the principles behind speech recognition, its current status and future development, and imagine when artificial intelligence will rule the world.

This is an original article. If you reprint it, please credit the source: From dim-witted to intelligent: speech recognition alone took 100 years http://mobile.zol.com.cn/665/6656792.html

How speech recognition works: the database

In summary, the principle of speech recognition is actually not hard to understand. Broadly, it is the same as fingerprint recognition: the device captures the target speech, processes it in a series of steps to obtain its feature information, and then compares those features for similarity against the data stored in a database. The highest-scoring entry is the recognition result, and the device's speech recognition function is then completed by handing that result to other systems.
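
The matching step described above can be sketched as a nearest-template search. This is only a minimal illustration, not how any real recognizer stores its data: the feature vectors, the labels, and the function names cosine_similarity and best_match are made up for the example, and cosine similarity is just one possible choice of similarity score.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(features, database):
    """Return (label, score) for the stored template most similar to features."""
    scored = [
        (label, cosine_similarity(features, template))
        for label, template in database.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical stored templates: one feature vector per known utterance.
database = {
    "hello": [0.9, 0.1, 0.3],
    "goodbye": [0.1, 0.8, 0.5],
    "thanks": [0.4, 0.4, 0.9],
}

# Hypothetical features extracted from newly captured speech.
label, score = best_match([0.85, 0.15, 0.35], database)
print(label, round(score, 3))
```

The highest-scoring label is what would be passed on to the rest of the system, for example to a dialer or a search box.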

Simplified recognition process

If you are not particularly interested in speech recognition, knowing the general principle is enough. In practice, the process is quite complex, and the most direct reason is the complexity of speech itself. Fingerprint recognition only needs to match the input fingerprint against the fingerprint information stored in the database, and that database holds only a handful of fingerprints. Speech is completely different.

"Kangxi Dictionary" (quoted from the Chinese antiques network)

The Kangxi Dictionary contains 47,035 Chinese characters and was compiled over six years by more than thirty famous scholars of the time. Yet Chinese is not just a set of isolated characters: speech has rising and falling tones and emotional coloring, and the same words can be understood differently depending on their semantics. Moreover, not everyone speaks Mandarin. There are many dialects, so you can imagine how difficult it is to build a complete language database. The complexity of Chinese is the reason the Chinese version of Bixby arrived later than the English version.

High-accuracy speech recognition is inseparable from a huge cloud-based database (image from the Internet)

A large language database is difficult to store on a mobile device, which is why almost all mobile voice assistants need a network connection for speech recognition. Offline versions do exist, but it is easy to see that their accuracy is far lower than that of the online versions. In addition, as mentioned above, many voice vendors claim accuracy above 90%, which is remarkable. It is no exaggeration to say that at this level every additional 1 percentage point of accuracy is a qualitative leap. That requires not only a fairly complete database, but also more efficient recognition algorithms and a self-learning system.
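
One way to read the claim about a single percentage point is to look at how many of the remaining errors it removes. The short calculation below is only an illustration of that arithmetic; the function name relative_error_reduction and the sample accuracy pairs are chosen for the example and do not come from any vendor.

```python
def relative_error_reduction(old_accuracy, new_accuracy):
    """Fraction of the remaining errors removed by an accuracy gain."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error


# The same one-point gain, starting from different accuracy levels.
for old, new in [(0.50, 0.51), (0.90, 0.91), (0.98, 0.99)]:
    reduction = relative_error_reduction(old, new)
    print(f"{old:.0%} -> {new:.0%}: {reduction:.0%} of the errors removed")
```

At 50% accuracy, one more point removes only a small share of the mistakes. At 90% it removes a tenth of them, and at 98% it removes half, which is why gains become harder and more valuable as accuracy rises.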

Of course, such figures should be viewed critically. The Chinese language is, as the saying goes, broad and profound, and the tests behind the accuracy figures that vendors publish can hardly be comprehensive. So it is normal for some users to find speech recognition still dim-witted when they actually use it.

How speech recognition works: algorithms and self-learning

Earlier we mentioned the recognition algorithm and the self-learning system. Here is a brief look at how they work. First, the speech recognition system preprocesses the captured target speech. This step is quite involved and includes sampling the speech signal, anti-aliasing band-pass filtering, and removing the effects of individual pronunciation differences, equipment, and ambient noise. Features are then extracted from the processed speech.
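
To give a feel for the filtering part of preprocessing, here is a rough sketch of low-pass filtering before reducing the sample rate. It is a deliberately crude stand-in: a moving average is used in place of a properly designed anti-aliasing filter, and the function names, the test signal, and the rates are all assumptions made for the example.

```python
import math


def moving_average(samples, width):
    """Crude low-pass filter: mean of up to `width` most recent samples."""
    out = []
    running = 0.0
    for i, s in enumerate(samples):
        running += s
        if i >= width:
            running -= samples[i - width]  # drop the sample that left the window
        out.append(running / min(i + 1, width))
    return out


def downsample(samples, factor):
    """Smooth first, then keep every `factor`-th sample, to limit aliasing."""
    smoothed = moving_average(samples, factor)
    return smoothed[::factor]


# Hypothetical input: a 100 Hz tone plus a weaker 3500 Hz component,
# sampled at 8000 Hz for 0.1 s.
rate = 8000
signal = [
    math.sin(2 * math.pi * 100 * n / rate)
    + 0.5 * math.sin(2 * math.pi * 3500 * n / rate)
    for n in range(800)
]

reduced = downsample(signal, 4)  # new sample rate: 2000 Hz
print(len(signal), len(reduced))
```

Without the smoothing step, the high-frequency component would fold back into the lower band after downsampling. Real systems use sharper filters and add noise suppression, which this sketch does not attempt.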

Digital voice waveform (image from the Internet)

We know that sound is essentially vibration, which can be represented as a waveform. Recognition requires cutting the waveform into frames: several frames form a state, and three states form a phoneme. For English, a commonly used phoneme set is the 39-phoneme set from Carnegie Mellon University. For Chinese, all the initials and finals are generally used directly as the phoneme set, and Chinese recognition must also distinguish tones. The phonemes are then assembled into words or Chinese characters. Of course, the subsequent matching and post-processing of the content also require corresponding algorithms.
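
The framing step can be sketched as follows. The 25 ms frame length and 10 ms step are common choices assumed here, not values given in this article, and the names split_into_frames and frame_energy are hypothetical. Grouping frames into states and phonemes is the job of the acoustic model and is not shown.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut a waveform into overlapping fixed-length frames."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def frame_energy(frame):
    """Mean squared amplitude of one frame, a very simple per-frame feature."""
    return sum(s * s for s in frame) / len(frame)


# Hypothetical one-second recording at 16 kHz: silence, then a steady sound.
sample_rate = 16000
samples = [0.0] * 8000 + [0.5] * 8000

frames = split_into_frames(samples, sample_rate)
energies = [frame_energy(f) for f in frames]
print(len(frames), len(frames[0]))
```

Because the step is shorter than the frame, neighboring frames overlap, so a sound that falls on a frame boundary is still seen whole by at least one frame.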

Recognition process that outputs text

The self-learning system mainly serves the databases. A speech recognition system that converts speech into text needs two of them: an acoustic model database, which is matched against the extracted feature information, and a language model database, which is used to match text. Both must be built in advance by training on and analyzing large amounts of data, which is what the self-learning system refers to: useful data models are extracted to make up the databases. In addition, during recognition the self-learning system records the user's habits and patterns and feeds that data back into the database, so that the recognition system becomes smarter for that user.
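
How the two databases work together can be illustrated with a toy decoder that adds an acoustic score to a language score and keeps the best candidate. Everything below is invented for the illustration: the candidate sentences, all the probabilities, the bigram form of the language model, and the names language_log_prob and decode. Real systems search over far more candidates with trained models.

```python
import math

# Hypothetical acoustic scores: how well the audio matches each candidate.
acoustic_log_prob = {
    ("recognize", "speech"): math.log(0.40),
    ("wreck", "a", "nice", "beach"): math.log(0.45),
}

# Hypothetical language model: probability of a word given the previous word.
bigram_prob = {
    ("<s>", "recognize"): 0.010,
    ("recognize", "speech"): 0.200,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.100,
    ("a", "nice"): 0.050,
    ("nice", "beach"): 0.010,
}


def language_log_prob(words, floor=1e-6):
    """Log-probability of a word sequence under the toy bigram model."""
    total = 0.0
    previous = "<s>"  # sentence-start marker
    for word in words:
        total += math.log(bigram_prob.get((previous, word), floor))
        previous = word
    return total


def decode(candidates, lm_weight=1.0):
    """Pick the candidate with the best combined acoustic and language score."""
    def score(words):
        return candidates[words] + lm_weight * language_log_prob(words)
    return max(candidates, key=score)


print(" ".join(decode(acoustic_log_prob)))
print(" ".join(decode(acoustic_log_prob, lm_weight=0.0)))
```

With the language model switched off (lm_weight=0.0), the candidate that merely sounds closest wins. With it switched on, the more plausible sentence wins even though its acoustic score is slightly lower.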

To summarize the whole recognition process: capture and process the target speech to obtain the key part of the voice information; extract the key features; identify the smallest units (words) and analyze how they are arranged under the rules of grammar; analyze the sentence semantics, arrange the key elements into sentences, and adjust the composition of the text; finally, use the overall information to correct slight deviations in the content.

Speech recognition: current status and future

Radio Rex toy dog

The explosion of artificial intelligence over the past two years did not happen overnight, and speech recognition is no exception. From the earliest prototype to today's accuracy of more than 90%, speech recognition has about 100 years of history. The Radio Rex toy dog, produced in the 1920s, popped out when its name was called and is regarded as the ancestor of speech recognition. Speech recognition in the true sense began in the 1950s, when AT&T Bell Labs built the Audrey system, which could recognize the ten English digits.

Neural networks, recently in the spotlight because of NPUs, were already being applied to speech recognition in the 1960s. The Sphinx system, built around three features (large vocabulary, continuous speech, and speaker independence), was born in the late 1980s. The 1990s onward were a good period for speech recognition: government agencies began to value the technology, many well-known companies began to invest heavily in it, and a large number of high-level research institutions joined the field, which for a time produced significant achievements.

iFlytek voice dictation

Today, speech recognition has made breakthroughs. On August 20, 2017, Microsoft's speech recognition system reduced its error rate from 5.9% to 5.1%, reaching the level of professional stenographers. iFlytek, the domestic leader in speech recognition, has reached 95% accuracy in voice dictation, a strong performance. Other large domestic companies such as Alibaba, Baidu, and Tencent have also built their own speech recognition, and the outlook is promising.
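
Error rates like the ones quoted above are usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of reference words. The sketch below assumes that is the metric meant here; the function name and the sample sentences are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "turn on the living room light"
hypothesis = "turn on the living room right"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Because inserted words count as errors too, WER can exceed 100% for very poor output, which is one reason it is not simply "100% minus accuracy".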

iFlytek voice assistant Flying Fish system (image from the Internet)

Moreover, speech recognition will not be limited to the phone interaction and smart speaker commands mentioned earlier. In toys, furniture, the home, automobiles, the justice system, medicine, education, industry, and other fields, speech recognition systems will play a role that cannot be ignored. After all, at a time when artificial intelligence is just getting started, voice is the most efficient form of human-computer interaction until devices can easily detect human thoughts.

Final words

Having read this far, you should have a general understanding of speech recognition. The speech recognition we see in phones and smart speakers is only the tip of the iceberg. In the future we will see more forms of it in every aspect of daily life, for example speech recognition combined with driverless cars: you simply tell the car where to go, and it takes you to your destination automatically.

As for when artificial intelligence will rule the world, that is hard to say. Artificial intelligence has acquired natural language skills. Even though its language ability is still very elementary compared with that of humans, it can already respond appropriately to what it is given, which is one of the preconditions of intelligence. In a sense, it has integrated one of the basic functions of human beings. But this is clearly not something we need to worry about for now; we can simply look forward to and enjoy the convenience that artificial intelligence brings.
