AI for Speech Recognition


What Is A Speech Recognition System?

A speech recognition system is software that converts the user's spoken words into written text in a computer application such as a word processor or spreadsheet. It can also let the user control the computer with spoken commands.

Speech recognition software can be installed on a personal computer of suitable specification. The user speaks into a microphone (a headset microphone is usually supplied with the product). The software generally requires an initial training and enrolment process that teaches it to recognise the user's voice. This produces a voice profile that is unique to that individual. The procedure also helps the user learn how to ‘speak’ to a computer.
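Real products typically adapt their statistical acoustic models to the speaker during enrolment, which is too involved to show here. As a deliberately simplified sketch of the idea of a voice profile, the Python below averages feature vectors taken from several training utterances into one profile vector, then checks whether a new utterance resembles it using cosine similarity. The feature vectors, the 0.9 threshold, and the function names `build_voice_profile` and `matches_profile` are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical feature vector: one list of numbers summarizing an utterance.
Vector = List[float]


def build_voice_profile(training_vectors: Sequence[Vector]) -> Vector:
    """Average the enrolment feature vectors element by element into one profile."""
    if not training_vectors:
        raise ValueError("enrolment needs at least one training utterance")
    dims = len(training_vectors[0])
    totals = [0.0] * dims
    for vec in training_vectors:
        if len(vec) != dims:
            raise ValueError("all feature vectors must have the same length")
        for i, value in enumerate(vec):
            totals[i] += value
    return [total / len(training_vectors) for total in totals]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_profile(profile: Vector, utterance: Vector, threshold: float = 0.9) -> bool:
    """Hypothetical check: is a new utterance close enough to the enrolled profile?"""
    return cosine_similarity(profile, utterance) >= threshold


if __name__ == "__main__":
    enrolment = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.8, 2.2, 2.9]]
    profile = build_voice_profile(enrolment)
    print(profile)
    print(matches_profile(profile, [1.1, 2.1, 3.0]))
```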

About

When you dial the telephone number of a big company, you are likely to hear the sonorous voice of a cultured lady who answers your call with great courtesy: “Welcome to company X. Please give me the extension number you want.” You say the extension number, your name, and the name of the person you want to contact. If the called person accepts the call, you are connected quickly. This is artificial intelligence at work: an automatic call-handling system does the job without a human telephone operator.
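Once the recognizer has turned the caller's speech into text, the routing step itself is simple. The sketch below assumes the engine has already returned the recognized words (for example `"four two seven"`), converts spoken digit words into an extension number, and looks it up in a directory. The directory contents, the prompts, and the function names are hypothetical; a real call-handling system would also handle spoken names, confirmations, and the actual transfer through its telephony platform.

```python
from typing import Dict, Optional

# Spoken digit words the recognizer might return ("oh" is a common way to say zero).
DIGIT_WORDS: Dict[str, str] = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Hypothetical company directory: extension number -> department.
DIRECTORY: Dict[str, str] = {"427": "Accounts", "512": "Sales"}


def words_to_extension(recognized_text: str) -> Optional[str]:
    """Turn recognized digit words such as 'four two seven' into '427'."""
    digits = []
    for word in recognized_text.lower().split():
        if word not in DIGIT_WORDS:
            return None  # not a pure digit string, so the caller must be asked again
        digits.append(DIGIT_WORDS[word])
    return "".join(digits) or None  # empty input also counts as "not understood"


def route_call(recognized_text: str) -> str:
    """Return the (hypothetical) prompt the system would play back to the caller."""
    extension = words_to_extension(recognized_text)
    if extension is None or extension not in DIRECTORY:
        return "Sorry, I did not catch that extension. Please say it again."
    return f"Connecting you to extension {extension} ({DIRECTORY[extension]})."


if __name__ == "__main__":
    print(route_call("four two seven"))
    print(route_call("nine nine nine"))
```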


Working Of The System

The voice input to the microphone produces an analogue speech signal. An analogue-to-digital converter (ADC) samples this signal and converts each sample into a binary word that a digital computer can process. The digitized version is then stored in the system and compared with previously stored binary representations of words and phrases.
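The steps described above (sampling, converting to binary words, and comparing against stored representations) can be simulated in a few lines. This sketch assumes an ADC with an input range of -1 to 1 and a chosen bit depth, and uses dynamic time warping (DTW), a classic template-matching technique from early isolated-word recognizers, to find the stored template closest to a new input. Comparing raw samples is a simplification: practical systems compare extracted features such as spectral frames. All function names, the sample rate, and the test tones are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def sample_and_quantize(signal: Callable[[float], float], n_samples: int,
                        sample_rate_hz: int, bits: int) -> List[int]:
    """Simulate an ADC: sample signal(t) at a fixed rate and quantize each sample."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        x = max(-1.0, min(1.0, signal(t)))  # clip to the assumed input range [-1, 1]
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))  # map to 0 .. levels-1
    return codes


def to_binary_words(codes: Sequence[int], bits: int) -> List[str]:
    """Show each quantized sample as a fixed-width binary word."""
    return [format(code, f"0{bits}b") for code in codes]


def dtw_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Dynamic time warping distance: cheapest monotonic alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(codes: Sequence[int], templates: Dict[str, Sequence[int]]) -> str:
    """Return the name of the stored template closest to the new input."""
    return min(templates, key=lambda word: dtw_distance(codes, templates[word]))


def tone(freq_hz: float, amp: float = 1.0) -> Callable[[float], float]:
    """Hypothetical stand-in for an analogue speech signal: a pure sine tone."""
    return lambda t: amp * math.sin(2 * math.pi * freq_hz * t)


if __name__ == "__main__":
    rate = 8000  # samples per second (hypothetical)
    templates = {
        "low": sample_and_quantize(tone(200), 80, rate, 8),
        "high": sample_and_quantize(tone(1000), 80, rate, 8),
    }
    spoken = sample_and_quantize(tone(200, amp=0.9), 80, rate, 8)
    print(to_binary_words(spoken[:4], 8))
    print(best_match(spoken, templates))
```

DTW is used here because two utterances of the same word rarely have the same length; it stretches or compresses one sequence in time to find the cheapest alignment before comparing values.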

What Software Is Available?

There are a number of publishers of speech recognition software. New and improved versions are released regularly, and older versions are often sold at greatly reduced prices. The newest versions usually require modern computers of well above average specification; on a lower-specification computer the software may run very slowly or be practically unusable. There are two main types of speech recognition software: discrete speech, where the user must pause briefly between words, and continuous speech, where the user can speak at a natural pace.
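A discrete-speech system relies on the pauses between words. One simple way to find them is energy-based endpoint detection: split the digitized signal into short frames, compute each frame's average energy, and treat runs of low-energy frames as pauses. The sketch below assumes samples normalized to floats; its frame length, energy threshold, and minimum pause length are hypothetical parameters, not values taken from any product.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Average energy (mean squared amplitude) of each complete frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def find_word_segments(samples: Sequence[float], frame_len: int,
                       energy_threshold: float,
                       min_pause_frames: int) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) pairs for speech separated by pauses.

    A pause must last at least min_pause_frames quiet frames; shorter dips
    (for example a stop consonant inside a word) are kept inside the segment.
    """
    energies = frame_energies(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None      # frame index where the current word began
    silent_run = 0    # consecutive quiet frames since the last loud one
    for i, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # first quiet frame after the word
                segments.append((start * frame_len, end * frame_len))
                start = None
                silent_run = 0
    if start is not None:  # signal ended while a word was still in progress
        end = len(energies) - silent_run
        segments.append((start * frame_len, end * frame_len))
    return segments


if __name__ == "__main__":
    loud, quiet = [0.8, -0.8] * 80, [0.0] * 160
    print(find_word_segments(loud + quiet + loud, 160, 0.1, 1))
```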

Acceptance And Rejection

When the recognition engine processes an utterance, it returns a result in one of two states: acceptance or rejection. An accepted utterance is one for which the engine returns recognized text. Whatever the caller says, the engine tries hard to match the utterance to a word or phrase in the active grammar, that is, the set of words and phrases the application is currently listening for.

Sometimes the match is poor because the caller said something the application was not expecting or spoke indistinctly. In these cases the engine returns the closest match, which may be incorrect. Some engines also return a confidence score with the text to indicate how likely it is to be correct. Not every processed utterance is accepted: the engine flags each one as accepted or rejected, and a rejected utterance is typically one whose best match falls below the engine's confidence threshold.
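The accept/reject logic can be sketched with a confidence threshold. Here the engine's raw hypothesis is stood in for by a text string, and the similarity ratio from the Python standard library's `difflib.SequenceMatcher` plays the role of a confidence score; real engines derive confidence from their acoustic and language models. The grammar phrases, the 0.7 threshold, and the `RecognitionResult` type are hypothetical.

```python
import difflib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RecognitionResult:
    accepted: bool
    text: Optional[str]   # recognized text, or None when the utterance is rejected
    confidence: float     # 0.0 (no match) to 1.0 (exact match)
    closest_match: str    # best grammar phrase, reported even on rejection


def recognize(heard: str, active_grammar: Sequence[str],
              threshold: float = 0.7) -> RecognitionResult:
    """Match a hypothesis against the active grammar and accept or reject it."""
    if not active_grammar:
        raise ValueError("the active grammar must contain at least one phrase")
    heard = heard.lower().strip()
    # Score every phrase the application is currently listening for.
    scored = [
        (difflib.SequenceMatcher(None, heard, phrase.lower()).ratio(), phrase)
        for phrase in active_grammar
    ]
    confidence, best = max(scored, key=lambda pair: pair[0])
    if confidence >= threshold:
        return RecognitionResult(True, best, confidence, best)
    return RecognitionResult(False, None, confidence, best)


if __name__ == "__main__":
    grammar = ["sales", "accounts", "customer service"]
    for heard in ["accounts", "acounts", "pizza delivery"]:
        print(heard, "->", recognize(heard, grammar))
```

Keeping `closest_match` on a rejected result mirrors the behaviour described above: the engine always has a best guess, but the application decides whether to trust it or re-prompt the caller.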

The Limits Of Speech Recognition

To improve speech recognition applications, designers must understand acoustic memory and prosody. Continued research and development should improve certain speech input, output, and dialogue applications. Speech recognition and generation are sometimes helpful in environments that are hands-busy, eyes-busy, mobility-required, or hostile, and they show promise for telephone-based services.

Dictation input is increasingly accurate, but adoption outside the community of users with disabilities has been slow compared with visual interfaces. Obvious physical problems include fatigue from speaking continuously and the disruption caused in an office full of people speaking.

By understanding the cognitive processes surrounding human “acoustic memory” and processing, interface designers may be able to integrate speech more effectively and guide users more successfully. By appreciating the differences between human-human interaction and human-computer interaction, designers may then be able to choose appropriate applications for human use of speech with computers.

Conclusion

Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses. VoiceXML ties speech recognition and telephony together and gives businesses the technology to develop and deploy voice-enabled Web solutions today. These solutions can greatly expand access to Web-based self-service transactions for customers who would otherwise be unable to use them, while leveraging a business's existing Web investments.

