A Glimpse of the Next Big Thing: Automatic Speech Recognition

Automatic Speech Recognition (ASR) has been in development for decades. However, significant progress and widespread adoption have only recently been achieved. This has been driven by the prevalence of devices such as smartphones and tablets, where the absence of a physical keyboard makes speech recognition a potentially useful input mechanism.

Apple’s acquisition and commercialization of Siri was the catalyst that gave ASR a larger user base and greater awareness. Google’s Now, Microsoft’s Cortana and Amazon’s Alexa have benefited both from Siri’s front-running and from Siri’s poor user experience.

ASR is primarily used for basic operations such as device navigation, internet search and setting reminders, as part of a smartphone’s utility tools. However, integrating ASR into connected devices, backed by cloud-based infrastructure and seamless connectivity over 5G networks, will realize its true value.

Integration of natural language processing capability and predictive analytics will push ASR solutions from a simple assistant to something closer to an efficient consultant for the user. Upcoming ecosystems around wearables and artificial intelligence will drive ASR usability over the next few years, making it one of the key user interface options for next-generation devices.

In the IoT space, speech recognition can be used in smart homes to control HVAC, lighting, entertainment and other connected devices. To achieve this, the device interface will require further development, particularly in microphone placement and performance.

The smartphone space will remain a key consumer of ASR; it can act as one point of differentiation where there are few others. Apple is currently marketing Siri strongly, although in our experience its usability remains poor compared to Google’s Now and Microsoft’s Cortana, both of which achieve better voice recognition.

The high-level ASR value chain is shown below. ASR solution providers are driving the value chain through different routes. The final value is delivered through various applications and/or devices.

ASR Value Chain


Efficient microphones and associated noise-cancelling technologies are coming to market that can have a transformative effect on the usability of ASR. Amazon’s Echo device has demonstrated a remarkable ability to ‘hear’ instructions even in noisy environments. Players like Kopin are also active here: it showcased its “Whisper Voice Chip” at CES, which can detect the slightest voice input despite high levels of background noise.

Amazon’s Echo differs from smartphone implementations of ASR because voice control is Echo’s main function. It acts secondarily as a loudspeaker. For other devices, whether a smartphone or a smart car, acting as a voice-driven assistant is currently a second-order function.

However, with the development of technology around the ASR value chain, the role of an intelligent agent can become a primary feature. We think this will be important in the further development of IoT, smart cars and other applications where the use of hands as an interface can be problematic.