Ultra-low power acoustic front-ends for natural language user interfaces

Ultra-low power acoustic front-ends for speech interfaces
Fischer, Johannes; Bhardwaj, Kanav; Breiling, Marco; Leyh, Martin; Bäckström, Tom

Kutter, Christoph (Ed.); Verband der Elektrotechnik, Elektronik, Informationstechnik -VDE-:
VDE-Kongress 2016. Internet der Dinge: Technologien, Anwendungen, Perspektiven. CD-ROM : 07./08.11.2016, Center Mannheim; Kongressbeiträge
Berlin: VDE Verlag, 2016
ISBN: 978-3-8007-4308-7
5 pp.
Conference "Internet der Dinge - Technologien, Anwendungen, Perspektiven" <2016, Mannheim>
Fraunhofer IIS

Current research on the IoT focuses predominantly on potential applications and on digital communications, while the required human-machine interfaces (HMIs) receive less attention. To enhance the user experience, smartphones, computers, and more recently consumer electronics offer natural language user interfaces (NLUIs). NLUIs have the benefit that, even for complex devices, all functionality is accessible without browsing complex menus. Moreover, especially in the context of IoT, it is beneficial to be able to control devices such as lighting, thermostats, TVs, or radio sets from a distance. Because speech recognition frameworks are of significant complexity, a device that runs one constantly would consume a considerable amount of energy and drain its battery rapidly. This problem can be circumvented by using a voice activity detector (VAD), so that the speech recognition framework runs only in the presence of speech. The concept can be extended further with low-complexity keyword spotting: instead of running the speech recognition framework the entire time, the device listens only for a keyword, which then wakes it up. In this paper we present a two-stage VAD approach in which the first stage is based on time-domain features and has very low complexity at 0.3 million multiply-accumulate operations (MACs). In the case of uncertainty, a second stage is evaluated at approximately 6.62 million MACs. The evaluation shows that in adverse conditions the proposed approach is more accurate than conventional methods, especially with non-stationary noise, while retaining low complexity.
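
The two-stage cascade described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract says only that the first stage uses time-domain features, so the frame energy feature, the autocorrelation-based second stage, all thresholds, and the function names `stage_one`, `stage_two`, and `two_stage_vad` are hypothetical choices made for illustration. The sketch assumes samples scaled to [-1, 1] at a sampling rate of roughly 8 kHz.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame in dB relative to full scale (1.0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log(0)


def stage_one(frame, low_db=-50.0, high_db=-20.0):
    """Cheap time-domain check (hypothetical thresholds).

    Returns False or True when the energy is clearly low or high,
    and None when the frame falls in the uncertain middle band.
    """
    energy = frame_energy_db(frame)
    if energy < low_db:
        return False
    if energy > high_db:
        return True
    return None


def stage_two(frame, min_lag=20, max_lag=160, threshold=0.5):
    """Costlier check, assumed here to be a periodicity test.

    Computes the peak of the normalized autocorrelation over a range of
    candidate pitch lags; voiced speech tends to be strongly periodic.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False
    last_lag = min(max_lag, len(frame) - 1)
    peak = max(
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(min_lag, last_lag + 1)
    )
    return peak >= threshold


def two_stage_vad(frame):
    """Run the cheap stage first; evaluate the second stage only if unsure."""
    decision = stage_one(frame)
    if decision is None:
        decision = stage_two(frame)
    return decision


if __name__ == "__main__":
    silence = [0.0] * 320
    quiet_tone = [0.02 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(320)]
    print(two_stage_vad(silence), two_stage_vad(quiet_tone))
```

The point of the cascade is that most frames are settled by the cheap first stage, so the average cost stays close to that of stage one, while the more expensive second stage is spent only on ambiguous frames.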