2024
Conference Paper
Title
Towards Reducing Latency Using Beam Search in an Interactive Conversational Speech Agent
Abstract
The rapid advancement of generative artificial intelligence (AI) has led to groundbreaking developments in large language models. Because large language models generate text autoregressively, reducing latency is imperative for a highly immersive interaction experience in real-time conversation, for example, when providing fast and accurate answers to users' questions. Current efforts focus on accelerating inference, but they often do so by altering the model architecture, which can compromise output quality. In this paper, we explore latency reduction for speech-based conversational agents. We leverage mathematical functions based on Beam Search to analyze autoregressive textual sequences, enabling a nuanced evaluation of their semantic quality during auditory interaction, for example, in interactive web podcasts. We implemented our concepts and evaluated them through (1) an automated evaluation of 1000 question-answer pairs and (2) a user survey. The results show that our proposed mathematical terms can successfully assess the semantic quality of autoregressive textual sequences.
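The abstract does not give the authors' Beam Search-based scoring functions. As background only, here is a minimal sketch of generic beam search over an autoregressive next-token distribution, scoring each partial sequence by its cumulative log-probability. It is not the paper's method. The names `beam_search`, `toy_model`, and `EOS` and the probability table are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical end-of-sequence marker (illustrative only).
EOS = "</s>"

# A next-token function maps a prefix to a distribution over the next token.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(next_token: NextTokenFn, beam_width: int,
                max_len: int) -> List[Tuple[Tuple[str, ...], float]]:
    """Generic beam search: keep the beam_width best prefixes at each step,
    ranked by cumulative log-probability, and return the best sequences."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in next_token(prefix).items():
                if prob <= 0.0:
                    continue  # log(0) is undefined; skip impossible tokens
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == EOS:
                finished.append((seq, score))  # hypothesis is complete
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy next-token model standing in for an LLM."""
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("a",): {"cat": 0.9, "dog": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})


if __name__ == "__main__":
    for seq, score in beam_search(toy_model, beam_width=2, max_len=3):
        print(" ".join(seq), round(math.exp(score), 2))
```

With this toy table, greedy decoding would commit to "the" (0.6) and reach at most 0.6 × 0.5 = 0.3. A beam width of 2 keeps "a" alive and finds "a cat" with probability 0.4 × 0.9 = 0.36. This is the usual argument for beam search in autoregressive generation: a wider beam trades extra computation per step for better-scoring sequences.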
Author(s)
Keyword(s)
Human-Computer Interaction
Interactive Podcasts
Large language models (LLM)
Latency Reduction
Industry: Information Technology
Industry: Cultural and Creative Economy
Research Line: Human computer interaction (HCI)
Research Line: Machine learning (ML)
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
Conversational user interfaces
Human-computer interaction (HCI)
Beam Search