Authors: Ott, Nikolas; Horst, Robin; Dörner, Ralf
Dates: 2024-09-25; 2024-10-09; 2024-09-25; 2024
URL: https://publica.fraunhofer.de/handle/publica/475688
DOI: 10.1109/GEM61861.2024.10585772
Scopus ID: 2-s2.0-85199537913
Abstract: The rapid advancement of generative artificial intelligence (AI) has led to groundbreaking developments in large language models. Because large language models generate textual sequences autoregressively, mitigating latency becomes imperative for providing a highly immersive interaction experience within a real-time conversation, for example, providing fast and accurate responses to users' questions. Current efforts focus on accelerating inference, yet often at the cost of altering the model architecture, which compromises output quality. In this paper, we explore latency reduction for speech-based conversational agents. We leverage mathematical functions based on Beam Search to analyze autoregressive textual sequences, enabling a nuanced evaluation of semantic quality during auditory interaction, for example, for use within interactive web podcasts. We implemented our concepts and used the software to evaluate them in (1) an automated evaluation of 1000 question-answer pairs and (2) a user survey. The results show that the semantic quality of autoregressive textual sequences could be assessed successfully by our proposed mathematical terms.
Language: en
Keywords: Human-Computer Interaction; Interactive Podcasts; Large language models (LLM); Latency Reduction; Branche: Information Technology; Branche: Cultural and Creative Economy; Research Line: Human computer interaction (HCI); Research Line: Machine learning (ML); LTA: Machine intelligence, algorithms, and data structures (incl. semantics); Conversational user interfaces; Human-computer interaction (HCI); Beam Search
Title: Towards Reducing Latency Using Beam Search in an Interactive Conversational Speech Agent
Type: conference paper
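The abstract refers to Beam Search over autoregressively generated sequences. The paper's specific scoring terms are not given in this record, so the following is only a generic sketch of standard beam search with cumulative log-probability scores; the token tables and probabilities are invented for illustration.

```python
import math

def beam_search(step_logprobs, beam_width=3):
    """Generic beam search over a fixed table of per-step token
    log-probabilities (a stand-in for an LLM's next-token scores).

    step_logprobs: list of dicts mapping token -> log-probability,
                   one dict per generation step.
    Returns the beam_width best (sequence, cumulative log-prob) pairs,
    highest score first.
    """
    beams = [((), 0.0)]  # (token sequence, cumulative log-prob)
    for logprobs in step_logprobs:
        candidates = []
        for seq, score in beams:
            for token, lp in logprobs.items():
                candidates.append((seq + (token,), score + lp))
        # keep only the beam_width highest-scoring hypotheses
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy two-step vocabulary; probabilities are illustrative only.
steps = [
    {"fast": math.log(0.6), "quick": math.log(0.3), "slow": math.log(0.1)},
    {"reply": math.log(0.7), "answer": math.log(0.2), "pause": math.log(0.1)},
]
best = beam_search(steps, beam_width=2)
```

In a latency-sensitive speech agent, the cumulative score of each hypothesis is the kind of quantity one could monitor during generation to judge semantic quality before the full response is produced, though how the paper derives its evaluation terms from these scores is not described in this record.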