Building a German-centric SpeechLLM Using Limited Data

Maurya, Manas; Dethmann, Thomas; Walter, Oliver; Schmidt, Christoph Andreas; Köhler, Joachim

November 2025

Conference Paper

Abstract

This paper presents a novel approach using German speech data to develop a Speech Large Language Model (SpeechLLM) for processing speech and text inputs. We introduce a data generation process as an alternative to Text-to-Speech for creating a Speech Instruction Following (SIF) training dataset, where we prompt an LLM to generate translations and summaries of speech transcripts and pair them with the corresponding audio file. Combined with original speech data, we train a model for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST). Despite training only on German speech, our model processes English speech as well, retaining multilingual capabilities from pre-trained components. Evaluation shows reasonable ASR and AST performance given limited training data and demonstrates zero-shot Spoken Question Answering (SQA) capability with potential for future enhancements.
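
The data generation process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the instruction strings, the `SIFExample` record, and the `fake_llm` stand-in are all hypothetical, and a real pipeline would replace `fake_llm` with a call to an actual LLM. The sketch only shows the pairing idea: each LLM-generated text target is attached to the original audio file, so no Text-to-Speech step is needed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SIFExample:
    """One training record: real audio, a text instruction, a text target."""
    audio_path: str
    instruction: str
    target: str


# Hypothetical prompts sent to the text-only LLM (wording is an assumption).
TASK_PROMPTS = {
    "translate": "Translate the following German transcript into English:\n{transcript}",
    "summarize": "Summarize the following German transcript in one sentence:\n{transcript}",
}

# Hypothetical instructions the speech model would see alongside the audio.
INSTRUCTIONS = {
    "transcribe": "Transcribe the audio.",
    "translate": "Translate the audio into English.",
    "summarize": "Summarize the audio.",
}


def build_sif_examples(
    audio_path: str, transcript: str, llm: Callable[[str], str]
) -> List[SIFExample]:
    """Build SIF records for one utterance from its reference transcript."""
    # The original speech data yields the ASR record directly.
    examples = [SIFExample(audio_path, INSTRUCTIONS["transcribe"], transcript)]
    # The LLM sees only the transcript; its output is paired with the audio.
    for task, template in TASK_PROMPTS.items():
        target = llm(template.format(transcript=transcript))
        examples.append(SIFExample(audio_path, INSTRUCTIONS[task], target))
    return examples


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; echoes the first line of the prompt."""
    return "[LLM output for: " + prompt.splitlines()[0] + "]"
```

Calling `build_sif_examples("clip_0001.wav", "Guten Morgen.", fake_llm)` returns three records that share the same audio path: one transcription target taken from the original data and two LLM-generated targets.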