November 2025
Conference Paper
Title
Building a German-centric SpeechLLM Using Limited Data
Abstract
This paper presents a novel approach using German speech data to develop a Speech Large Language Model (Speech-LLM) for processing speech and text inputs. We introduce a data generation process as an alternative to Text-to-Speech for creating a Speech Instruction Following (SIF) training dataset, where we prompt an LLM to generate translations and summaries of speech transcripts and pair them with the corresponding audio file. Combined with original speech data, we train a model for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST). Despite training only on German speech, our model processes English speech as well, retaining multilingual capabilities from pre-trained components. Evaluation shows reasonable ASR and AST performance given limited training data and demonstrates 0-shot Spoken Question Answering (SQA) capability with potential for future enhancements.
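The data generation process described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `SIFExample` record, the instruction strings, the prompt templates, and the `generate` callable (a stand-in for whatever LLM is prompted) are all assumptions introduced here for illustration. The sketch only shows the general idea of pairing LLM-generated translations and summaries of a transcript with the original audio file, alongside the original transcript as an ASR target.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SIFExample:
    """One hypothetical training example: audio in, text out."""
    audio_path: str   # path to the original (non-synthetic) recording
    instruction: str  # text instruction given to the Speech-LLM
    target: str       # expected text response


# Hypothetical tasks: (instruction for the Speech-LLM, prompt template for the text LLM).
LLM_TASKS = {
    "translate": (
        "Translate the speech into English.",
        "Translate the following German transcript into English:\n{transcript}",
    ),
    "summarize": (
        "Summarize the speech.",
        "Summarize the following German transcript briefly:\n{transcript}",
    ),
}


def build_sif_examples(
    records: Iterable[tuple[str, str]],
    generate: Callable[[str], str],
) -> list[SIFExample]:
    """Build SIF examples from (audio_path, transcript) pairs.

    `generate` is an assumed callable that sends a prompt to a text LLM
    and returns its reply; any client can be wrapped to fit this shape.
    """
    examples: list[SIFExample] = []
    for audio_path, transcript in records:
        # Original speech data: the transcript itself is the ASR target.
        examples.append(SIFExample(audio_path, "Transcribe the speech.", transcript))
        # Generated data: the LLM sees only the transcript text, and its
        # output is paired with the corresponding audio file.
        for instruction, template in LLM_TASKS.values():
            target = generate(template.format(transcript=transcript))
            examples.append(SIFExample(audio_path, instruction, target))
    return examples
```

Because the audio is real recorded speech rather than synthesized speech, only the text targets are generated, which is what makes this an alternative to a Text-to-Speech based dataset.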
Author(s)
Conference
Keyword(s)
Training Data
Speech Recognition
Question Answering
Audio Files
Input Text
Automatic Speech
Limited Training Data
Speech Data
Speech Input
Frame Rate
Multilayer Perceptron
Multiple-choice Questions
Hourly Data
Speech Samples
Training Pipeline
Hidden Size
Word Error Rate
Text Modality
Speech Large Language Model (Speech-LLM)
Text-to-Speech
Speech Instruction Following (SIF)
Automatic Speech Recognition (ASR)
Automatic Speech Translation (AST)
multilingual
Spoken Question Answering (SQA)