Fraunhofer-Gesellschaft
November 2025
Conference Paper
Title

Building a German-centric SpeechLLM Using Limited Data

Abstract
This paper presents a novel approach that uses German speech data to develop a Speech Large Language Model (Speech-LLM) for processing speech and text inputs. We introduce a data generation process as an alternative to Text-to-Speech for creating a Speech Instruction Following (SIF) training dataset: we prompt an LLM to generate translations and summaries of speech transcripts and pair each output with the corresponding audio file. We combine this dataset with the original speech data to train a model for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST). Despite being trained only on German speech, our model also processes English speech, retaining multilingual capabilities from its pre-trained components. Evaluation shows reasonable ASR and AST performance given the limited training data and demonstrates zero-shot Spoken Question Answering (SQA) capability, with potential for future enhancements.
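The data generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the prompt templates, the `SifExample` record, and the `generate` callable (a stand-in for whatever LLM client is used) are all hypothetical names introduced here for illustration. The sketch only shows the general idea of prompting an LLM with a transcript and pairing the generated text with the original audio file instead of synthesizing speech.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class SifExample:
    """One speech-instruction-following training record (hypothetical schema)."""
    audio_path: str  # original recording, reused as-is (no TTS involved)
    task: str        # which instruction the target text answers
    target: str      # LLM-generated translation or summary


# Hypothetical prompt templates; the prompts used in the paper are not given here.
TASK_PROMPTS = {
    "translate": "Translate the following German transcript into English:\n{transcript}",
    "summarize": "Summarize the following German transcript:\n{transcript}",
}


def build_sif_examples(
    pairs: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[SifExample]:
    """Create SIF records from (audio_path, transcript) pairs.

    `generate` is an assumed callable that sends a prompt to an LLM and
    returns its text response.
    """
    examples = []
    for audio_path, transcript in pairs:
        for task, template in TASK_PROMPTS.items():
            prompt = template.format(transcript=transcript)
            # The LLM only sees the transcript; its output is paired with the audio.
            examples.append(SifExample(audio_path, task, generate(prompt)))
    return examples


# Usage with a stub in place of a real LLM call.
def stub_llm(prompt: str) -> str:
    return "generated text for: " + prompt.splitlines()[-1]


records = build_sif_examples([("clip_001.wav", "Guten Morgen.")], stub_llm)
```

Each audio file yields one record per task, so the speech input stays real recorded audio while only the text targets are machine-generated.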
Author(s)
Maurya, Manas
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Dethmann, Thomas
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Walter, Oliver  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Schmidt, Christoph Andreas  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Köhler, Joachim  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Mainwork
Speech Communication. 16th ITG Conference 2025  
Conference
Conference on Speech Communication 2025  
Language
English
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Keyword(s)
  • Training Data

  • Speech Recognition

  • Question Answering

  • Audio Files

  • Input Text

  • Automatic Speech

  • Limited Training Data

  • Speech Data

  • Speech Input

  • Frame Rate

  • Multilayer Perceptron

  • Multiple-choice Questions

  • Hourly Data

  • Speech Samples

  • Training Pipeline

  • Hidden Size

  • Word Error Rate

  • Text Modality

  • Speech Large Language Model (Speech-LLM)

  • Text-to-Speech

  • Speech Instruction Following (SIF)

  • Automatic Speech Recognition (ASR)

  • Automatic Speech Translation (AST)

  • Multilingual

  • Spoken Question Answering (SQA)
