Objective assessment of a speech enhancement scheme with an automatic speech recognition-based system

Authors: Pusch, A.; Moritz, N.; Schepker, H.; Meyer, B.T.; Huber, R.; Rennies, Jan
Record date: 2022-03-15
Year: 2018
URL: https://publica.fraunhofer.de/handle/publica/412222
Type: conference paper
Language: en
Classification: 621006

Abstract: A single-ended method for the prediction of perceived listening effort based on an automatic speech recognition system was adopted from the literature and modified to evaluate a near-end listening enhancement (NELE) scheme. The listening effort prediction method employs a deep time delay neural network (TDNN) that was trained as part of an automatic speech recognizer. The TDNN computes phoneme posterior probabilities (or "posteriorgrams"), which degrade in the presence of noise or other distortions. The degree of posteriorgram degradation is quantified by a performance measure and serves as a predictor for mean subjective listening effort ratings of normal-hearing listeners. The modification of the original method consists of the use of a TDNN (in contrast to the regular feed-forward DNN used before), which was trained on a much larger speech corpus. Without any task-specific training or optimization, the modified method achieves a very high correlation with subjective listening effort ratings on a test data set of unprocessed and NELE-processed speech in two types of background noise (r = 0.98), generalizes to unseen noise conditions, and produces consistent predictions across these conditions that can be directly compared.
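
The abstract does not name the performance measure used to quantify posteriorgram degradation. As a rough illustration of the idea only, the sketch below uses mean per-frame entropy as a stand-in measure: sharp, confident phoneme posteriors give low entropy, while posteriors smeared by noise give higher entropy. The function names, the choice of entropy, and the toy posteriorgrams are all assumptions made for this example and are not taken from the paper. A plain Pearson correlation is included because the abstract reports agreement with subjective ratings as r.

```python
import math


def mean_frame_entropy(posteriorgram):
    """Average per-frame entropy (in nats) of a posteriorgram.

    posteriorgram: list of frames, each a list of phoneme posterior
    probabilities that sum to 1. Entropy is an assumed stand-in for the
    (unnamed) performance measure in the abstract.
    """
    total = 0.0
    for frame in posteriorgram:
        # Zero-probability entries contribute nothing to the entropy.
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriorgram)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy posteriorgrams over three hypothetical phoneme classes (made-up values).
clean_speech = [[0.97, 0.02, 0.01], [0.01, 0.98, 0.01]]
noisy_speech = [[0.40, 0.35, 0.25], [0.30, 0.40, 0.30]]

if __name__ == "__main__":
    print("clean:", mean_frame_entropy(clean_speech))
    print("noisy:", mean_frame_entropy(noisy_speech))
```

In an evaluation along the lines described in the abstract, one such score would be computed per test condition and passed to `pearson_r` together with the mean subjective listening effort ratings for the same conditions.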