Toward a Statistically Well-Grounded Evaluation of Listening Tests - Avoiding Pitfalls, Misuse, and Misconceptions

Nagel, F.; Sporer, T.; Sedlmeier, P. (2010). Conference paper, in English.
https://publica.fraunhofer.de/handle/publica/370535

Many recent publications in audio research present subjective evaluations of audio quality based on Recommendation ITU-R BS.1534-1 (MUSHRA, MUltiple Stimuli with Hidden Reference and Anchor). This is a very welcome trend because it enables researchers to assess the implications of their developments. The evaluation of listening tests, however, sometimes suffers from an incomplete understanding of the underlying statistics. The present paper aims to identify the causes of the pitfalls and misconceptions in MUSHRA evaluations. It exemplifies the impact of incorrectly applied or even misused statistics. Subsequently, it proposes schemes for evaluating the listeners' judgments that are well grounded in statistical considerations, including an understanding of the concepts of statistical power and effect size.