Deep Neural Network Driven Speech Classification for Relevance Detection in Automatic Medical Documentation

Ahamed, S.; Weiler, G.; Boden, K.; Januschowski, K.; Stennes, M.; McCrae, P.; Bock, C.; Rawein, C.; Petris, M.; Foth, K.; Rohm, K.; Kiefer, S.

Mantas, J. ; European Federation of Medical Informatics -EFMI-:
Public Health and Informatics : Proceedings of MIE 2021, held virtually, 29-31 May 2021
Amsterdam: IOS Press, 2021 (Studies in health technology and informatics 281)
ISBN: 978-1-64368-184-9 (Print)
ISBN: 978-1-64368-185-6 (Online)
Medical Informatics in Europe Conference (MIE) <31, 2021, Online>
Conference paper, electronic publication
Fraunhofer IBMT
Fraunhofer IDMT

The automation of medical documentation is highly desirable, especially because it could avert significant time and cost in healthcare. Supported by complex modelling and high computational capability, Automatic Speech Recognition (ASR) and deep learning have made several promising attempts to this end. However, a factor that significantly determines the efficiency of these systems is the volume of speech that must be processed in each medical examination. In the course of this study, we found that over half of the speech recorded during follow-up examinations of patients treated with intravitreal injections was not relevant for medical documentation. In this paper, we evaluate the application of Convolutional and Long Short-Term Memory (LSTM) neural networks for the development of a speech classification module aimed at identifying speech relevant for medical report generation. To this end, various topology parameters are tested and the effect of different speaker attributes on model performance is analyzed. The results indicate that Convolutional Neural Networks (CNNs) are more successful than LSTM networks, achieving a validation accuracy of 92.41%. Furthermore, when evaluated for robustness to gender, accent and unknown speakers, the neural network generalized satisfactorily.