Assessment of COVID-19 lung involvement on computed tomography by deep-learning-, threshold-, and human reader-based approaches - an international, multi-center comparative study

Authors: Philipp Fervers, Florian Fervers, Astha Jaiswal, Miriam Rinneburger, Mathilda Weisthoff, Philip Pollmann-Schweckhorst, Jonathan Kottlors, Heike Carolus, Simon Lennartz, David Maintz, Rahil Shahzad, Thorsten Persigehl
Date: 2022-11-02
URL: https://publica.fraunhofer.de/handle/publica/428101
DOI: 10.21037/qims-22-175

Background: The extent of lung involvement in coronavirus disease 2019 (COVID-19) pneumonia, quantified on computed tomography (CT), is an established biomarker for prognosis and guides clinical decision-making. The clinical standard is semi-quantitative scoring of lung involvement by an experienced reader. We aim to compare the performance of automated deep-learning- and threshold-based methods with manual semi-quantitative lung scoring. Further, we aim to investigate an optimal threshold for quantification of involved lung on chest CT in COVID-19 pneumonia, using a multi-center dataset.

Methods: In total, 250 patients were included: 50 consecutive patients with RT-PCR-confirmed COVID-19 from our local institutional database and another 200 patients from four international datasets (n=50 each). Lung involvement was scored semi-quantitatively by three experienced radiologists according to the established chest CT score (CCS), which ranges from 0 to 25. Inter-rater reliability was reported as the intraclass correlation coefficient (ICC). Deep-learning-based segmentation of ground-glass opacity and consolidation was obtained with the CT Pulmo Auto Results prototype plugin on IntelliSpace Discovery (Philips Healthcare, The Netherlands). Threshold-based segmentation of involved lung was implemented using an open-source tool for whole-lung segmentation in the presence of severe pathologies (R231CovidWeb, Hofmanninger et al., 2020), followed by quantitative assessment of lung attenuation.
The best threshold was investigated by training and testing a linear regression of deep-learning- and threshold-based results in a five-fold cross-validation strategy.

Results: Median CCS among the 250 evaluated patients was 10 [6-15]. Inter-rater reliability of the CCS was excellent [ICC 0.97 (0.97-0.98)]. The best attenuation threshold for identification of involved lung was -522 HU. While the relationship between deep-learning- and threshold-based quantification was linear and strong (r² = 0.84), both automated quantification methods translated to the semi-quantitative CCS in a non-linear fashion, with an increasing slope towards higher CCS (r² = 0.80 for deep-learning vs. CCS, r² = 0.63 for threshold vs. CCS).

Conclusions: The manual semi-quantitative CCS underestimates the extent of COVID-19 pneumonia in higher score ranges, which limits its clinical usefulness in cases of severe disease. Clinical implementation of fully automated methods, such as deep-learning- or threshold-based approaches (best threshold in our multi-center dataset: -522 HU), might save the time of trained personnel, eliminate inter-reader variability, and allow for truly quantitative, linear assessment of COVID-19 lung involvement.

Language: en
Type: journal article
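
The threshold-based quantification and the threshold search described in the Methods could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the lung mask is assumed to be supplied already (for example by a whole-lung segmentation tool), and voxels at or above the threshold are assumed to count as involved with no upper attenuation bound. The abstract does not state how the best threshold was scored, so the mean held-out R² of a linear fit against the deep-learning result is used here as an assumption.

```python
import numpy as np


def involved_fraction(hu, lung_mask, threshold_hu=-522.0):
    """Fraction of lung voxels with attenuation at or above threshold_hu.

    hu: array of CT attenuation values in Hounsfield units.
    lung_mask: boolean array of the same shape, True inside the lung.
    Assumption: "involved" means HU >= threshold, with no upper bound.
    """
    lung_values = np.asarray(hu)[np.asarray(lung_mask, dtype=bool)]
    if lung_values.size == 0:
        raise ValueError("lung mask is empty")
    return float(np.mean(lung_values >= threshold_hu))


def cv_r2(x, y, n_folds=5, seed=0):
    """Mean held-out R^2 of the linear fit y ~ slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    scores = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        # Fit on the training folds, evaluate on the held-out fold.
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
        pred = slope * x[test_idx] + intercept
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


def best_threshold(lung_hu_per_patient, dl_fraction, candidates):
    """Pick the candidate threshold whose involvement fractions best
    predict the deep-learning fractions (selection rule is an assumption).

    lung_hu_per_patient: list of 1-D arrays, the HU values inside each
    patient's lung mask.
    dl_fraction: deep-learning involvement fraction per patient.
    """
    scores = []
    for t in candidates:
        x = [np.mean(np.asarray(v) >= t) for v in lung_hu_per_patient]
        scores.append(cv_r2(x, dl_fraction))
    return candidates[int(np.argmax(scores))], scores
```

In practice the candidate list would span a plausible attenuation range in small steps, and each patient's lung voxel values would be extracted once and reused for every candidate.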