Subjective assessment of ovarian masses using pattern recognition
The impact of experience on diagnostic performance and interobserver variability
Purpose: To compare diagnostic performance and interobserver variability among 36 examiners at four levels of experience. Methods: Nine junior trainees, eight level I senior trainees, 11 level II senior gynecologists, and eight level III expert sonologists classified 105 ultrasound images of adnexal masses into three subgroups of ovarian lesions (malignancies, functional cysts, and dermoid cysts). Results: The level III sonologists obtained the best diagnostic results together with the lowest interobserver variability (κ = 0.70, SD = 0.04). They achieved significantly better results than both the junior trainees and the senior trainees (κ = 0.51, SD = 0.12, p < 0.001 and κ = 0.51, SD = 0.09, p < 0.001, respectively). Differences between the level III sonologists and the level II observers did not reach statistical significance (κ = 0.65, SD = 0.09, p = 0.70). There were no significant differences between senior and junior trainees (p = 1.0), and both groups achieved significantly poorer diagnostic performance than the level II observers (p < 0.01 for both). Across all observers, the largest differences were seen in classifying malignancies; results were best for functional cysts and poorest for dermoid cysts. Conclusions: Diagnostic performance of pattern recognition improves significantly with increasing experience, emphasizing the importance of standardized ultrasound training programs supervised by experts.