A business logic system for mining German patient records
Introduction and goals: Today, service providers in the public health sector face the major challenge of integrating innovations from research and development in order to improve the quality of treatment, increase patient safety, and reduce the cost of health services. The secondary use of existing biomedical routine data is one way to exploit available data resources to improve service quality. This paper presents an application scenario for the secondary use of electronic orthopaedic patient records. The goal is to analyse unstructured German endoprosthetic surgery reports automatically. The approach uses a German Named Entity Recognition (NER) system, followed by a system based on business rules that finds relations between the identified biomedical entities.
Materials and Methods: Corpus of electronic patient records: The corpus of German health records and surgery reports was provided by the University of Erlangen-Nürnberg and the RHÖN-KLINIKUM AG. The corpus contains 256 unique, anonymised reports (average length about 257 words) with diverse structure and content. Annotation of biomedical entities in patient records: The ProMiner NER system was used to identify entities from different terminologies. These manually curated terminologies cover relevant aspects of the use case, such as the model and manufacturer of an endoprosthesis, (previous) surgeries, and human anatomy. Finding relationships between entities: The subsequent step is to identify relevant relationships between the previously identified entities. The business logic integration platform Drools was used to build an infrastructure for defining rules (currently 64) in a domain-specific language (DSL). The output format is the Operational Data Model (ODM), which ensures compatibility with clinical systems and allows easy integration into the clinical context.
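The idea of applying a rule to previously recognised entities can be illustrated with a minimal sketch. It is not the authors' Drools rule set or DSL: the `Entity` class, the type labels ("diagnosis", "body_site"), the relation name `has_body_site`, and the token-distance window are all hypothetical choices made for illustration, and the entities are entered by hand rather than produced by ProMiner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str      # hypothetical type label, e.g. "diagnosis" or "body_site"
    position: int  # token index within the report


def link_body_site(entities, max_distance=3):
    """Hypothetical rule: attach each diagnosis to the nearest body-site
    entity that follows it within max_distance tokens."""
    relations = []
    for head in entities:
        if head.kind != "diagnosis":
            continue
        candidates = [
            e for e in entities
            if e.kind == "body_site"
            and 0 < e.position - head.position <= max_distance
        ]
        if candidates:
            nearest = min(candidates, key=lambda e: e.position - head.position)
            relations.append((head.text, "has_body_site", nearest.text))
    return relations


# Hand-annotated entities for the phrase "Diagnose: Mediale Gonarthrose links"
# (token indices: Diagnose:=0, Mediale=1, Gonarthrose=2, links=3).
entities = [
    Entity("Gonarthrose", "diagnosis", 2),
    Entity("links", "body_site", 3),
]

print(link_body_site(entities))
```

A production rule engine would express such conditions declaratively and combine many of them; the sketch only shows the shape of a single proximity-based rule.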
Results: Preliminary results are shown for the correct identification of endoprostheses, anatomy, and relevant previous surgeries from unstructured free text. The following is a typical excerpt from a German record: "Diagnose: Mediale Gonarthrose links, Verfahren: Implantation einer unikondylären zementierten Oberflächenersatzprothese links (Typ balanSys der Fa. Mathys, zementiertes Tibiametallimplantat Größe 2, 20 g Refobacin-Palacos-Knochenzement)" The defined rule set extracts the relations between entities such as Gonarthrose (Eng. gonarthrosis -> diagnosis and anatomy) and links (Eng. left -> body site), as well as between Implantation (Eng. implantation -> operation), "Oberflächenersatzprothese" (Eng. surface replacement prosthesis -> category of endoprosthesis), and "links" (-> body site). Additionally, the rule set extracts the model "balanSys", the manufacturer "Mathys", and the type of cement applied, "Refobacin-Palacos-Knochenzement", from this report. This approach achieves a preliminary F-score of 0.66 (precision 0.75, recall 0.58) on a first representative subset of 10 documents with 95 manually annotated relations, 800 entities, and 180 different ODM types. The authors are currently working on a larger gold standard and more complex business rules to extract further information.
Discussion: Analysing unstructured patient records makes it possible to detect putative causal relationships in the data that are currently unknown. One possible finding would be a positive correlation between an endoprosthesis model and a high revision rate. Our system enables researchers to identify such facts automatically and hence to improve the quality of patient care in the long term.
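For readers who want to relate the evaluation figures in the Results to each other, the following sketch computes an F-score from precision and recall. It assumes that the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the Results.
value = f1_score(0.75, 0.58)
print(round(value, 3))  # 0.654
```

With the rounded inputs the result is about 0.654. The reported 0.66 was presumably computed from unrounded precision and recall values, although this is an assumption.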