  • Publication
    Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews
    (2024-09-01)
    Mackay, Sina; Höres, Timm; Allende-Cid, Héctor
    Background: PRISMA-based literature reviews require meticulous scrutiny of extensive textual data by multiple reviewers, which entails considerable human effort. Objective: To evaluate the feasibility and reliability of using the GPT-4 API as a complementary reviewer in systematic literature reviews based on the PRISMA framework. Methodology: A systematic literature review on the role of natural language processing and large language models (LLMs) in automatic patient-trial matching was conducted using human reviewers and an AI-based reviewer (GPT-4 API). A retrieval-augmented generation (RAG) methodology with LangChain integration was used to process full-text articles. Agreement levels were evaluated between two human reviewers and the GPT-4 API for abstract screening, and between a single reviewer and the GPT-4 API for full-text parameter extraction. Results: GPT-human reviewer agreement in abstract screening was almost perfect (Cohen's kappa > 0.9), whereas agreement in full-text parameter extraction was lower. Conclusion: As GPT-4 performed on par with human reviewers in abstract screening, we conclude that GPT-4 has considerable potential to serve as a main screening tool for systematic literature reviews, replacing at least one of the human reviewers.
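Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The following minimal Python sketch computes it for two reviewers' include/exclude screening decisions; the function name and the toy labels are illustrative and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy abstract-screening decisions (hypothetical, not study data).
    human = ["include", "include", "exclude", "exclude", "exclude",
             "include", "exclude", "exclude", "exclude", "exclude"]
    gpt = ["include", "include", "exclude", "exclude", "exclude",
           "exclude", "exclude", "exclude", "exclude", "exclude"]
    print(round(cohens_kappa(human, gpt), 3))  # → 0.737
```

Here the raters agree on 9 of 10 abstracts (p_o = 0.9), but because both mostly exclude, chance agreement is already 0.62, so kappa is noticeably lower than raw agreement.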
  • Publication
    Addressing a new Paradigm Shift: An Empirical Study on Novel Project Characteristics for Foundation Model Projects
    In recent years, data science and machine learning (ML) have become common across sectors and industries. Project methodologies aim to support such projects and try to keep up with ML trends and paradigm shifts. However, they have had limited success: 80% of data science projects still never reach deployment. The latest paradigm shift in ML, the trend toward generative AI and foundation models, changes the nature of data science projects and is not yet addressed by existing project methodologies. In this work, we present novel requirements that arise from real-world projects incorporating foundation models, based on 29 case studies from the natural language understanding (NLU) domain. Furthermore, we assess existing data science methodologies and identify their shortcomings. Finally, we provide guidance on adapting projects to address the new challenges in developing and operating foundation-model-based solutions.
  • Publication
    Natural Language Processing in der Medizin (Natural Language Processing in Medicine). Whitepaper
    Artificial intelligence (AI) has arrived in medicine and is already indispensable. Together with digitalization, AI is accelerating the spread of data-driven, personalized patient care. In hospitals in particular, AI can help support staff, improve treatment outcomes, and reduce costs. AI applications are now able to evaluate radiological images, support therapy decisions, and transcribe voice dictations. Text processing in particular has been revolutionized by natural language processing (NLP) algorithms, which are based on a branch of AI concerned with natural language. This covers reading, understanding, and writing texts such as medical findings, documentation, or guidelines.
  • Publication
    Symptom diaries as a digital tool to detect SARS-CoV-2 infections and differentiate between prevalent variants
    (2022-11-14)
    Grüne, Barbara; Wolff, Anna; Buess, Michael; Kossow, Annelene; Küfer-Weiß, Annika; Neuhann, Florian
    The COVID-19 pandemic and the high numbers of infected individuals pose major challenges for public health departments. To overcome these challenges, the health department in Cologne has developed software called DiKoMa. This software can track contact and index persons and also provides a digital symptom diary. In this work, we investigate whether these symptom diaries can also be used for diagnostic purposes. Machine learning makes it possible to identify infections based on early symptom profiles and to distinguish between the dominant variants. Focusing on the symptoms occurring in the first week, we train a decision tree to differentiate between contact and index persons and between the prevailing dominant variants (wild type, Alpha, Delta, and Omicron). The model is evaluated using sex- and age-stratified cross-validation and validated on symptom profiles from the first 6 days. The variants achieve AUC-ROC values of 0.89 for Omicron and 0.6 for Alpha. Results on the validation set do not differ significantly (Alpha 0.63, Omicron 0.87). Evaluating symptom combinations with artificial intelligence can determine the individual risk of a COVID-19 infection, allow assignment to virus variants, and contribute to the management of epidemics and pandemics at the national and international level. It can help reduce the number of specific tests in times of low labor capacity and could help identify new virus variants early.
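The abstract states that the model is evaluated with sex- and age-stratified cross-validation but does not say how strata or folds were formed. The following minimal Python sketch assumes each record carries a discrete (sex, age group) key and deals the members of every stratum round-robin across k folds, so each fold holds a near-equal share of every sex/age combination. The function name, the age bands, and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k, seed=0):
    """Split record indices into k folds, spreading each stratum evenly.

    strata[i] is the (sex, age_group) key of record i. Members of each
    stratum are shuffled and dealt round-robin across the folds, so every
    fold gets nearly the same share of each sex/age combination.
    """
    if k < 2:
        raise ValueError("need at least two folds")
    by_stratum = defaultdict(list)
    for idx, key in enumerate(strata):
        by_stratum[key].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # global round-robin counter keeps overall fold sizes balanced
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


if __name__ == "__main__":
    # Hypothetical (sex, age group) keys for ten diary records.
    strata = [("f", "18-39"), ("m", "18-39"), ("f", "40-59"), ("m", "40-59"),
              ("f", "60+"), ("f", "18-39"), ("m", "18-39"), ("f", "40-59"),
              ("m", "60+"), ("f", "60+")]
    folds = stratified_folds(strata, k=2)
    for i, test_idx in enumerate(folds):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(strata)) if j not in held_out]
        # A classifier (e.g. a decision tree) would be fit on train_idx
        # and scored, e.g. with AUC-ROC, on test_idx.
        print(f"fold {i}: test={sorted(test_idx)} train={train_idx}")
```

Stratifying on the demographic key rather than only on the label guards against a fold that, by chance, contains mostly one age group, whose symptom profiles may differ systematically.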
  • Publication
    A Quantitative Human-Grounded Evaluation Process for Explainable ML
    (2022)
    Müller, Sebastian
    Methods from explainable machine learning are increasingly applied. However, evaluation of these methods is often anecdotal and not systematic. Prior work has identified properties of explanation quality and we argue that evaluation should be based on them. In this work, we provide an evaluation process that follows the idea of property testing. The process acknowledges the central role of the human, yet argues for a quantitative approach for the evaluation. We find that properties can be divided into two groups, one to ensure trustworthiness, the other to assess comprehensibility. Options for quantitative property tests are discussed. Future research should focus on the standardization of testing procedures.
  • Publication
    Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
    (2021)
    Abrecht, Stephanie; Bär, Andreas; Brockherde, Felix; Feifel, Patrick; Fingscheidt, Tim; Ghobadi, Seyed Eghbal; Hammam, Ahmed; Haselhoff, Anselm; Hauser, Felix; Heinzemann, Christian; Hoffmann, Marco; Kapoor, Nikhil; Kappel, Falk; Klingner, Marvin; Kronenberger, Jan; Küppers, Fabian; Löhdefink, Jonas; Mlynarski, Michael; Mualla, Firas; Pavlitskaya, Svetlana; Pohl, Alexander; Ravi-Kumar, Varun; Rottmann, Matthias; Sämann, Timo; Schneider, Jan David; Schwalbe, Gesina; Sicking, Joachim; Srivastava, Toshika; Varghese, Serin; Weber, Michael; Wirkert, Sebastian; Woehrle, Matthias
    The use of deep neural networks (DNNs) in safety-critical applications like mobile health and autonomous driving is challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization and insufficient interpretability to problems with malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from safety concerns. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. This work provides a structured and broad overview of them. We first identify categories of insufficiencies and then describe research activities aimed at their detection, quantification, or mitigation. Our paper addresses both machine learning experts and safety engineers: the former might profit from the broad range of machine learning (ML) topics covered and from the discussion of limitations of recent methods; the latter might gain insights into the specifics of modern ML methods. We moreover hope that our contribution fuels discussions on desiderata for ML systems and on strategies for advancing existing approaches accordingly.
  • Publication
    Aligning Subjective Ratings in Clinical Decision Making
    (2020)
    Foldenauer, Ann Christina; Köhm, Michaela
    In addition to objective indicators (e.g., laboratory values), clinical data often contain subjective evaluations by experts (e.g., disease severity assessments). While objective indicators are more transparent and robust, subjective evaluations contain a wealth of expert knowledge and intuition. In this work, we demonstrate the potential of pairwise ranking methods to align the subjective evaluation with objective indicators, creating a new score that combines their advantages and facilitates diagnosis. In a case study on patients at risk of developing psoriatic arthritis, we illustrate that the resulting score (1) increases classification accuracy when detecting disease presence/absence, (2) is sparse, and (3) provides a nuanced assessment of severity for subsequent analysis.
  • Publication
    Using Probabilistic Soft Logic to Improve Information Extraction in the Legal Domain
    (2020)
    Schmude, Timothée; Völkening, Malte; Rostalski, Frauke
    Extracting information from court process documents to populate a knowledge base produces data valuable to legal faculties, publishers, and law firms. One challenge is that the relevant information is interdependent and structured by numerous semantic constraints of the legal domain. Ignoring these dependencies leads to inferior solutions. Hence, the objective of this paper is to demonstrate how the extraction pipeline can be improved by probabilistic soft logic rules that reflect both legal and linguistic knowledge. We propose a probabilistic rule model for the overall extraction pipeline, which makes it possible both to map dependencies between local extraction models and to integrate additional domain knowledge in the form of logical constraints. We evaluate the performance of the model on a corpus of German court sentences.
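As background to the rule model above: probabilistic soft logic (PSL) assigns atoms soft truth values in [0, 1], relaxes conjunctions and implications with Łukasiewicz logic, and finds the most probable assignment by minimizing a weighted sum of hinge losses, one per grounded rule. The following Python sketch illustrates only these generic semantics. The predicates IsDate, InHeader, and DecisionDate, the truth values, and the rule weights are hypothetical and not taken from the paper; real PSL implementations solve this convex problem with dedicated optimizers (such as ADMM) rather than the grid search used here.

```python
def lukasiewicz_and(values):
    """Lukasiewicz t-norm used for rule bodies: max(0, sum(v) - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied.

    Under Lukasiewicz logic the implication has truth min(1, 1 - b + h),
    so its distance to satisfaction is max(0, b - h), where b is the
    truth of the conjunctive body and h is the truth of the head.
    """
    return max(0.0, lukasiewicz_and(body) - head)


def energy(rules, squared=True):
    """Weighted hinge-loss energy: sum of w * d (or w * d**2) over rules."""
    total = 0.0
    for weight, body, head in rules:
        d = distance_to_satisfaction(body, head)
        total += weight * (d * d if squared else d)
    return total


if __name__ == "__main__":
    # Hypothetical local extractor outputs for one text span.
    is_date, in_header = 0.9, 0.8

    def rules_for(y):
        # y: unknown truth value of the hypothetical atom DecisionDate(span).
        return [
            (2.0, [is_date, in_header], y),  # IsDate & InHeader -> DecisionDate
            (1.0, [y], 0.0),                 # weak prior: not DecisionDate
        ]

    # Brute-force MAP over a grid, for illustration only.
    best = min((energy(rules_for(k / 100)), k / 100) for k in range(101))[1]
    print(best)  # prints 0.47 (continuous optimum 7/15 ≈ 0.467)
```

The domain rule pulls DecisionDate(span) up toward the body's truth of 0.7, while the weak negative prior pulls it down; the weights decide where the compromise lands, which is how PSL lets legal and linguistic constraints reconcile the outputs of separate local extractors.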
  • Publication
    Advanced Sensing and Human Activity Recognition in Early Intervention and Rehabilitation of Elderly People
    (2020)
    Vargas Toro, Agustín; Konietzny, Sebastian; Schäpers, Barbara; Steinböck, Martina; Krewer, Carmen; Müller, Friedemann; Güttler, Jörg; Bock, Thomas
    Ageing is associated with a decline in physical activity and a decrease in the ability to perform activities of daily living, affecting physical and mental health. Elderly people or patients could be supported by a human activity recognition (HAR) system that monitors their activity patterns and intervenes if their behavior changes or a critical event occurs. A HAR system could enable these people to live more independently. In our approach, we apply machine learning methods from the field of HAR to detect human activities. These methods require a large database of structured datasets containing human activities. Compared to existing data recording procedures for creating HAR datasets, our approach is novel because our target group consists of elderly and diseased people, who do not have the same physical condition as young and healthy persons. Since our targeted HAR system aims to support elderly and diseased people, we focus on daily activities, especially those to which clinical relevance is attributed, such as hygiene activities, nutritional activities, or lying positions. Therefore, we propose a methodology for capturing data from elderly and diseased people within a hospital under realistic conditions using wearable and ambient sensors. We describe how this approach is first tested with healthy people in a laboratory environment and then transferred to elderly people and patients in a hospital environment. We also describe the implementation of an activity recognition chain (ARC), which is commonly used to analyze human activity data by means of machine learning methods and aims to detect activity patterns. Finally, we present and discuss the results obtained so far, as well as remaining problems that should be addressed in future research.
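The abstract names the activity recognition chain without detailing its stages. A typical ARC segments the sensor stream into windows, extracts features from each window, and classifies each window. The following Python sketch shows those three stages on a single hypothetical accelerometer channel. The window size, the statistical features, and the nearest-centroid classifier are illustrative stand-ins, not the authors' pipeline.

```python
import math
from statistics import mean, pstdev


def sliding_windows(signal, size, step):
    """Segmentation: cut a 1-D sensor stream into fixed-size, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def window_features(window):
    """Feature extraction: simple per-window statistics often used in HAR."""
    return (mean(window), pstdev(window), min(window), max(window))


def nearest_centroid(train, query):
    """Classification: assign the label whose mean feature vector is closest (Euclidean)."""
    centroids = {
        label: tuple(mean(col) for col in zip(*vectors))
        for label, vectors in train.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


if __name__ == "__main__":
    # Hypothetical accelerometer magnitudes: still, then oscillating movement.
    signal = [0.0] * 8 + [1.0, -1.0] * 4
    windows = sliding_windows(signal, size=4, step=2)
    train = {
        "resting": [window_features([0.0, 0.0, 0.0, 0.0])],
        "moving": [window_features([1.0, -1.0, 1.0, -1.0])],
    }
    print([nearest_centroid(train, window_features(w)) for w in windows])
```

Overlapping windows (here, a step of half the window size) are a common choice because they let short activities that straddle a window boundary still fall largely inside some window.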