September 1, 2024
Journal Article
Title
Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews
Abstract
Background: PRISMA-based literature reviews require meticulous scrutiny of extensive textual data by multiple reviewers, which is associated with considerable human effort.
Objective: To evaluate the feasibility and reliability of using the GPT-4 API as a complementary reviewer in systematic literature reviews based on the PRISMA framework.
Methodology: A systematic literature review on the role of natural language processing and Large Language Models (LLMs) in automatic patient-trial matching was conducted using human reviewers and an AI-based reviewer (GPT-4 API). A retrieval-augmented generation (RAG) methodology with LangChain integration was used to process full-text articles. Agreement levels were evaluated between two human reviewers and the GPT-4 API for abstract screening, and between a single human reviewer and the GPT-4 API for full-text parameter extraction.
Results: Almost perfect GPT–human reviewer agreement was observed in the abstract screening process (Cohen's kappa > 0.9), and lower agreement was observed in the full-text parameter extraction.
Conclusion: Because GPT-4 performed on par with human reviewers in abstract screening, we conclude that it has strong potential as a main screening tool for systematic literature reviews, replacing at least one of the human reviewers.
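The agreement statistic named in the results, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation for include/exclude screening decisions. It is not the authors' code; the function name `cohens_kappa` and the sample labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (illustrative helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten abstracts (not data from the study).
human = ["include"] * 4 + ["exclude"] * 6
gpt = ["include"] * 3 + ["exclude"] * 7

kappa = cohens_kappa(human, gpt)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the raters disagree on one abstract out of ten, so the observed agreement is 0.9 while kappa is lower, because part of that agreement would be expected by chance alone.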
Author(s)
Open Access
Rights
CC BY-NC 4.0: Creative Commons Attribution-NonCommercial
Language
English