2024
Journal Article
Title

Reliable and Content-specific Support for Keyword Selection through AI and Statistics

Title Supplement
Characterising Educational Content with Large Language Models & Agreement Analyses
Abstract
Due to the recent popularity and availability of Large Language Models (LLMs), creators of educational materials can extract keywords for use in personalised learning recommendations more efficiently than ever before. However, due to the LLMs’ probabilistic nature, automating the otherwise labour-intensive keyword extraction carries the risk of biased and non-explainable results. In this research, we present an original framework to enhance keyword selection based on content title and description through a novel, reliability-sensitive keyword selection algorithm. For this, we collected 38 potential keywords (together with their definitions) for five topics on dementia care from previous studies, together with two contents per topic. To assess the new method’s support in extracting keywords, we then prompted 5 human experts and 3 LLMs (using Retrieval Augmented Generation (RAG) for the keyword definitions) to select keywords to include and exclude for each content. Using Krippendorff’s 𝛼 metric, we were then able to adapt to the agreement present, and to reliably select keyword sets for inclusion and exclusion for each content individually. Last, we compared these LLM-based keyword sets with those selected by humans to assess the impact of the adaptive keyword selection algorithm. Overall, the results suggest that LLMs generally struggle with the task (66% of extraction attempts either contained hallucinated keywords or did not return any keywords), and topic-wise internal agreement is low (𝛼=0.59 (0.42) for model 3 (using RAG) on average; 𝛼=0.68 for human raters). Due to this, the reliable keyword selection resulted in a median set of 6|27 keywords for inclusion|exclusion per topic, with many of those keywords being within the benchmark keyword sets selected by human raters. To conclude, this approach proves effective in adapting to different levels of agreement in extracting keywords.
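For readers unfamiliar with the agreement measure named in the abstract, the following is a minimal sketch of Krippendorff's 𝛼 for nominal ratings, such as include/exclude decisions on one keyword by several raters. It is not the authors' implementation and does not reproduce their selection algorithm. The function name `krippendorff_alpha_nominal` and the input layout (one list of ratings per rated unit, `None` for a missing rating) are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    units: list of lists; each inner list holds the ratings one unit
    received from the raters, with None marking a missing rating.
    Returns None when alpha is undefined (too few pairable ratings or
    only one category observed).
    """
    # Coincidence matrix: each ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in that unit.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings is not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None

    # Disagreement counts; the common factor 1/n cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return None  # only one category was ever used
    return 1.0 - observed / expected


# Hypothetical example: two raters, four keyword decisions (1 = include, 0 = exclude).
example = [[1, 1], [0, 0], [1, 0], [0, 1]]
alpha = krippendorff_alpha_nominal(example)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. How such values feed into the reliability-sensitive selection is described in the article itself, not in this sketch.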
Author(s)
Strube, Tom
Fraunhofer-Institut für Software- und Systemtechnik ISST  
Nowak, Tom
Ruhr-Universität Bochum
Pokotylo, Mariia
Fraunhofer-Institut für Software- und Systemtechnik ISST  
Kuhlenkötter, Bernd
Ruhr-Universität Bochum
Journal
Current directions in biomedical engineering  
Open Access
DOI
10.1515/cdbme-2024-2154
Language
English
Fraunhofer-Institut für Software- und Systemtechnik ISST  
Keyword(s)
  • Content Analysis
  • Education
  • Generative AI
  • Keyword Extraction
  • Reliability
