Year
2025
Type
Journal Article
Title
A systematic review of long document summarization methods: Evaluation metrics and approaches
Abstract
The rapid growth of complex textual data in domains such as medicine, law, and science has heightened the relevance of Long Document Summarization (LDS). Effective summarization requires not only advanced techniques but also robust evaluation metrics capable of capturing summary quality, coherence, and factual accuracy. We analyze 113 peer-reviewed studies from the last two years, selected through comprehensive searches in SCOPUS, Web of Science, and PubMed, following the PRISMA 2020 guidelines. We focus on LDS methods and the metrics used to evaluate them. Results indicate rising adoption of hybrid models that combine extractive and abstractive strategies, frequently powered by deep learning and optimization techniques. Concurrently, evaluation practice has shifted from traditional overlap-based metrics (e.g., ROUGE) toward semantic measures such as BERTScore and MoverScore. However, these metrics still face challenges related to interpretability, domain adaptation, and computational cost. We advocate for the development of holistic, explainable, and reference-free evaluation frameworks aligned with human judgment to enhance the reliability and applicability of LDS systems across domains.
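For illustration only, the sketch below (not drawn from the reviewed studies) contrasts the two metric families the abstract names: an overlap-based metric (ROUGE) scores a paraphrase low because it counts exact n-gram matches, while a semantic metric (BERTScore) can score it high because it compares contextual embeddings. It assumes the open-source rouge-score and bert-score Python packages; the example texts are invented.

```python
# Minimal comparison of an overlap-based metric (ROUGE) and a semantic
# metric (BERTScore) on a paraphrased candidate summary.
# Assumes: pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

# Invented example texts: the candidate paraphrases the reference
# with little word overlap.
reference = "The study reviews long document summarization methods and metrics."
candidate = "This work surveys approaches and evaluation measures for summarizing long documents."

# ROUGE rewards exact n-gram overlap, so paraphrases tend to score low.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print("ROUGE-1 F1:", round(rouge["rouge1"].fmeasure, 3))
print("ROUGE-L F1:", round(rouge["rougeL"].fmeasure, 3))

# BERTScore matches tokens in contextual embedding space, so a faithful
# paraphrase can still score high.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print("BERTScore F1:", round(F1.item(), 3))
```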
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English