ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

Tran, Duong T.; Tran, Trung-Kien; Hauswirth, Manfred; Phuoc, Danh Le

doi:10.48550/arXiv.2507.16403

July 28, 2025

Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)

Abstract

In this paper, we propose a new dataset, ReasonVQA, for the Visual Question Answering (VQA) task. Our dataset is automatically integrated with structured encyclopedic knowledge and constructed using a low-cost framework, which is capable of generating complex, multi-hop questions. We evaluated state-of-the-art VQA models on ReasonVQA, and the empirical results demonstrate that ReasonVQA poses significant challenges to these models, highlighting its potential for benchmarking and advancing the field of VQA. Additionally, our dataset can be easily scaled with respect to input images; the current version surpasses the largest existing datasets requiring external knowledge by more than an order of magnitude.
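To make "multi-hop" concrete: each hop follows one relation in a knowledge graph, and the question nests one phrase per hop. The snippet below is a minimal illustrative sketch, not the ReasonVQA framework. It assumes the salient object in the image has already been linked to a knowledge-graph entity, and the toy graph, relation names, templates and the functions `follow_path` and `make_question` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: (subject, relation) -> object.
# These facts are for illustration only and are not taken from ReasonVQA.
KG: Dict[Tuple[str, str], str] = {
    ("Eiffel Tower", "located in"): "Paris",
    ("Paris", "country"): "France",
}

# Hypothetical phrase templates, one per relation; "{}" is the inner phrase.
TEMPLATES: Dict[str, str] = {
    "located in": "the city where {} is located",
    "country": "the country of {}",
}


def follow_path(start: str, relations: List[str]) -> str:
    """Follow a chain of relations from a start entity; each step is one hop."""
    entity = start
    for rel in relations:
        entity = KG[(entity, rel)]
    return entity


def make_question(relations: List[str]) -> str:
    """Wrap the image reference in one template per hop to form a nested question."""
    phrase = "the landmark in the image"
    for rel in relations:
        phrase = TEMPLATES[rel].format(phrase)
    return f"What is {phrase}?"


# Assumption: the landmark in the image was linked to the entity "Eiffel Tower".
path = ["located in", "country"]  # a 2-hop path
print(make_question(path))
print(follow_path("Eiffel Tower", path))
```

Here the question asks for "the country of the city where the landmark in the image is located", and the answer is obtained by traversing two edges from the linked entity. Adding a relation to `path` (with a matching template) yields a question with one more hop.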

Author(s)

Tran, Duong T.

Tran, Trung-Kien

Hauswirth, Manfred

Technische Universität Berlin

Phuoc, Danh Le

Technische Universität Berlin
