Publication Date
June 23, 2025
Document Type
Conference Paper
Title

On the Transferability of Adversarial Attacks from Convolutional Neural Networks to Variants of ChatGPT4

Abstract
This research evaluates the ability of adversarial attacks, originally designed for CNN-based classifiers, to target the multimodal image-captioning tasks performed by large vision language models such as ChatGPT4. The study covered several versions of ChatGPT4 and multiple attack strategies, with particular emphasis on the Projected Gradient Descent (PGD) attack, evaluated across various parameters, surrogate models, and datasets. Initial but limited experiments support the hypothesis that PGD attacks are partly transferable to ChatGPT4. Subsequent results demonstrated that PGD attacks can be adaptively transferred to disrupt the normal functioning of ChatGPT4. In contrast, other adversarial attack strategies showed only a limited ability to compromise ChatGPT4. These findings provide insights into the security vulnerabilities of emerging neural network architectures used for generative AI. Moreover, they underscore the possibility of cost-effectively crafting adversarial examples against novel architectures, necessitating the development of robust defense mechanisms for large vision language models in practical applications.
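The abstract describes crafting adversarial examples with PGD on CNN surrogates and transferring them to the target vision language model. The paper's exact configurations are not reproduced here; the following is a minimal, hypothetical sketch of the general L∞ PGD technique against a surrogate classifier, assuming PyTorch/torchvision. The ResNet-50 surrogate, epsilon, step size, and iteration count are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

def pgd_attack(model, image, label, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: iteratively perturb the image to maximize the
    surrogate classifier's loss, projecting back into the eps-ball
    around the original image after each step."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), label)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()      # ascend the loss
        adv = image + torch.clamp(adv - image, -eps, eps)  # project into eps-ball
        adv = torch.clamp(adv, 0, 1)                  # keep valid pixel range
    return adv.detach()

# Illustrative usage: craft the example on a pretrained ResNet-50
# surrogate (weights download on first use), then the perturbed image
# would be submitted to the target model for captioning. A real
# pipeline would also apply the surrogate's expected input
# normalization (e.g. ImageNet mean/std) inside the model.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)   # placeholder input image
label = torch.tensor([207])          # placeholder ground-truth class
adv_image = pgd_attack(model, image, label)
```

Transferability here means the perturbation is computed entirely against the white-box surrogate; the target model is only ever queried with the finished adversarial image.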
Author(s)
Bunzel, Niklas  
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Mainwork
55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, DSN-W 2025. Proceedings  
Conference
International Conference on Dependable Systems and Networks 2025  
Workshop on Dependable and Secure Machine Learning 2025  
DOI
10.1109/DSN-W65791.2025.00072
Language
English