Synthetic data sets for person Re-Identification: A critical analysis

License: CC BY 4.0
Authors: Delussu, Rita; Putzu, Lorenzo; Boutros, Fadi; Bisogni, Carmen; Damer, Naser; Fumera, Giorgio
Dates: 2025-11-10, 2025-11-10, 2025
Handle: https://publica.fraunhofer.de/handle/publica/498887
DOI: 10.1016/j.imavis.2025.105753
Publica DOI: https://doi.org/10.24406/publica-6132
Scopus EID: 2-s2.0-105019959174
Language: en
Type: journal article
Keywords: Generalisation capability; Person Re-Identification; Photorealism; Synthetic training data; Visual variations

Abstract

Supervised methods for person Re-Identification (Re-Id) need extensive manual annotation, limiting data set size and the resulting generalisation capability to unseen target data. Unsupervised methods avoid manual annotation but typically attain a lower performance. Synthetic training data can mitigate these issues, as they allow generating large data sets encompassing more representative variations in visual factors such as background scenes and pedestrian appearance without requiring manual annotation and without privacy issues arising from recent regulations. Existing synthetic data sets vary in size, diversity of human models, camera views, backgrounds, as well as photorealism. It is, however, not yet clear how all such factors affect Re-Id performance. We conduct a comprehensive and systematic analysis and experimental evaluation of existing synthetic data sets, to understand how the main factors characterising them affect the generalisation capability to real data. Our results provide useful guidelines towards developing effective synthetic data sets for Re-Id.