Authors: Mehler, Alexander; Lücking, Andy; Dong, Tiansi
Dates: 2023-10-17; 2023-10-17
Year: 2023
Handle: https://publica.fraunhofer.de/handle/publica/451789
DOI: 10.3389/frai.2023.1234920
Scopus EID: 2-s2.0-85164820430
Language: en
Keywords: human-object interactions (HOIs); multimodal learning and analytics; text-image analysis; unified methodology; visual-linguistic interaction
Title: Multimodal communication and multimodal computing
Type: editorial