human-object interactions (HOIs)
multimodal learning and analytics
text-image analysis
unified methodology
visual-linguistic interaction