
TemporalVQA
·
DL·ML/Paper
https://arxiv.org/abs/2501.10674 Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!Multimodal Large Language Models (MLLMs) have achieved significant advancements in tasks like Visual Question Answering (VQA) by leveraging foundational Large Language Models (LLMs). However, their abilities in specific areas such as temporal understandingarxiv.org AbstractTemp..