This Tistory blog is no longer being maintained.
·
Introduction
This Tistory blog is no longer being maintained. You can continue following my blog at: blog.csjihwanh.com
Tips for LaTeX Users
·
etc
Hyperref extension: overall instructions can be found at https://ctan.org/pkg/hyperref (CTAN: Package hyperref, extensive support for hypertext in LaTeX). You can also download the manual from that page. How can I turn my references into hyperlinks? Simply add \usepackage{hyperref} at the end of your `\usepackage` section. Then it will look like this: whi..
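A minimal sketch of the hyperref tip above, assuming a plain `article` document. The other packages, the label name, and the bibliography entry are placeholders for illustration and are not taken from the original post:

```latex
\documentclass{article}

% Other packages first (placeholders for whatever your document uses).
\usepackage{graphicx}
\usepackage{amsmath}

% hyperref goes at the end of the \usepackage section,
% so that it can patch commands defined by the packages above.
\usepackage{hyperref}

\begin{document}

\section{Introduction}\label{sec:intro}

% With hyperref loaded, \ref and \cite become clickable links in the PDF.
See Section~\ref{sec:intro} and reference~\cite{example}.

\begin{thebibliography}{1}
\bibitem{example} Placeholder entry, for illustration only.
\end{thebibliography}

\end{document}
```

Compile it twice so the cross-references resolve; the section number and the citation should then be clickable in the resulting PDF.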
LITA (ECCV 2024)
·
DL·ML/Paper
Abstract: https://arxiv.org/pdf/2403.19046 Recent works often overlook the importance of temporal localization. The key aspects that limit temporal localization ability are: time representation, architecture, and data. Hence this paper proposes a new architecture, LITA, which is capable of: leveraging time tokens to better represent time in videos; handling SlowFast tokens to capture temporal informat..
RVOS Datasets
·
DL·ML/Study
This post covers RVOS (Referring Video Object Segmentation), one of the VOS tasks, and the datasets that address it. For a general understanding of segmentation tasks, see the post on the types of segmentation tasks. Ref-DAVIS: the paper that first defined the RVOS task. Refer-YouTube-VOS (URVOS): an ECCV 2020 paper that scaled up the dataset for the RVOS task. Dataset: 27,000+ referring expressions for 3,900 videos; proposes an end-to-end architecture → the existing DAVIS-2017 dataset was too small, so end..
Video Token Merging (VTM) (NeurIPS 2024, long video)
·
DL·ML/Paper
https://arxiv.org/abs/2410.23782 Video Token Merging for Long-form Video Understanding: "As the scale of data and models for video understanding rapidly expand, handling long-form video input in transformer-based models presents a practical challenge. Rather than resorting to input sampling or token dropping, which may result in information lo.." Abstract: a paper on token merging for long videos. vi..
Tips for Mac Users
·
etc
Keyboard. Disable Keyboard: Sometimes you need to lock the keyboard, mainly for purposes such as cleaning. I referred to the following Stack Exchange thread: https://apple.stackexchange.com/questions/141778/how-can-i-temporarily-disable-my-mbps-keyboard/415839#415839 How can I temporarily disable my MBP's keyboard? "I sometimes bring my external keyboard with me on the go and use it with my MBP. I would like to lay it over the macbook's current keyboa.."
TemporalVQA
·
DL·ML/Paper
https://arxiv.org/abs/2501.10674 Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!: "Multimodal Large Language Models (MLLMs) have achieved significant advancements in tasks like Visual Question Answering (VQA) by leveraging foundational Large Language Models (LLMs). However, their abilities in specific areas such as temporal understanding.." Abstract: Temp..
NExT-Chat (ICML 2024, MLLM for OD and Seg)
·
DL·ML/Paper
https://icml.cc/virtual/2024/poster/33745 ICML Poster: NExT-Chat: An LMM for Chat, Detection and Segmentation. "Abstract: The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs). In order to enhance visual comprehension, recent studies have equipped LMMs wi.." Abstract: inspired by pix2seq, pi..
STVG (VidSTG, CVPR 2020)
·
DL·ML/Paper
https://arxiv.org/abs/2001.06891 Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences: "In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). Given an untrimmed video and a declarative/interrogative sentence depicting an object, STVG aims to localize the spatio-temporal tube of the queried o.." Abstract: proposes the STVG task. V..