Posts in the 'DL·ML' category (Page 6)

Abstract: a ViT that adds multihead pooling attention along the temporal dimension; reduces computational complexity; a ViT that is more aware of the temporal dimension. Motivation: The goal is a transformer model with multiscale feature hierarchies, connecting ViT with the multiscale feature analysis that developed in CNNs. As Fig. 1 shows, unlike a standard ViT, MViT has channel-resolution 'scale' stages. Across these hierarchical stages, the number of channels increases while the spatial resolution decreases. As a result..
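To illustrate why pooling attention reduces computation, here is a minimal pure-Python sketch under simplifying assumptions: a single head, identity Q/K/V projections, a flattened 1D token sequence, and non-overlapping average pooling (the pooling operator here is an assumption; real implementations pool over the space-time token grid). Pooling Q shortens the output sequence, and pooling K and V shrinks the attention matrix; the function also returns the number of query-key scores so the saving is visible.

```python
import math


def avg_pool(tokens, stride):
    """Non-overlapping average pooling along the token axis (kernel size = stride).

    Trailing tokens that do not fill a whole window are dropped.
    """
    pooled = []
    for start in range(0, len(tokens) - stride + 1, stride):
        window = tokens[start:start + stride]
        # Average each feature channel over the tokens in the window.
        pooled.append([sum(channel) / stride for channel in zip(*window)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooling_attention(x, q_stride=1, kv_stride=2):
    """Single-head pooling attention with identity Q/K/V projections (a simplification).

    Returns the attended tokens and the number of query-key scores computed.
    """
    q = avg_pool(x, q_stride)   # pooling Q shortens the output sequence
    k = avg_pool(x, kv_stride)  # pooling K and V shrinks the attention matrix
    v = avg_pool(x, kv_stride)
    dim = len(x[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(dim)])
    return out, len(q) * len(k)


tokens = [[float(i + c) for c in range(4)] for i in range(8)]
full, full_cost = pooling_attention(tokens, q_stride=1, kv_stride=1)
pooled, pooled_cost = pooling_attention(tokens, q_stride=2, kv_stride=2)
print(len(full), full_cost)      # 8 64
print(len(pooled), pooled_cost)  # 4 16
```

Keeping `q_stride=1` preserves the sequence length while still shrinking the attention matrix; striding Q as well is what lets resolution drop from one stage to the next.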

Abstract: ICCV 2023; VQA; uses a modular architecture to solve compositional VQA; a framework that uses an API and Codex to output Python code. Motivation: Compositional problems in particular often require a modular structure. For example, answering the first query in Fig. 1 requires 1) finding the children and the muffins, 2) counting them, and 3) dividing the muffins 'fairly'. An end-to-end approach struggles with this kind of compositional reasoning, so such problems are hard for it to solve. Also, for end-to-end approaches, interpretabili..
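The three-step decomposition of the first Fig. 1 query can be sketched as a small Python program of the kind such a framework would generate. `MockScene` and its `find` method are hypothetical stand-ins for a vision API (a real system would call an object detector); they are not the framework's actual API, and integer division is one assumed reading of dividing 'fairly'.

```python
from dataclasses import dataclass


@dataclass
class MockScene:
    """Hypothetical stand-in for a vision API: maps object names to detected boxes."""
    detections: dict

    def find(self, name):
        # A real system would run a detector here; this just looks up stored boxes.
        return self.detections.get(name, [])


def muffins_per_child(scene):
    """Compositional program: find -> count -> divide fairly."""
    children = scene.find("child")   # step 1: locate the children
    muffins = scene.find("muffin")   # step 1: locate the muffins
    n_children, n_muffins = len(children), len(muffins)  # step 2: count
    if n_children == 0:
        raise ValueError("no children detected")
    return n_muffins // n_children   # step 3: equal integer share (assumed meaning of 'fair')


scene = MockScene({"child": [(0, 0, 10, 10), (20, 0, 30, 10)],
                   "muffin": [(1, 1, 2, 2)] * 8})
print(muffins_per_child(scene))  # 4
```

Because each step is an explicit line of code, intermediate results such as the two counts can be inspected directly, which a single end-to-end model does not expose.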

Motivation: U-Net is a model originally developed for biomedical image segmentation. It was proposed to address two problems: an MLP requires too much computation, and a CNN that reduces resolution extracts features well but becomes weak at segmentation at high resolution. To solve this, the contracting path applies convolutions while increasing the number of channels, and the expanding path reduces the channels again. The key point is that the contracting path and the expanding path are made symmetric, so that levels of the same resolution correspond to each other in the hierarchy. This..
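The symmetric contracting/expanding structure can be made concrete by tracing feature-map shapes. This minimal sketch assumes padded ('same') convolutions, 2x2 downsampling and upsampling, and channels doubling from 64 at each level; the original U-Net used unpadded convolutions and cropped the skip features, so its exact sizes differ.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, H, W) through a U-Net-style encoder/decoder.

    Assumes padded convolutions, so every downsampling halves the resolution
    exactly and every upsampling doubles it.
    """
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")
    encoder = []
    c, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((c, h, w))        # saved for the skip connection
        c, h, w = c * 2, h // 2, w // 2  # downsample, double the channels
    bottleneck = (c, h, w)
    decoder = []
    for skip in reversed(encoder):
        c, h, w = c // 2, h * 2, w * 2   # upsample, halve the channels
        # Symmetry: each decoder level meets an encoder level of the same shape.
        assert (c, h, w) == skip
        # Concatenating the skip gives 2*c channels; the following convs reduce it to c.
        decoder.append((c, h, w))
    return encoder, bottleneck, decoder


enc, bottleneck, dec = unet_shapes(256, 256)
print(bottleneck)  # (1024, 16, 16)
```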

Introduction: There are already many good posts that explain VAEs well, so I will skip the concepts. Here the encoder is denoted $q_φ(z|x)$ and the decoder $p_θ(x|z)$. MLE. Motivation for MLE: By definition, learning means that, when an observed variable $\mathbb x$ follows a true distribution $p^*(\mathbb x)$, we approximate this distribution with a function $p_θ(\mathbb x)$ parametrized by θ.[1] That is, $$p_θ(\mathbb x) \approx p^*(\mathbb x)$$ MLE Derivation for VAE: MLE is computed as follows. $$\arg \max_θ..
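For reference, here is the standard route from the MLE objective to the VAE's evidence lower bound (ELBO), written with the encoder $q_φ(z|\mathbb x)$ and decoder $p_θ(\mathbb x|z)$ above. The i.i.d. dataset $\{\mathbb x^{(i)}\}_{i=1}^N$ and the prior $p(z)$ are notation introduced here, and the post's own derivation may proceed differently.

$$\arg\max_θ \prod_{i=1}^N p_θ(\mathbb x^{(i)}) = \arg\max_θ \sum_{i=1}^N \log p_θ(\mathbb x^{(i)})$$

$$\log p_θ(\mathbb x) = \mathbb E_{q_φ(z|\mathbb x)}\left[\log \frac{p_θ(\mathbb x, z)}{q_φ(z|\mathbb x)}\right] + D_{KL}\big(q_φ(z|\mathbb x)\,\|\,p_θ(z|\mathbb x)\big)$$

Since $D_{KL} \ge 0$, and $p_θ(\mathbb x, z) = p_θ(\mathbb x|z)\,p(z)$,

$$\log p_θ(\mathbb x) \ge \mathbb E_{q_φ(z|\mathbb x)}\big[\log p_θ(\mathbb x|z)\big] - D_{KL}\big(q_φ(z|\mathbb x)\,\|\,p(z)\big)$$

so maximizing this lower bound over θ and φ serves as a tractable surrogate for maximizing the log-likelihood.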

Abstract: ICCV 2023; 3D human pose estimation in monocular video; proposes GLA-GCN, which models the spatiotemporal structure of joints with a graph representation; performs 3D pose estimation using both global and local representations; https://github.com/bruceyo/GLA-GCN Prerequisites: ST-GCN[2] (see https://jordano-jackson.tistory.com/137), AGCN[3] (see https://jordano-jackson.tistory.com/138) Motivation: Existing methods are broadly TCN(Tempora..

Abstract: CVPR 2019; a GCN-based method for skeleton-based action recognition; proposes 2s-AGCN (two-stream adaptive GCN); https://github.com/lshiwjx/2s-AGCN Motivation: ST-GCN was the first to use a GCN for skeleton-based action recognition. However, it had three problems: the skeleton graph is predefined heuristically and reflects only the physical structure of the human body. (For example, in "reading" or "clapping" the interaction between the two hands is important, but the hands are far apart on the joint graph, so the depende..
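One way 2s-AGCN addresses the fixed-graph problem is to add a freely learned term and a data-dependent term to the physical adjacency, giving roughly $A_k + B_k + C_k$ per subset. The sketch below computes this for a single subset in plain Python. Assumptions: the hypothetical embedding matrices `W_theta` and `W_phi` stand in for the 1x1 convolutions, temporal frames are folded into the feature dimension, the softmax is applied row-wise, and `B` is passed in rather than trained.

```python
import math


def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def row_softmax(m):
    """Softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def adaptive_adjacency(A, B, features, W_theta, W_phi):
    """AGCN-style adjacency A + B + C for one subset.

    A: fixed physical skeleton graph (N x N)
    B: learned N x N matrix (given here; optimized during training in practice)
    features: N joints x D
    W_theta, W_phi: D x E embedding matrices (hypothetical stand-ins for 1x1 convs)
    """
    theta = matmul(features, W_theta)               # N x E
    phi = matmul(features, W_phi)                   # N x E
    C = row_softmax(matmul(theta, transpose(phi)))  # N x N, sample-specific graph
    n = len(A)
    return [[A[i][j] + B[i][j] + C[i][j] for j in range(n)] for i in range(n)]
```

Because `C` is computed from the input features, two joints that are far apart in the skeleton (such as the two hands) can still receive a large connection weight for a particular sample.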

Abstract: AAAI 2018; human action recognition using human body skeleton sequences; the first attempt to apply GCNs to dynamic skeleton modeling; proposes ST-GCN (Spatial-Temporal Graph Convolutional Network), a GCN designed for the skeleton model; https://github.com/yysijie/st-gcn Motivation: As Fig. 1 shows, a GCN is applied to the skeleton sequence. There are two types of edges: spatial edges, which reflect the natural connectivity of the joints, and edges that connect the same joint across consecutive time..
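The spatial part of this model can be sketched with the single-partition form of the ST-GCN layer, $f_{out} = \Lambda^{-1/2}(\mathbf A + \mathbf I)\Lambda^{-1/2} f_{in} \mathbf W$ with $\Lambda^{ii} = \sum_j (A^{ij} + I^{ij})$, applied to one frame. The partitioning strategies and the temporal edges (handled by a convolution along the time axis for each joint) are omitted, and the toy three-joint chain in the usage line is hypothetical.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Return Lambda^{-1/2} (A + I) Lambda^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]  # I
    for i, j in edges:               # add the natural bone connections (A)
        a[i][j] = a[j][i] = 1.0
    degree = [sum(row) for row in a]  # Lambda_ii = sum_j (A_ij + I_ij)
    return [[a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(x, adj, weight):
    """One frame: aggregate neighbour features with adj, then mix channels with weight.

    x: N joints x C_in, adj: N x N, weight: C_in x C_out.
    """
    n, c_in = len(x), len(x[0])
    aggregated = [[sum(adj[i][k] * x[k][c] for k in range(n)) for c in range(c_in)]
                  for i in range(n)]
    return [[sum(aggregated[i][c] * weight[c][o] for c in range(c_in))
             for o in range(len(weight[0]))] for i in range(n)]


adj = normalized_adjacency(3, [(0, 1), (1, 2)])  # toy chain: joint 0 - 1 - 2
print(adj[0][0])  # 0.5
```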

Abstract: ICLR 2024 spotlight; open-vocabulary dense prediction tasks (open-vocabulary object detection, semantic segmentation, panoptic segmentation); addresses a problem of CLIP ViT; proposes CLIPSelf, which makes the model aware of local image regions without additional data; https://github.com/wusize/CLIPSelf Motivation: Open-vocabulary approaches use CLIP-based models. As Fig. 1 shows, the ViT-based CLIP model is strong at image-level representation, but when its dense features are used for regi..
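CLIPSelf is described as letting a CLIP ViT distill its own image-level representation of local crops into its dense feature map. A minimal sketch of such a region-to-crop self-distillation loss, under stated assumptions: region features are average-pooled from a grid-aligned box (a simplified stand-in for the region pooling used in practice), the teacher crop embeddings are given as plain vectors, and the objective is 1 minus cosine similarity. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def region_feature(dense, box):
    """Average-pool a dense map (H x W x D nested lists) over box = (top, left, bottom, right).

    bottom and right are exclusive, like Python slices.
    """
    top, left, bottom, right = box
    cells = [dense[r][c] for r in range(top, bottom) for c in range(left, right)]
    return [sum(vals) / len(cells) for vals in zip(*cells)]


def self_distillation_loss(dense, boxes, crop_embeddings):
    """Mean (1 - cosine similarity) between student region features and teacher crop embeddings."""
    losses = [1.0 - cosine_similarity(region_feature(dense, box), teacher)
              for box, teacher in zip(boxes, crop_embeddings)]
    return sum(losses) / len(losses)


dense = [[[1.0, 0.0], [1.0, 0.0]],   # top row of a 2 x 2 feature map, D = 2
         [[0.0, 1.0], [0.0, 1.0]]]   # bottom row
boxes = [(0, 0, 1, 2), (1, 0, 2, 2)]
teachers = [[1.0, 0.0], [1.0, 0.0]]
print(self_distillation_loss(dense, boxes, teachers))  # 0.5
```

The first region already matches its crop embedding (loss 0), while the second is orthogonal to it (loss 1); minimizing this loss would push each region's dense features toward the image-level embedding of the corresponding crop.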

Abstract: a CNN- and RNN-based approach for soccer foul detection; utilizes bounding box positions, images, and estimated poses; 2024 Apr 4 update: https://github.com/FangJiale1999/Futurefoul_Soccer The code and dataset are now available. Motivation: We propose the FutureFoul system for foul prediction from soccer broadcast video. Dataset: We constructed a soccer foul dataset. Video Dataset: Videos were taken from the SoccerNet-v3 dataset. Sele..
