Posts in the 'DL·ML' category (Page 8)

Abstract Motivation Object detection in aerial images (ODAI) is widely used in real-world applications. However, nonuniform object sizes, arbitrary orientations, and similar factors make the task difficult (see Figure 1). Among these problems, orientation is the main difficulty, for the following reasons: a rotation-invariant feature representation has to be built -> but this is difficult with current architectures. iDeA; this is as of 2021, so it should be checked whether the problem still persists today. Horizontal bounding box (HBB): oriente..
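The excerpt above is cut off just as it turns to horizontal bounding boxes, so the following is only a minimal sketch of one commonly cited issue, assuming the point is that an axis-aligned box fits a rotated object poorly. The function names are hypothetical and not taken from the post or the paper.

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy), rotated by angle_deg."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]

def enclosing_hbb(corners):
    """Smallest axis-aligned (horizontal) box containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

def hbb_fill_ratio(w, h, angle_deg):
    """Fraction of the enclosing HBB that the oriented object actually covers."""
    x0, y0, x1, y1 = enclosing_hbb(rotated_box_corners(0.0, 0.0, w, h, angle_deg))
    return (w * h) / ((x1 - x0) * (y1 - y0))

if __name__ == "__main__":
    for angle in (0, 15, 45):
        print(angle, round(hbb_fill_ratio(10.0, 2.0, angle), 3))
```

For an elongated object, the fill ratio drops quickly as the rotation angle grows, which means most of the HBB is background.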

Overall Architecture Input def forward(self, samples: NestedTensor, targets: List = None, **kw): """The forward expects a NestedTensor, which consists of: - samples.tensor: batched images, of shape [batch_size x 3 x H x W] - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels It returns a dict with the following elements: - "pred_logits": the classification l..

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrasti arxiv.org DINO: DETR with Improved DeNoising Anchor Boxes for..

End-to-End Object Detection with Transformers We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor gene arxiv.org End-to-End Object Detection with Transformers Abstract Treats object detection as a direct set p..

https://arxiv.org/abs/2112.03857 Grounded Language-Image Pre-training This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two ben arxiv.org Abstract Proposes the GLIP model. The object detection task and phrase groundin..

https://arxiv.org/abs/2104.14294 Emerging Properties in Self-Supervised Vision Transformers In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works partic arxiv.org Abstract Self-supervised learning gives ViT new..

Distributed package doesn't have NCCL built in Traceback (most recent call last): File "example_chat_completion.py", line 104, in <module> fire.Fire(main) File "/home/csjihwanh/Desktop/projects/sggVQA/llama/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/csjihwanh/Desktop/projects/sggVQA/llama/env/lib/p..
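The excerpt is truncated before any fix is shown, so the following is only a sketch of one common workaround, not necessarily what the post does: fall back to the `gloo` backend when the installed PyTorch build has no NCCL support. The helper `pick_backend` is hypothetical. In a real script its two flags would come from `torch.distributed.is_nccl_available()` and `torch.cuda.is_available()`, and the result would be passed to `torch.distributed.init_process_group(backend=...)`.

```python
def pick_backend(nccl_available: bool, cuda_available: bool) -> str:
    """Choose a torch.distributed backend name (hypothetical helper).

    NCCL only makes sense when PyTorch was built with it and a CUDA device
    is present; otherwise fall back to gloo, which also works on CPU.
    """
    if nccl_available and cuda_available:
        return "nccl"
    return "gloo"

if __name__ == "__main__":
    # Assumed usage with PyTorch installed (not executed here):
    #   import torch
    #   backend = pick_backend(torch.distributed.is_nccl_available(),
    #                          torch.cuda.is_available())
    #   torch.distributed.init_process_group(backend=backend)
    print(pick_backend(nccl_available=False, cuda_available=False))
```

Note that a script which hardcodes the backend name would need that call changed as well; the sketch only covers the selection logic.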

* Figure numbers from the original paper are given in parentheses. 1. Motivation Models such as ChatGPT and GPT-4 perform well on complex tasks, yet they often fail on simple tasks such as 3-digit multiplication. This paper calls a task whose answer must be derived through multi-hop reasoning a compositional problem, and uses such tasks to examine the structural limitations of the Transformer architecture. To this end, it puts forward two hypotheses. 1. Transformers solve multi-step compositional reasoning by reducing it to linearized path matching..
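To make "multi-hop" concrete, here is a minimal sketch of why multi-digit multiplication counts as a compositional task: schoolbook long multiplication chains many small single-digit steps, each depending on the previous carry. The function name and the step-counting convention are assumptions made for illustration and are not taken from the paper.

```python
def multiply_stepwise(a: int, b: int):
    """Schoolbook long multiplication for non-negative integers.

    Returns (product, steps), where steps counts each single-digit
    multiply-with-carry plus one accumulation per partial product.
    """
    a_digits = [int(d) for d in str(a)][::-1]  # least significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    total, steps = 0, 0
    for i, db in enumerate(b_digits):
        carry, partial = 0, 0
        for j, da in enumerate(a_digits):
            prod = da * db + carry       # depends on the previous step's carry
            partial += (prod % 10) * 10 ** j
            carry = prod // 10
            steps += 1
        partial += carry * 10 ** len(a_digits)
        total += partial * 10 ** i       # accumulate the shifted partial product
        steps += 1
    return total, steps

if __name__ == "__main__":
    print(multiply_stepwise(123, 456))
```

Under this counting, a 3-digit by 3-digit product already needs 12 dependent sub-steps, and an error in any one of them propagates to the final answer.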

Overview The TPU (Tensor Processing Unit) is an ASIC (Application-Specific Integrated Circuit) built by Google. It is used to accelerate machine learning workloads. Typically, thousands of TPUs are linked together over a dedicated network to form what is called a TPU Pod. For example, in TPU v4 a single v4 pod contains 4096 TPU chips. Because HBM memory provides high memory bandwidth, training remains efficient even with large batch sizes and large models. The TPU was first introduced at Google I/O 2016 and was designed for TensorFlow. Version upgrades have also continued..
