[ODAI] DOTA benchmark
DL·ML
Abstract / Motivation: Object detection in aerial images (ODAI) is widely used in real-world applications. However, the nonuniformity of object sizes, arbitrary orientations, and similar factors make the task difficult (see Figure 1). Among these problems, orientation is the main difficulty, for the following reason: a rotation-invariant feature representation must be learned -> but this is hard with current architectures. iDeA: this observation is as of 2021, so it should be checked whether it is still an open problem today. Horizontal bounding boxes (HBB) are oriente..
Grounding DINO architecture
DL·ML
Overall Architecture
Input

def forward(self, samples: NestedTensor, targets: List = None, **kw):
    """The forward expects a NestedTensor, which consists of:
    - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
    - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels

    It returns a dict with the following elements:
    - "pred_logits": the classification l..
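The docstring above describes the NestedTensor input: variable-sized images are padded to a common shape, with a mask marking the padded pixels. A minimal sketch of that batching idea (this is not the DETR/Grounding DINO implementation; the function name and the use of NumPy instead of PyTorch are my own, for illustration only):

```python
# Sketch: pad a list of differently sized images into one batch tensor plus a
# binary mask, mirroring the NestedTensor convention (mask == 1 on padding).
import numpy as np

def nested_tensor_from_images(images):
    """images: list of arrays shaped [3, H_i, W_i] -> (batch, mask)."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = np.zeros((len(images), 3, max_h, max_w), dtype=np.float32)
    mask = np.ones((len(images), max_h, max_w), dtype=bool)  # start all-padding
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img   # copy real pixels into the top-left corner
        mask[i, :h, :w] = False     # mark real (non-padded) pixels
    return batch, mask

imgs = [np.random.rand(3, 4, 5), np.random.rand(3, 6, 3)]
tensor, mask = nested_tensor_from_images(imgs)
# tensor has shape (2, 3, 6, 5); mask has shape (2, 6, 5)
```

The mask is what lets the transformer's attention ignore padded regions when images in a batch have different resolutions.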
[Object Detection] DINO
DL·ML
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection — We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrasti.. (arxiv.org)
[Object Detection] DETR
DL·ML
End-to-End Object Detection with Transformers — We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor gene.. (arxiv.org) Abstract: views object detection as a direct set p..
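The "direct set prediction" framing means each prediction is matched one-to-one with a ground-truth object by minimizing a pairwise cost, so no NMS-style deduplication is needed. A toy sketch of that matching step (DETR itself uses the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment; here permutations are brute-forced purely for clarity, and the function name and cost values are illustrative assumptions):

```python
# Toy bipartite matching for set prediction: assign each prediction to exactly
# one ground-truth object so that the total assignment cost is minimal.
from itertools import permutations

def set_match(cost):
    """cost[i][j]: cost of assigning prediction i to ground truth j.
    Returns, for each prediction i, the matched ground-truth index.
    Assumes equally many predictions and ground truths."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))

# Prediction 0 is cheap to match with GT 0, prediction 1 with GT 1.
cost = [[0.1, 0.9],
        [0.8, 0.2]]
print(set_match(cost))  # -> (0, 1)
```

Because the matching is one-to-one, duplicate detections of the same object are penalized during training, which is what lets DETR drop the hand-designed NMS post-processing.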
[ZSD] GLIP
DL·ML
https://arxiv.org/abs/2112.03857 Grounded Language-Image Pre-training — This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two ben.. (arxiv.org) Abstract: proposes the GLIP model; the object detection task and phrase groundin..
[Paper Review] Emerging Properties in Self-Supervised Vision Transformers
DL·ML
https://arxiv.org/abs/2104.14294 Emerging Properties in Self-Supervised Vision Transformers — In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works partic.. (arxiv.org) Abstract: whether self-supervised learning gives ViT new..