Awesome Open Source

Search results for deep learning vision and language

25 search results found

Lavis ⭐ 7,917

LAVIS - A One-stop Library for Language-Vision Intelligence

Dl Nlp Readings ⭐ 847

My Reading Lists of Deep Learning and Natural Language Processing

Alphaclip ⭐ 273

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Awesome Computer Vision ⭐ 186

Awesome Resources for Advanced Computer Vision Topics

Pseudo Q ⭐ 116

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Clip_playground ⭐ 80

An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Discrete Continuous Vln ⭐ 60

Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Robo Vln ⭐ 56

PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Universal Language Conditioned Policies

Eccv Caption ⭐ 46

Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)

Multimodal ⭐ 45

A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"

Sugar Crepe ⭐ 40

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Stanford Cs231n Assignments 2020 ⭐ 32

This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).

Clevr Dialog ⭐ 28

Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog

Lang2seg ⭐ 25

Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019

[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data

Vote2cap Detr ⭐ 22

Code release for "End-to-End 3D Dense Captioning with Vote2Cap-DETR" (CVPR 2023)

[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources

The good practices in VQA systems, such as POS-tag attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task

Partglot ⭐ 12

Official Implementation of PartGlot (CVPR 2022 Oral)

Official Code of CVPR'23 Paper "VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision"

Spatial Reasoning ⭐ 6

Grounding Language Models for Compositional and Spatial Reasoning

Refcontrast ⭐ 5

Understanding Synonymous Referring Expressions via Contrastive Features

