Awesome Open Source

Search results for vision and language

143 search results found

Aerial Vision And Dialog Navigation ⭐ 20

Codebase of the ACL 2023 (Findings) Paper "Aerial Vision-and-Dialog Navigation"

Pytorch_ldast ⭐ 19

A PyTorch implementation of LDAST

Xmodal Ctx ⭐ 18

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

Cyclical Visual Captioning ⭐ 18

PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Lxmert Advtrain ⭐ 17

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT adversarial training part

Good practices from VQA systems, such as POS-tag attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task

Explore And Match ⭐ 16

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

Gst Visdial ⭐ 15

💬 Official PyTorch Implementation for CVPR'23 "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training"

Hero_video_feature_extractor ⭐ 15

Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Pytorch_tvc ⭐ 14

A PyTorch implementation of TVC

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

C3vqg Official ⭐ 14

Code for the paper "C3VQG: Category Consistent Cyclic Visual Question Generation".

Phrasecutdataset ⭐ 14

Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"

Clip Openness ⭐ 13

Code for "Delving into the Openness of CLIP"

Partglot ⭐ 12

Official Implementation of PartGlot (CVPR 2022 Oral)

Gpt Vision Assistant ⭐ 12

A simple implementation of Be My Eyes GPT-4, a vision-LLM model that acts as a personal assistant

Code for the paper [CVPR20] Image Search with Text Feedback by Visiolinguistic Attention Learning

Map2seq_vln ⭐ 11

Code for ORAR Agent for Vision and Language Navigation on Touchdown and map2seq

Prompt Adapter ⭐ 10

Prompt Tuning based Adapter for Vision-Language Model Adaption

[IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official PyTorch implementation)

Official Code of CVPR'23 Paper "VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision"

Visiondetect ⭐ 8

VisionDetect lets you track user face gestures such as blinks and smiles.

Foolyourvllms ⭐ 8

Code for paper: Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

Open Fashion Clip ⭐ 8

This is the official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data". ICIAP 2023

Ris Learning List ⭐ 8

Related papers about Referring Image Segmentation (RIS)

Model Zoo for Multimedia Applications

Groundvlp ⭐ 7

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Visual dialog agents with pre-trained vision-and-language encoders.

Code for the 2023 ICCV paper "VLSlice: Interactive Vision-and-Language Slice Discovery."

Vision Language Modelling Series ⭐ 7

Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations

Zeroshot Storytelling ⭐ 7

GitHub repository for Zero Shot Visual Storytelling

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Grounded Vision Parser ⭐ 6

Semantic parser trained using only videos instead of labeled logical forms

Spatial Reasoning ⭐ 6

Grounding Language Models for Compositional and Spatial Reasoning

TensorFlow reproduction of the EMNLP 2018 paper "Temporally Grounding Natural Sentence in Video"

Refcontrast ⭐ 5

Understanding Synonymous Referring Expressions via Contrastive Features

CIZSL++: Creativity Inspired Generative Zero-Shot Learning

INSIDE: Steering Spatial Attention with Non-Imaging Information in CNNs

Vscmr Visual Storytelling With Corss Modal Rules ⭐ 5

Visual Storytelling with Cross-Modal Rules

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory. CVPR 2023.

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

An implementation of SSI

101-143 of 143 search results
