Awesome Open Source
Search results for: python, vision-language
53 search results found
Groundingdino (⭐ 4,165) - Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Marqo (⭐ 3,893) - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Chinese Clip (⭐ 2,816) - Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
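The Chinese Clip entry covers CLIP-style cross-modal retrieval for Chinese text. As a rough illustration of what that means in practice, here is a minimal sketch using the Hugging Face port of Chinese-CLIP, not the repository's own scripts; the model ID and processor calls are assumptions:

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

# Assumed Hugging Face checkpoint; the repository itself ships its own loaders.
model_id = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(model_id)
processor = ChineseCLIPProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
texts = ["一只猫", "一条狗", "一辆汽车"]  # candidate Chinese captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-to-text similarity logits; softmax turns them into a distribution over the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print({t: round(float(p), 3) for t, p in zip(texts, probs[0])})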
Ofa (⭐ 2,142) - Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
One Peace (⭐ 714) - A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Video Chatgpt (⭐ 590) - "Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Daclip Uir (⭐ 441) - PyTorch code for "Controlling Vision-Language Models for Universal Image Restoration", ICLR 2024.
Seed (⭐ 326) - Empowers LLMs with the ability to see and draw.
Calvin (⭐ 210) - CALVIN: A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Kaleido Bert (⭐ 207) - (CVPR 2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Lvit (⭐ 200) - [IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Open Groundingdino (⭐ 135) - This is the third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
Vln Bevbert (⭐ 130) - [ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for Language-guided Navigation"
Visual Chinese Llama Alpaca (⭐ 129) - Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
Vse_infty (⭐ 110) - Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
Vision Language Models Are Bows (⭐ 95) - Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Poda (⭐ 86) - [ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Vip Llava (⭐ 81) - ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Vision Language Transformer (⭐ 76) - Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)
Next Qa (⭐ 74) - NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
S2 Transformer (⭐ 70) - [IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
Clip2protect (⭐ 66) - [CVPR 2023] Official repository of the paper "CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search".
Stale (⭐ 63) - [ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
D Cube (⭐ 56) - A detection/segmentation dataset with class names characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Hulc (⭐ 52) - Hierarchical Universal Language Conditioned Policies
Mix Generation (⭐ 51) - MixGen: A New Multi-Modal Data Augmentation
Mcm (⭐ 49) - PyTorch implementation of MCM (Delving into out-of-distribution detection with vision-language representations), NeurIPS 2022
Vltvg (⭐ 47) - Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
Pkol (⭐ 43) - [TIP 2022] Official code of the paper "Video Question Answering with Prior Knowledge and Object-sensitive Learning"
Bagformer (⭐ 41) - PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Vidil (⭐ 41) - PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Contraclip (⭐ 33) - Authors' official PyTorch implementation of "ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences".
Active_vln (⭐ 32) - The repository of the ECCV 2020 paper "Active Visual Information Gathering for Vision-Language Navigation"
Hulc2 (⭐ 22) - [ICRA 2023] Grounding Language with Visual Affordances over Unstructured Data
Protext (⭐ 21) - Official repository of the paper "Learning to Prompt with Text Only Supervision for Vision-Language Models".
Multimodal Meta Learn (⭐ 19) - Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
Openfusion (⭐ 16) - Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Pos Subspaces (⭐ 15) - [NeurIPS'23] Parts of Speech–Grounded Subspaces in Vision-Language Models
Decembert (⭐ 15) - PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
Debias Vision Lang (⭐ 14) - A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning [AACL 2022]
Next Oe (⭐ 14) - NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Rewrite (⭐ 14) - [NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
Hqga (⭐ 13) - Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
Shot2story (⭐ 11) - A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
Ntu 2022fall Dlcv (⭐ 11) - Deep Learning for Computer Vision (深度學習於電腦視覺) by Frank Wang (王鈺強)
Autoregressive_inference (⭐ 10) - Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)
Image Captioning (⭐ 10) - Image captioning using Python and BLIP
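The Image Captioning entry describes captioning with Python and BLIP. Below is a minimal sketch of that workflow, assuming the Hugging Face transformers checkpoint "Salesforce/blip-image-captioning-base" rather than the repository's own code:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint; the linked repository may load BLIP differently.
model_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))  # prints a short caption of the image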
Managertower (⭐ 9) - Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Dramaqa (⭐ 8) - DramaQA Starter Code (2021)
Soonet (⭐ 6) - Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Zerogen (⭐ 6) - [NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles, PyTorch Implementation
Rtic Gcn Pytorch (⭐ 5) - Official PyTorch Implementation of RTIC
Vision Language Examples (⭐ 5) - Vision-language model example code.