Awesome Open Source
Search results for vision language pretraining
24 search results found
Lavis (⭐ 7,917): LAVIS - A One-stop Library for Language-Vision Intelligence
Video Llama (⭐ 1,826): [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Declip (⭐ 603): Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Video Chatgpt (⭐ 590): Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation, and introduces a rigorous quantitative evaluation benchmark for video-based conversational models.
Awesome Japanese Llm (⭐ 585): Overview of Japanese LLMs
Paddlemix (⭐ 172): Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox, with high performance and flexibility
Valor (⭐ 110): Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Ptp (⭐ 100): [CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"
Recognize Any Regions (⭐ 92): Recognize Any Regions
Multimodality Representation Learning (⭐ 51): A comprehensive collection of research papers on multimodal representation learning, all cited and discussed in the accompanying accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833)
Flair (⭐ 42): FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding
Protoclip (⭐ 38): 📍 Official PyTorch implementation of ProtoCLIP from the paper "Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Segclip (⭐ 35): PyTorch implementation of the ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
Continual Clip (⭐ 30): Official repository for "CLIP model is an Efficient Continual Learner"
Svl_adapter (⭐ 17): SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Flm (⭐ 16): Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Sga (⭐ 14): [ICCV 2023] Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Cosa (⭐ 10): Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Managertower (⭐ 9): Code for the ACL 2023 oral paper "ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning"
Vtc (⭐ 8): VTC: Improving Video-Text Retrieval with User Comments
B2t (⭐ 8): Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
Blitext (⭐ 8): [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Itra (⭐ 7): A codebase for flexible and efficient Image Text Representation Alignment
Adaptation_robustness (⭐ 5): Evaluates the robustness of adaptation methods on large vision-language models
Related Searches: Python Vision Language Pretraining (8); Multimodal Deep Learning Vision Language Pretraining (3)
Copyright 2018-2024 Awesome Open Source. All rights reserved.