Awesome Open Source
Search results for "clips" and "vision-and-language"
7 search results found
Clip Caption Reward (⭐ 104)
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
Plip (⭐ 67)
Pathology Language and Image Pre-Training (PLIP) is the first vision-and-language foundation model for pathology AI. PLIP is a large-scale pre-trained model that extracts visual and language features from pathology images and their text descriptions. The model is a fine-tuned version of the original CLIP model.
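Since PLIP is a fine-tuned CLIP model, a minimal sketch of feature extraction with the Hugging Face transformers CLIP classes might look like the following; the checkpoint id "vinid/plip" and the example inputs are assumptions, so consult the PLIP repository for the actual checkpoint and usage.

```python
# Sketch: extracting image/text features with a CLIP-style model.
# Assumption: the PLIP checkpoint is published as "vinid/plip";
# replace with the id given in the PLIP repository if it differs.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("vinid/plip")
processor = CLIPProcessor.from_pretrained("vinid/plip")

# Hypothetical inputs: a pathology image patch and a text description.
image = Image.open("slide_patch.png")
inputs = processor(
    text=["an H&E image of tumor tissue"],
    images=image,
    return_tensors="pt",
    padding=True,
)

outputs = model(**inputs)
image_features = outputs.image_embeds  # pooled, projected image features
text_features = outputs.text_embeds    # pooled, projected text features
```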
Mac (⭐ 24)
An end-to-end masked contrastive video-and-language pre-training framework
Cross Modal Adapter (⭐ 24)
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Zerovl (⭐ 20)
[ECCV 2022] Contrastive Vision-Language Pre-training with Limited Resources
Xmodal Ctx (⭐ 18)
Official PyTorch implementation of the CVPR 2022 paper "Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning"
Open Fashion Clip (⭐ 8)
Official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data" (ICIAP 2023)