Awesome Open Source

Search results for multimodal clips

Marqo ⭐ 3,893

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

Mmpretrain ⭐ 3,177

OpenMMLab Pre-training Toolbox and Benchmark

Chinese Clip ⭐ 2,816

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Clip Retrieval ⭐ 1,949

Easily compute clip embeddings and build a clip retrieval system with them

Hcaptcha Challenger ⭐ 1,247

🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Clip4clip ⭐ 663

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Clip.cpp ⭐ 335

CLIP inference in plain C/C++ with no extra dependencies

Awesome Foundation And Multimodal Models ⭐ 223

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]

Fashion Clip ⭐ 189

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.

Paddlemix ⭐ 172

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.

Vlmevalkit ⭐ 137

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks

Language Models Can See: Plugging Visual Controls in Text Generation

Vision Language Models Are Bows ⭐ 95

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

Clip4cir ⭐ 92

[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"

Vip Llava ⭐ 81

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Clip_surgery ⭐ 55

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

An end-to-end masked contrastive video-and-language pre-training framework

Usearch Images ⭐ 17

Semantic search demo featuring UForm, USearch, UCall, and Streamlit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"

Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)

Ocrautoscore ⭐ 14

OCR-based automated exam grading project

Official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

Does Clip Know My Face ⭐ 5

Source Code for the Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-

A simple CLIP implementation that splits an image into quadrants and then computes the embedding for each quadrant

Cbvs Uniclip ⭐ 5

A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios

Copyright 2018-2024 Awesome Open Source. All rights reserved.