Awesome Open Source
Search results for multimodal clips
Marqo ⭐ 3,893
Unified embedding generation and search engine. Also available as a cloud service at cloud.marqo.ai.
Mmpretrain ⭐ 3,177
OpenMMLab Pre-training Toolbox and Benchmark
Chinese Clip ⭐ 2,816
Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.
Clip Retrieval ⭐ 1,949
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
Hcaptcha Challenger ⭐ 1,247
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
Uform ⭐ 729
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Clip4clip ⭐ 663
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Clip.cpp ⭐ 335
CLIP inference in plain C/C++ with no extra dependencies
Awesome Foundation And Multimodal Models ⭐ 223
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]
Fashion Clip ⭐ 189
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
Paddlemix ⭐ 172
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Vlmevalkit ⭐ 137
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.
Magic ⭐ 124
Language Models Can See: Plugging Visual Controls in Text Generation
Vision Language Models Are Bows ⭐ 95
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Clip4cir ⭐ 92
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Poda ⭐ 86
[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Vip Llava ⭐ 81
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Clip_surgery ⭐ 55
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
Mac ⭐ 24
An end-to-end masked contrastive video-and-language pre-training framework
Usearch Images ⭐ 17
Semantic search demo featuring UForm, USearch, UCall, and Streamlit for visualizing and retrieving from image datasets, similar to "CLIP Retrieval".
Flip ⭐ 17
Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)
Ocrautoscore ⭐ 14
OCR-based automated exam grading project
Spec ⭐ 6
Official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
Does Clip Know My Face ⭐ 5
Source code for the paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-
Clipq ⭐ 5
A simple implementation of CLIP that splits an image into quadrants and then computes the embedding for each quadrant.
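The quadrant idea described for Clipq can be sketched as follows. This is not Clipq's code: the image is assumed to be a 2D list of grayscale pixel values, and `mean_pixel_embed` is a hypothetical stand-in for a CLIP image encoder (it returns a one-element "embedding" holding the mean pixel value) so the sketch runs without model weights.

```python
from typing import Callable, List

Image = List[List[float]]  # assumption: rows of grayscale pixel values


def split_quadrants(image: Image) -> List[Image]:
    """Split an image into top-left, top-right, bottom-left, bottom-right quadrants."""
    h = len(image)
    w = len(image[0]) if h else 0
    mid_r, mid_c = h // 2, w // 2
    top, bottom = image[:mid_r], image[mid_r:]
    return [
        [row[:mid_c] for row in top],     # top-left
        [row[mid_c:] for row in top],     # top-right
        [row[:mid_c] for row in bottom],  # bottom-left
        [row[mid_c:] for row in bottom],  # bottom-right
    ]


def mean_pixel_embed(patch: Image) -> List[float]:
    """Hypothetical placeholder for a CLIP image encoder: mean pixel value as a 1-D vector."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values)] if values else [0.0]


def quadrant_embeddings(
    image: Image, embed: Callable[[Image], List[float]] = mean_pixel_embed
) -> List[List[float]]:
    """Embed each quadrant independently, in the order returned by split_quadrants."""
    return [embed(q) for q in split_quadrants(image)]


if __name__ == "__main__":
    img = [
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4],
    ]
    print(quadrant_embeddings(img))  # → [[1.0], [2.0], [3.0], [4.0]]
```

With a real model, `embed` would be swapped for an actual CLIP image encoder; keeping it as a parameter separates the splitting logic from the choice of model.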
Cbvs Uniclip ⭐ 5
A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Copyright 2018-2024 Awesome Open Source. All rights reserved.