Awesome Open Source
Search results for computer vision multimodal
30 search results found
Courses
⭐
4,018
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Rerun
⭐
3,895
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Chinese Clip
⭐
2,816
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Torchscale
⭐
2,804
Foundation Architecture for (M)LLMs
Metatransformer
⭐
1,325
Meta-Transformer for Unified Multimodal Learning
Autodistill
⭐
1,286
Images to inference with no labeling (use foundation models to train supervised models)
Hcaptcha Challenger
⭐
1,247
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
Transformer In Vision
⭐
1,220
Recent Transformer-based CV and related works.
Sdt
⭐
724
This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR23).
Papermage
⭐
494
Library supporting NLP and CV research on scientific papers.
Advancedliteratemachinery
⭐
464
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Blended Latent Diffusion
⭐
458
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
Pykale
⭐
415
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Llavavision
⭐
409
A simple "Be My Eyes" web app with a llama.cpp/llava backend
Oasis
⭐
260
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Awesome Foundation And Multimodal Models
⭐
223
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]
Cav Mae
⭐
201
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Mmmu
⭐
167
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Vlmevalkit
⭐
137
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks
Unitr
⭐
123
[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
Qd Detr
⭐
113
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
Poda
⭐
86
[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"
Rt X
⭐
77
PyTorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Minklocmultimodal
⭐
73
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
Alpaca Glassoff
⭐
72
Image Acceptable Alpaca (Image-Text Chat AI).
Mkg_analogy
⭐
56
Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."
Kosmos X
⭐
53
The Next Generation Multi-Modality Superintelligence
Glami 1m
⭐
47
The largest multilingual image-text classification dataset. It contains fashion products.
Tokencompose
⭐
41
(arXiv) 🧩 TokenCompose: Grounding Diffusion with Token-level Supervision
Som
⭐
32
Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️
Vlog_action_reason
⭐
17
Identifying reasons for human actions in lifestyle vlogs.
Whereisai
⭐
10
AI company, product, and tool collection.
Multimodal Datasets
⭐
9
Multimodal datasets.
Flair 2
⭐
7
Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.
Kosmosg
⭐
7
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
Spatial Reasoning
⭐
6
Grounding Language Models for Compositional and Spatial Reasoning
Icdar Emoreccom
⭐
6
Spec
⭐
6
The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
Robopilot
⭐
5
Live Dense Multi Modal 3D Mapping. A system designed for real-time 3D reconstruction using a fusion of multiple depth and camera sensors simultaneously. A Generic Framework for Distributed Deep Neural Networks over the Cloud, the Edge, and End Devices for Computer Vision Applications.
Clipq
⭐
5
A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant
Cbvs Uniclip
⭐
5
A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Copyright 2018-2024 Awesome Open Source. All rights reserved.