Awesome Open Source
Search results for computer vision multimodal deep learning
computer-vision
multimodal-deep-learning
22 search results found
Awesome Grounding
⭐
689
awesome grounding: A curated list of research papers in visual grounding
Advancedliteratemachinery
⭐
464
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Blended Latent Diffusion
⭐
458
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
Awesome Parameter Efficient Transfer Learning
⭐
288
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Eccv2022 Papers With Code Demo
⭐
207
A collection of the latest ECCV results, including papers, code, and demo videos. Recommendations are welcome!
Cav Mae
⭐
201
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Mmmu
⭐
167
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Pseudo Q
⭐
116
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Awesome 3d Vision And Language
⭐
62
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
3dcompat V2
⭐
57
3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition
Visual Spatial Reasoning
⭐
38
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Referit3d
⭐
32
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Cfcnet
⭐
25
CFCNet for depth completion, NeurIPS 2019.
Revive
⭐
16
Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)
Drml
⭐
16
Official Code Release for Diagnosing and Rectifying Vision Models using Language
Edis
⭐
15
Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)
Whos Waldo
⭐
13
Who's Waldo? Linking People Across Text and Images. ICCV 2021.
Vision_audio_and_multimodal_projects
⭐
13
This repository includes all computer vision, audio, document AI, and multimodal projects.
Flair 2
⭐
7
Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.
Stacked Attention Networks For Visual Question Answering
⭐
7
Implementation of the paper "Stacked Attention Networks for Image Question Answering" in Tensorflow
Deepfake Detection Challenge Dfad2023
⭐
6
Implementation of solution for the Media Analytics Challenge.
Memsem
⭐
5
A Multi-modal Framework for Sentiment Analysis of Memes
Related Searches
Python Computer Vision (4,369)
Deep Learning Computer Vision (3,558)
Machine Learning Computer Vision (2,342)
Jupyter Notebook Computer Vision (2,183)
Computer Vision Opencv (1,326)
Pytorch Computer Vision (1,230)
Convolutional Neural Networks Computer Vision (1,072)
Artificial Intelligence Computer Vision (949)
Computer Vision Object Detection (938)
Tensorflow Computer Vision (905)
Copyright 2018-2024 Awesome Open Source. All rights reserved.