Awesome Open Source

Search results for computer vision multimodal deep learning

22 search results found

Awesome Grounding ⭐ 689

awesome grounding: A curated list of research papers in visual grounding

Advancedliteratemachinery ⭐ 464

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Blended Latent Diffusion ⭐ 458

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Awesome Parameter Efficient Transfer Learning ⭐ 288

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

Eccv2022 Papers With Code Demo ⭐ 207

A collection of the latest ECCV work, including papers, code, and demo videos. Recommendations are welcome!

Cav Mae ⭐ 201

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Pseudo Q ⭐ 116

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Awesome 3d Vision And Language ⭐ 62

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

3dcompat V2 ⭐ 57

3DCoMPaT++: An improved large-scale 3D vision dataset for compositional recognition

Visual Spatial Reasoning ⭐ 38

[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.

Referit3d ⭐ 32

Code accompanying our ECCV-2020 paper on 3D Neural Listeners.

CFCNet for depth completion, NeurIPS 2019.

Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)

Official Code Release for Diagnosing and Rectifying Vision Models using Language

Entity-Driven Image Search over Multimodal Web Content (EMNLP 2023)

Whos Waldo ⭐ 13

Who's Waldo? Linking People Across Text and Images. ICCV 2021.

Vision_audio_and_multimodal_projects ⭐ 13

This repository includes all computer vision, audio, document AI, and multimodal projects.

Engage in a semantic segmentation challenge for land cover description using multimodal remote sensing earth observation data, delving into real-world scenarios with a dataset comprising 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

Stacked Attention Networks For Visual Question Answering ⭐ 7

Implementation of the paper "Stacked Attention Networks for Image Question Answering" in TensorFlow

Deepfake Detection Challenge Dfad2023 ⭐ 6

Implementation of solution for the Media Analytics Challenge.

A Multi-modal Framework for Sentimental Analysis of Meme

