Awesome Open Source
Search results for python vqa
156 search results found
Interngpt
⭐
2,976
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Prismer
⭐
1,245
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar
⭐
995
Oscar and VinVL
Clipbert
⭐
649
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Bottom Up Attention Vqa
⭐
606
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Vqa.pytorch
⭐
536
Visual Question Answering in PyTorch
Ban Vqa
⭐
527
Bilinear attention networks for visual question answering
Lxmert
⭐
493
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
Visual Qa
⭐
476
[Reimplementation of Antol et al., 2015] Keras-based LSTM/CNN models for Visual Question Answering
Mac Network
⭐
445
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Omninet
⭐
426
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Nmn2
⭐
369
Neural module networks
Vlp
⭐
320
Vision-Language Pre-training for Image Captioning and Question Answering
Multi Modality Arena
⭐
308
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Vqa2.0 Recent Approachs 2018.pytorch
⭐
260
A PyTorch reimplementation of "Bilinear Attention Networks", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to Count Objects", and "Bottom-Up Top-Down" for Visual Question Answering 2.0
Ns Vqa
⭐
233
Neural-symbolic visual question answering
Openvqa
⭐
225
A lightweight, scalable, and general framework for visual question answering research
Neural Vqa Tensorflow
⭐
221
Visual Question Answering in Tensorflow.
Pytorch Vqa
⭐
213
Strong baseline for visual question answering
Nscl Pytorch Release
⭐
209
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Grid Feats Vqa
⭐
192
Grid features pre-training code for visual question answering
Block.bootstrap.pytorch
⭐
187
BLOCK (AAAI 2019), with a multimodal fusion library for deep learning models
Mcan Vqa
⭐
181
Deep Modular Co-Attention Networks for Visual Question Answering
Vqa Mcb
⭐
179
Vqa Counting
⭐
162
[ICLR 2018] Learning to Count Objects in Natural Images for Visual Question Answering
Vqa Winner Cvprw 2017
⭐
160
PyTorch implementation of the winning entry of the VQA Challenge Workshop at CVPR'17
Lrv Instruction
⭐
160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Murel.bootstrap.pytorch
⭐
141
MUREL (CVPR 2019), a multimodal relational reasoning module for VQA
Vqa Keras Visual Question Answering
⭐
141
Visual Question Answering task written in Keras that answers questions about images
Tgif Qa
⭐
139
Repository for our CVPR 2017 and IJCV paper: TGIF-QA
Vlmevalkit
⭐
137
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks
Vqa Mfb
⭐
135
Vqa_regat
⭐
130
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Visual_turing_test Tutorial
⭐
121
Tutorial for Visual Turing Test (visual question answering, image question answering).
Attention On Attention For Vqa
⭐
120
Visual Question Answering Project with state of the art single Model performance.
Frozenbilm
⭐
120
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Vqa Project
⭐
115
Code for our paper: Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Dense Coattention Network
⭐
81
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Slotformer
⭐
77
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
Bvqa_benchmark
⭐
72
A resource list and performance benchmark for blind video quality assessment (BVQA) models on user-generated content (UGC) datasets. [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Visual_question_answering
⭐
71
Tensorflow implementation of "Dynamic Memory Networks for Visual and Textual Question Answering"
Cfvqa
⭐
70
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Css Vqa
⭐
69
Counterfactual Samples Synthesizing for Robust VQA
Clipcap
⭐
64
Using pretrained encoder and language models to generate captions from multimedia inputs.
Slotdiffusion
⭐
59
Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
Vqa Mfb.pytorch
⭐
55
This project is out of date, I don't remember the details inside...
Rosita
⭐
53
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Zs F Vqa
⭐
53
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Probnmn Clevr
⭐
52
Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Transformers Vqa
⭐
50
An implementation that downstreams pre-trained V+L models to VQA tasks. Now support: VisualBERT, LXMERT, and UNITER
Rubi.bootstrap.pytorch
⭐
47
RUBi : Reducing Unimodal Biases for Visual Question Answering
Villa
⭐
46
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Bottom Up Features
⭐
44
Bottom-up features extractor implemented in PyTorch.
Awesome Vqa Latest
⭐
42
Visual Question Answering Paper List.
Ssbaseline
⭐
41
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]
Iq
⭐
41
Information Maximizing Visual Question Generation
Conditional Batch Norm
⭐
40
PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
Convolutional Vqa
⭐
38
Relvit
⭐
38
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Vqa 2016 Cvprw
⭐
38
Visual question answering for CVPR16 VQA Challenge.
Pslqa
⭐
38
An implementation of Probabilistic Soft Logic Engine using Python/Gurobi
Vqa Demo Gui
⭐
36
This repository provides a GUI (built with PyQt4) for a VQA demo using the Keras deep learning library. The VQA model uses pre-trained VGG-16 weights for image features and GloVe vectors for question features.
Ask_me_anything
⭐
36
An easy-to-use app to visualise attentions of various VQA models.
Chinese Vqa
⭐
33
Chinese Visual Question Answering (answering questions about images in Chinese)
Visual Question Answering
⭐
32
CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering
Nsm
⭐
32
Neural State Machine implemented in PyTorch
Perm Optim
⭐
32
[ICLR 2019] Learning Representations of Sets through Optimized Permutations
Mmgnn_textvqa
⭐
32
A Pytorch implementation of CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Vqa_task_discovery
⭐
28
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Bottom Up Attention Tf
⭐
28
TensorFlow implementation of "Bottom-Up and Top-Down Attention for VQA" (TF v1.13)
Vctree Visual Question Answering
⭐
28
Code for the Visual Question Answering (VQA) part of CVPR 2019 oral paper: "Learning to Compose Dynamic Tree Structures for Visual Contexts"
Hcrn Videoqa
⭐
27
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Miccai19 Medvqa
⭐
27
AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering
Activitynet Qa
⭐
26
A VideoQA dataset based on the videos from ActivityNet
Multimodalexplanations
⭐
25
Code release for Park et al., "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence", CVPR 2018
Miccai21_mmq
⭐
23
Multiple Meta-model Quantifying for Medical Visual Question Answering
Figureqa Baseline
⭐
23
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Simple Vqa Pylib
⭐
22
A simple Python library and dataset for VQA
Vqa_keras
⭐
21
Modular and Simple approach to VQA in Keras
Vqa Text
⭐
21
Dual Attention Network
⭐
21
Tensorflow implementation of Dual Attention Network
Reproducibility Report Countvqa
⭐
21
Visual Question Answering
⭐
20
📷 ❓ Visual Question Answering Demo and Algorithmia API
Vqa Transfer Externaldata
⭐
20
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Vqs
⭐
19
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Iconqa
⭐
19
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
Youmakeup_baseline
⭐
19
Lako
⭐
18
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Simplevqa
⭐
18
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Iccv19_vqa Cti
⭐
17
Repo for our ICCV 19 paper: "Compact Trilinear Interaction for Visual Question Answering"
San Vqa Tensorflow
⭐
17
Wk Vqa
⭐
17
World Knowledge Based Visual Question Answering
Omnifusion
⭐
16
OmniFusion: a multimodal model to communicate using text and images
Bottom Up Attention Vqa
⭐
16
An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering'
Stl Vqa
⭐
16
Good practices in VQA systems, such as POS-tag attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task
Cfr_vqa
⭐
16
Coarse-to-Fine Reasoning for Visual Question Answering
Revive
⭐
16
Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)
Vqa Mcb Model Tensorflow
⭐
15
TensorFlow implementation of Multimodal Compact Bilinear Pooling for VQA
Mplug
⭐
15
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
V Cnn
⭐
14
Viewport-based CNN for visual quality assessment on 360° video
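Many of the repositories above (e.g. the Keras LSTM/CNN reimplementation of Antol et al. and the CNN+LSTM variants) follow the same classic recipe: encode the image with a CNN, encode the question with an LSTM, fuse the two encodings, and classify over a fixed answer vocabulary. A minimal sketch of that recipe in PyTorch is shown below; all layer sizes, the element-wise-product fusion, and the class name are illustrative assumptions, not taken from any specific repository in this list.

```python
import torch
import torch.nn as nn

class VQABaseline(nn.Module):
    """Illustrative CNN+LSTM VQA baseline: fuse image and question encodings,
    then classify over a fixed answer vocabulary. Sizes are assumptions."""

    def __init__(self, vocab_size=1000, embed_dim=300,
                 hidden_dim=512, img_feat_dim=2048, num_answers=1000):
        super().__init__()
        # Question branch: word embeddings -> LSTM, keep the final hidden state.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Image branch: project pre-extracted CNN features (e.g. ResNet pool5)
        # into the same space as the question encoding.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Classifier over the answer vocabulary.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_ids):
        # img_feats: (B, img_feat_dim); question_ids: (B, T) token indices.
        _, (h_n, _) = self.lstm(self.embed(question_ids))
        q = h_n[-1]                               # (B, hidden_dim)
        v = torch.tanh(self.img_proj(img_feats))  # (B, hidden_dim)
        # Fusion by element-wise product, then answer logits.
        return self.classifier(q * v)             # (B, num_answers)

model = VQABaseline()
logits = model(torch.randn(4, 2048), torch.randint(0, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 1000])
```

Most later entries in the list replace the element-wise product with richer fusion (bilinear attention, compact bilinear pooling, co-attention), but keep this overall two-branch structure.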
1-100 of 156 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.