Awesome Open Source
Search results for python vqa
156 search results found
Interngpt
⭐
2,976
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Prismer
⭐
1,245
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar
⭐
995
Oscar and VinVL
Clipbert
⭐
649
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Bottom Up Attention Vqa
⭐
606
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Vqa.pytorch
⭐
536
Visual Question Answering in PyTorch
Ban Vqa
⭐
527
Bilinear attention networks for visual question answering
Lxmert
⭐
493
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
Visual Qa
⭐
476
[Reimplementation of Antol et al., 2015] Keras-based LSTM/CNN models for Visual Question Answering
Mac Network
⭐
445
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Omninet
⭐
426
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Nmn2
⭐
369
Neural module networks
Vlp
⭐
320
Vision-Language Pre-training for Image Captioning and Question Answering
Multi Modality Arena
⭐
308
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Vqa2.0 Recent Approachs 2018.pytorch
⭐
260
A PyTorch reimplementation of "Bilinear Attention Networks", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to Count Objects", and "Bottom-Up Top-Down" for Visual Question Answering 2.0
Ns Vqa
⭐
233
Neural-symbolic visual question answering
Openvqa
⭐
225
A lightweight, scalable, and general framework for visual question answering research
Neural Vqa Tensorflow
⭐
221
Visual Question Answering in Tensorflow.
Pytorch Vqa
⭐
213
Strong baseline for visual question answering
Nscl Pytorch Release
⭐
209
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Grid Feats Vqa
⭐
192
Grid features pre-training code for visual question answering
Block.bootstrap.pytorch
⭐
187
BLOCK (AAAI 2019), with a multimodal fusion library for deep learning models
Mcan Vqa
⭐
181
Deep Modular Co-Attention Networks for Visual Question Answering
Vqa Mcb
⭐
179
Vqa Counting
⭐
162
[ICLR 2018] Learning to Count Objects in Natural Images for Visual Question Answering
Vqa Winner Cvprw 2017
⭐
160
PyTorch implementation of the winning entry of the VQA Challenge Workshop at CVPR'17
Lrv Instruction
⭐
160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Murel.bootstrap.pytorch
⭐
141
MUREL (CVPR 2019), a multimodal relational reasoning module for VQA
Vqa Keras Visual Question Answering
⭐
141
Visual Question Answering task written in Keras that answers questions about images
Tgif Qa
⭐
139
Repository for our CVPR 2017 and IJCV paper: TGIF-QA
Vlmevalkit
⭐
137
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks
Vqa Mfb
⭐
135
Vqa_regat
⭐
130
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Visual_turing_test Tutorial
⭐
121
Tutorial for Visual Turing Test (visual question answering, image question answering).
Attention On Attention For Vqa
⭐
120
Visual Question Answering Project with state of the art single Model performance.
Frozenbilm
⭐
120
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Vqa Project
⭐
115
Code for our paper: Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Dense Coattention Network
⭐
81
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Slotformer
⭐
77
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
Bvqa_benchmark
⭐
72
A resource list and performance benchmark for blind video quality assessment (BVQA) models on user-generated content (UGC) datasets. [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
Visual_question_answering
⭐
71
Tensorflow implementation of "Dynamic Memory Networks for Visual and Textual Question Answering"
Cfvqa
⭐
70
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Css Vqa
⭐
69
Counterfactual Samples Synthesizing for Robust VQA
Clipcap
⭐
64
Using pretrained encoder and language models to generate captions from multimedia inputs.
Slotdiffusion
⭐
59
Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
Vqa Mfb.pytorch
⭐
55
This project is out of date, I don't remember the details inside...
Rosita
⭐
53
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Zs F Vqa
⭐
53
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Probnmn Clevr
⭐
52
Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Transformers Vqa
⭐
50
An implementation that downstreams pre-trained V+L models to VQA tasks. Now support: VisualBERT, LXMERT, and UNITER
Rubi.bootstrap.pytorch
⭐
47
RUBi : Reducing Unimodal Biases for Visual Question Answering
Villa
⭐
46
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Bottom Up Features
⭐
44
Bottom-up features extractor implemented in PyTorch.
Awesome Vqa Latest
⭐
42
Visual Question Answering Paper List.
Ssbaseline
⭐
41
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]
Iq
⭐
41
Information Maximizing Visual Question Generation
Conditional Batch Norm
⭐
40
PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
Convolutional Vqa
⭐
38
Relvit
⭐
38
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Vqa 2016 Cvprw
⭐
38
Visual question answering for CVPR16 VQA Challenge.
Pslqa
⭐
38
An implementation of Probabilistic Soft Logic Engine using Python/Gurobi
Vqa Demo Gui
⭐
36
This repository provides a GUI (built with PyQt4) for a VQA demo using the Keras deep learning library. The VQA model uses pre-trained VGG-16 weights for image features and GloVe vectors for question features.
Ask_me_anything
⭐
36
An easy-to-use app to visualise attentions of various VQA models.
Chinese Vqa
⭐
33
Chinese Visual Question Answering (answering questions about images in Chinese)
Visual Question Answering
⭐
32
CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering
Nsm
⭐
32
Neural State Machine implemented in PyTorch
Perm Optim
⭐
32
[ICLR 2019] Learning Representations of Sets through Optimized Permutations
Mmgnn_textvqa
⭐
32
A Pytorch implementation of CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Vqa_task_discovery
⭐
28
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Bottom Up Attention Tf
⭐
28
TensorFlow implementation of "Bottom-Up and Top-Down Attention for VQA" (TF v1.13)
Vctree Visual Question Answering
⭐
28
Code for the Visual Question Answering (VQA) part of CVPR 2019 oral paper: "Learning to Compose Dynamic Tree Structures for Visual Contexts"
Hcrn Videoqa
⭐
27
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Miccai19 Medvqa
⭐
27
AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering
Activitynet Qa
⭐
26
A VideoQA dataset based on the videos from ActivityNet
Multimodalexplanations
⭐
25
Code release for Park et al., "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence", CVPR 2018
Miccai21_mmq
⭐
23
Multiple Meta-model Quantifying for Medical Visual Question Answering
Figureqa Baseline
⭐
23
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Simple Vqa Pylib
⭐
22
A simple Python library and dataset for VQA
Vqa_keras
⭐
21
Modular and Simple approach to VQA in Keras
Vqa Text
⭐
21
Dual Attention Network
⭐
21
Tensorflow implementation of Dual Attention Network
Reproducibility Report Countvqa
⭐
21
Visual Question Answering
⭐
20
📷 ❓ Visual Question Answering Demo and Algorithmia API
Vqa Transfer Externaldata
⭐
20
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Vqs
⭐
19
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Iconqa
⭐
19
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
Youmakeup_baseline
⭐
19
Lako
⭐
18
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Simplevqa
⭐
18
A Deep Learning based No-reference Quality Assessment Model for UGC Videos
Iccv19_vqa Cti
⭐
17
Repo for our ICCV 19 paper: "Compact Trilinear Interaction for Visual Question Answering"
San Vqa Tensorflow
⭐
17
Wk Vqa
⭐
17
World Knowledge Based Visual Question Answering
Omnifusion
⭐
16
OmniFusion: a multimodal model to communicate using text and images
Bottom Up Attention Vqa
⭐
16
An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering'
Stl Vqa
⭐
16
Good practices in VQA systems, such as POS-tag attention, structured triplet learning, and triplet attention, are very general and can be inserted into almost any vision-and-language task
Cfr_vqa
⭐
16
Coarse-to-Fine Reasoning for Visual Question Answering
Revive
⭐
16
Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (NeurIPS 2022)
Vqa Mcb Model Tensorflow
⭐
15
TensorFlow implementation of Multimodal Compact Bilinear Pooling for VQA
Mplug
⭐
15
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
V Cnn
⭐
14
Viewport-based CNN for visual quality assessment on 360° video
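Many of the repositories above (e.g. the Keras LSTM/CNN reimplementation of Antol et al. and the CNN+LSTM variants) follow the same classic recipe: encode the image with a CNN, encode the question with an LSTM, fuse the two encodings, and classify over a fixed answer vocabulary. A minimal sketch of that recipe in PyTorch is shown below; all layer sizes, the element-wise-product fusion, and the class name are illustrative assumptions, not taken from any specific repository in this list.

```python
import torch
import torch.nn as nn

class VQABaseline(nn.Module):
    """Illustrative CNN+LSTM VQA baseline: fuse image and question encodings,
    then classify over a fixed answer vocabulary. Sizes are assumptions."""

    def __init__(self, vocab_size=1000, embed_dim=300,
                 hidden_dim=512, img_feat_dim=2048, num_answers=1000):
        super().__init__()
        # Question branch: word embeddings -> LSTM, keep the final hidden state.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Image branch: project pre-extracted CNN features (e.g. ResNet pool5)
        # into the same space as the question encoding.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Classifier over the answer vocabulary.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_ids):
        # img_feats: (B, img_feat_dim); question_ids: (B, T) token indices.
        _, (h_n, _) = self.lstm(self.embed(question_ids))
        q = h_n[-1]                               # (B, hidden_dim)
        v = torch.tanh(self.img_proj(img_feats))  # (B, hidden_dim)
        # Fusion by element-wise product, then answer logits.
        return self.classifier(q * v)             # (B, num_answers)

model = VQABaseline()
logits = model(torch.randn(4, 2048), torch.randint(0, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 1000])
```

Most later entries in the list replace the element-wise product with richer fusion (bilinear attention, compact bilinear pooling, co-attention), but keep this overall two-branch structure.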
1-100 of 156 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.