Awesome Open Source

Search results for multimodal image captioning

9 search results found

Interngpt ⭐ 2,976

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Multimodal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multimodal classification, cross-modal retrieval, and image captioning.

Awesome Foundation And Multimodal Models ⭐ 223

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]

Language Models Can See: Plugging Visual Controls in Text Generation

Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)

Image Text Papers ⭐ 71

Image captioning and text-to-image papers.

Cvt2distilgpt2 ⭐ 46

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

Inverse Dall E For Optical Character Recognition ⭐ 24

Inverse DALL-E for Optical Character Recognition

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Spatial Reasoning ⭐ 6

Grounding Language Models for Compositional and Spatial Reasoning
