Awesome Open Source

Search results for python multimodality

37 search results found

Llava ⭐ 12,514

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Deep Daze ⭐ 4,104

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

Otter ⭐ 3,322

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Big Sleep ⭐ 1,726

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

Multimodal Maestro ⭐ 871

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

Internlm Xcomposer ⭐ 820

A Comparative Framework for Multimodal Recommender Systems

Clip4clip ⭐ 663

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Automated modeling and machine learning framework FEDOT

Woodpecker ⭐ 473

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

Build, Deploy, and Scale Reliable Swarms of Autonomous Agents for Workflow Automation. Join our Community: https://discord.gg/DbjBMJTSWD

Gpt4roi ⭐ 330

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

Collaborative Diffusion ⭐ 320

Collaborative Diffusion (CVPR 2023)

Multi Modality Arena ⭐ 308

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Cm3leon ⭐ 288

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Mm Diffusion ⭐ 287

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

DANCE: A Deep Learning Library and Benchmark Platform for Single-Cell Analysis

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google

Clip Guided Diffusion ⭐ 267

A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

Multimodal Sentiment Analysis ⭐ 205

Attention-based multimodal fusion for sentiment analysis

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

[ICCV2019] Robust Multi-Modality Multi-Object Tracking

Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

How2 Dataset ⭐ 125

This repository contains code and metadata of How2 dataset

Fuse Med Ml ⭐ 121

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)

Fusilli ⭐ 120

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

The Compiler ⭐ 119

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

Emmental ⭐ 93

A deep learning framework for building multimodal multi-task learning systems.

Andromeda ⭐ 92

An all-new language model that processes ultra-long sequences of 100,000+ tokens ultra-fast

Cvpr21chal Slr ⭐ 89

This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"

Mirasol Pytorch ⭐ 74

Implementation of 🌻 Mirasol, a SOTA multimodal autoregressive model out of Google DeepMind, in PyTorch

Prompt Highlighter ⭐ 69

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Swarms Pytorch ⭐ 67

Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

Multi_token ⭐ 54

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

Kosmos2.5 ⭐ 34

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

Long-Term Rhythmic Video Soundtracker, ICML2023

Tfce_mediation ⭐ 27

Fast regression and mediation analysis of vertex or voxel MRI data with TFCE

Trar Vqa ⭐ 23

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

Composeae ⭐ 23

Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval

Cris.pytorch ⭐ 20

An official PyTorch implementation of the CRIS paper

Visual Dialog: Light-weight Transformer for Many Inputs (ECCV 2020)

Matcha Agent ⭐ 16

Official implementation of Matcha-agent

Behaviopy ⭐ 15

Behavioral data analysis and plotting in Python.

Documentclip ⭐ 14

The code of CorrI2P

Semantic_segmentation ⭐ 11

KERAS: Multimodal Deep Learning for Semantic Segmentation (RGB, NIR Streams) - multiple architectures

The implementation of MoCA

Mode normalization (in PyTorch).

Collects a multimodal dataset of Wikipedia articles and their images

Multimodal Fully Convolutional Neural networks for Semantic Segmentation.

The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention"

A repository for the AI2D-RST corpus.

Plug-and-play implementation of "A Generalist Agent" by DeepMind.

Simple Implementation of TinyGPTV in super simple Zeta lego blocks

Multimodal Autoencoder For Breast Cancer ⭐ 7

Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in pytorch and zeta

Acl2018 Multimodalmultitasksentimentanalysis ⭐ 6

Codes for ACL2018 Multimodal Language Workshop paper

Multimodal Tot ⭐ 6

Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement

A Multi-modal Framework for Sentiment Analysis of Memes

Diverse_sampling ⭐ 5

Official project of DiverseSampling (ACMMM2022 Paper)

Mlxtransformer ⭐ 5

Simple Implementation of a Transformer in the new framework MLX by Apple
