Awesome Open Source

Search results for computer vision multimodal

30 search results found

Courses ⭐ 4,018

A curated collection of links to courses and resources about Artificial Intelligence (AI).

Rerun ⭐ 3,895

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

Chinese Clip ⭐ 2,816

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Torchscale ⭐ 2,804

Foundation Architecture for (M)LLMs

Metatransformer ⭐ 1,325

Meta-Transformer for Unified Multimodal Learning

Autodistill ⭐ 1,286

Images to inference with no labeling (use foundation models to train supervised models)

Hcaptcha Challenger ⭐ 1,247

🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.

Transformer In Vision ⭐ 1,220

Recent Transformer-based CV and related works.

This repository is the official implementation of Disentangling Writer and Character Styles for Handwriting Generation (CVPR23).

Papermage ⭐ 494

Library supporting NLP and CV research on scientific papers.

Advancedliteratemachinery ⭐ 464

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Blended Latent Diffusion ⭐ 458

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem.

Llavavision ⭐ 409

A simple "Be My Eyes" web app with a llama.cpp/llava backend

Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)

Awesome Foundation And Multimodal Models ⭐ 223

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code]

Cav Mae ⭐ 201

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Vlmevalkit ⭐ 137

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.

[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"

Qd Detr ⭐ 113

Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 paper)

[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Domain Adaptation"

PyTorch implementation of the models RT-1-X and RT-2-X from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"

Minklocmultimodal ⭐ 73

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

Alpaca Glassoff ⭐ 72

An Alpaca model that accepts images (image-text chat AI).

Mkg_analogy ⭐ 56

Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."

Kosmos X ⭐ 53

The Next Generation Multi-Modality Superintelligence

Glami 1m ⭐ 47

The largest multilingual image-text classification dataset. It contains fashion products.

Tokencompose ⭐ 41

(arXiv) 🧩 TokenCompose: Grounding Diffusion with Token-level Supervision

Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️

Vlog_action_reason ⭐ 17

Identifying reasons for human actions in lifestyle vlogs.

Whereisai ⭐ 10

AI company, product, and tool collection.

Multimodal Datasets ⭐ 9

Multimodal datasets.

A semantic segmentation challenge for land cover description using multimodal remote sensing Earth observation data, with a dataset of 70,000+ aerial imagery patches and 50,000 Sentinel-2 satellite acquisitions.

My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"

Spatial Reasoning ⭐ 6

Grounding Language Models for Compositional and Spatial Reasoning

Icdar Emoreccom ⭐ 6

The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

Robopilot ⭐ 5

Live Dense Multi Modal 3D Mapping: a system for real-time 3D reconstruction that fuses multiple depth and camera sensors simultaneously. A Generic Framework for Distributed Deep Neural Networks over the Cloud, the Edge, and End Devices for Computer Vision Applications.

A simple CLIP implementation that splits an image into quadrants and then computes the embedding for each quadrant

Cbvs Uniclip ⭐ 5

A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
