Awesome Open Source

Search results for python multi modal learning

37 search results found

Chinese Clip ⭐ 2,816

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Hcaptcha Challenger ⭐ 1,247

🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.

Prismer ⭐ 1,245

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Macaw Llm ⭐ 1,090

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Cvpr 2023 Papers ⭐ 185

CVPR 2023 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. A star in the development of visual intelligence!

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

Embodiedscan ⭐ 130

[arXiv 2023] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Achelous ⭐ 116

Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar

Build high-performance AI models with modular building blocks

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

A detection/segmentation dataset with class names characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Japanese Clip ⭐ 54

Japanese CLIP by rinna Co., Ltd.

[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Sugar Crepe ⭐ 40

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Hyperdensenet_pytorch ⭐ 29

PyTorch version of the HyperDenseNet deep neural network for multi-modal image segmentation

Trar Vqa ⭐ 23

Official PyTorch implementation of the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" for the VQA task

MMEA: Entity Alignment for Multi-Modal Knowledge Graphs

Adaptive Confidence Multi-View Hashing

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

Zsd Sc Resolver ⭐ 19

Resolving semantic confusions for improved zero-shot detection (BMVC 2022)

Mrm Pytorch ⭐ 18

An official implementation of "Advancing Radiograph Representation Learning with Masked Record Modeling" (ICLR'23)

[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Wss Cmer ⭐ 14

Code for the paper "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf

Neuralmerger ⭐ 13

Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence, IJCAI-ECAI-2018

Multimodal Remote Sensing Toolkit ⭐ 13

A Python tool to perform deep learning experiments on multimodal remote sensing data.

Xmodal Vit ⭐ 12

Official implementation of "Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval", BMVC 2022.

Multimodal Math Pretraining ⭐ 11

[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"

Managertower ⭐ 9

Code for the ACL 2023 oral paper "ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning"

Gimme_signals_action_recognition ⭐ 9

Multi-modal action recognition for skeleton sequences, inertial measurements, motion capture data, and Wi-Fi CSI fingerprints.

DramaQA Starter Code (2021)

Elysium Knowledge Repository is an open source initiative to embed all of Humanity's multi-modal knowledge and wisdom.

PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"

Multiviewcropclassification ⭐ 5

Public repository of our IGARSS 2023 submission

Copyright 2018-2024 Awesome Open Source. All rights reserved.