Awesome Open Source
Search results for flash attention
9 search results found
Qwen ⭐ 11,085
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Chinese Llama Alpaca 2 ⭐ 5,810
Phase-2 project for the Chinese LLaMA-2 & Alpaca-2 large language models, plus 64K ultra-long-context models.
Internlm ⭐ 4,412
Official release of the InternLM2 7B and 20B base and chat models, with 200K context support.
Awesome Llm Inference ⭐ 715
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Flash_attention_inference ⭐ 72
Performance of the C++ interfaces of FlashAttention, FlashAttention-2, and self-quantized decoding attention in large language model (LLM) inference scenarios.
Gdgpt ⭐ 58
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with the DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP.
Fastckpt ⭐ 18
A Python package for rematerialization-aware gradient checkpointing.
Flashperceiver ⭐ 7
A fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.
Fastcode ⭐ 7
Utilities for efficient fine-tuning, inference, and evaluation of code generation models.
Related Searches
Python Flash Attention (4)
Natural Language Processing Flash Attention (4)
Llm Flash Attention (4)