Marlin

An FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
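The ~4x figure follows from weight-only quantization: at low batch sizes the GEMM is memory-bound, so streaming 4-bit weights instead of 16-bit weights cuts memory traffic roughly 4x. As a hedged illustration of what an FP16xINT4 multiply computes, here is a minimal unfused PyTorch reference. This is not Marlin's API or kernel; the function name, the nibble-packing layout, and the per-channel scale/zero-point scheme are assumptions made for this sketch (Marlin itself fuses dequantization into the GEMM and never materializes the FP16 weight matrix).

```python
import torch

def fp16_int4_matmul_reference(a, packed_w, scales, zeros):
    """Unfused FP16xINT4 matmul reference: dequantize, then matmul.

    a:        (m, k) FP16 activations
    packed_w: (k // 2, n) uint8, two 4-bit weights packed per byte
              (packing layout is an assumption for this sketch)
    scales:   (n,) FP16 per-output-channel scales (assumed scheme)
    zeros:    (n,) FP16 per-output-channel zero points (assumed scheme)
    """
    # Unpack two INT4 values from each byte: low nibble, then high nibble.
    low = (packed_w & 0x0F).to(torch.float16)
    high = (packed_w >> 4).to(torch.float16)
    w_int4 = torch.stack((low, high), dim=1).reshape(-1, packed_w.shape[1])  # (k, n)
    # Dequantize: real_weight = scale * (quantized - zero_point).
    w_fp16 = scales * (w_int4 - zeros)
    # Plain FP16 GEMM. A fused kernel does the unpack-and-scale step above
    # in registers inside the GEMM, so weights cross memory at 4 bits each.
    return a @ w_fp16

# Usage sketch with random data. FP16 matmul may require a GPU on
# older PyTorch builds, hence the device check.
m, k, n = 16, 128, 256
dev = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(m, k, dtype=torch.float16, device=dev)
packed = torch.randint(0, 256, (k // 2, n), dtype=torch.uint8, device=dev)
scales = torch.full((n,), 0.01, dtype=torch.float16, device=dev)
zeros = torch.full((n,), 8.0, dtype=torch.float16, device=dev)
out = fp16_int4_matmul_reference(a, packed, scales, zeros)  # (m, n)
```

Because the low-batch regime is bandwidth-bound, the 4x reduction in weight bytes read translates almost directly into speedup; once the multiply becomes compute-bound the advantage shrinks, which matches the batch-size limit of 16-32 tokens stated above.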
Alternatives To Marlin
| Project Name | Stars | Most Recent Commit | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|
| Marlin | 160 | 3 months ago | 4 | apache-2.0 | Python | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens |
| Tensorquant | 44 | 4 years ago | | apache-2.0 | Python | |
| Yolov3_lite | 32 | 4 years ago | 8 | | C++ | YOLOv3 model compression and acceleration (quantization, sparsity), C++ version |
| Wlq | 20 | 5 years ago | | other | C++ | Caffe implementation of single-level quantization |
| Tensorflow_model_quantization | 8 | 3 years ago | | | Python | A tutorial of model quantization using TensorFlow |