Project Name | Stars | Most Recent Commit | Open Issues | License | Language | Description
---|---|---|---|---|---|---
Marlin | 160 | 3 months ago | 4 | apache-2.0 | Python | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens
Tensorquant | 44 | 4 years ago | | apache-2.0 | Python |
Yolov3_lite | 32 | 4 years ago | 8 | | C++ | yolov3 model compression and acceleration (quantization, sparsity), C++ version
Wlq | 20 | 5 years ago | | other | C++ | Caffe implementation of single-level quantization
Tensorflow_model_quantization | 8 | 3 years ago | | | Python | A tutorial on model quantization using TensorFlow
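All of the projects above center on weight quantization. As a minimal illustration of the core idea they share (not taken from any of these repositories), here is a symmetric per-tensor INT4 round-trip sketch in NumPy: weights are scaled so the largest magnitude maps to the top of the signed 4-bit range [-8, 7], rounded to integers, and later rescaled back.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4 quantization: map floats to integers in [-8, 7]."""
    scale = np.max(np.abs(w)) / 7.0  # 7 is the largest positive INT4 value
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integer codes and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.35, 0.07, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
print(q)                           # integer codes, all within [-8, 7]
print(np.max(np.abs(w - w_hat)))   # per-weight error is bounded by scale / 2
```

Real kernels such as Marlin's go further, packing two INT4 codes per byte and fusing the dequantization into the FP16 matmul, but the quantize/dequantize mapping is the same in spirit.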