This is a custom C++/CUDA implementation of the Correlation module, used e.g. in FlowNetC.

This tutorial was used as a basis for the implementation, as well as NVIDIA's CUDA code.

- Build and install the C++ and CUDA extensions with `python setup.py install`,
- run the benchmark on either backend with `python benchmark.py {cpu, cuda}`,
- run gradient checks with `python grad_check.py --backend {cpu, cuda}`.

This module is expected to compile for PyTorch `1.6`.
This module is available on pip:

`pip install spatial-correlation-sampler`

For a CPU-only version, you can install from source with:

`python setup_cpu.py install`

This module needs compatible gcc and CUDA versions to be compiled. Namely, CUDA 9.1 and below need gcc 5, while CUDA 9.2 and 10.0 need gcc 7. See this issue for more information.
The API has a few differences from NVIDIA's module:

- The output is now a 5D tensor: input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW).
- `oH` and `oW` no longer depend on the patch size, only on the kernel size and padding.
- `patch_size` is now the whole patch, not only the radius.
- `stride1` is now `stride`, and `stride2` is now `dilation_patch`, which behaves like the dilation of a dilated convolution.
- The equivalent `max_displacement` is then `dilation_patch * (patch_size - 1) / 2`.
- `dilation` is a new parameter; it acts on the correlation kernel the same way as in a dilated convolution.
- The FlowNetC configuration corresponds to `kernel_size=1`, `patch_size=21`, `stride=1`, `padding=0`, `dilation=1`, `dilation_patch=2`.
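The parameter relationships above can be checked with a few lines of arithmetic. The sketch below is not part of the package: `max_displacement` and `correlation_output_shape` are hypothetical helpers, and the output-size formula is the standard convolution one, assumed here to match how the extension derives `oH` and `oW` from kernel size, padding, stride, and dilation.

```python
def max_displacement(patch_size, dilation_patch):
    """Equivalent NVIDIA-style max_displacement, per the formula above."""
    return dilation_patch * (patch_size - 1) // 2


def correlation_output_shape(batch, height, width, kernel_size=1, patch_size=1,
                             stride=1, padding=0, dilation=1):
    """Expected (B, PatchH, PatchW, oH, oW) shape.

    Assumption: oH and oW follow the usual convolution output-size formula.
    """
    dilated_kernel = dilation * (kernel_size - 1) + 1
    out_h = (height + 2 * padding - dilated_kernel) // stride + 1
    out_w = (width + 2 * padding - dilated_kernel) // stride + 1
    return (batch, patch_size, patch_size, out_h, out_w)


# FlowNetC-style parameters on a batch of 4 feature maps of size 48 x 64
flownetc_shape = correlation_output_shape(4, 48, 64, kernel_size=1, patch_size=21,
                                          stride=1, padding=0, dilation=1)
flownetc_max_disp = max_displacement(patch_size=21, dilation_patch=2)
print(flownetc_shape, flownetc_max_disp)
```

Note that the patch size only affects the second and third output dimensions, while the spatial dimensions depend on kernel size, padding, stride, and dilation.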
import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, spatial_correlation_sample

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

# You can use either the function or the module.
# Note that the module doesn't contain any parameter tensor.

# Function: out has shape (B, PatchH, PatchW, oH, oW)
out = spatial_correlation_sample(input1,
                                 input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

# Module: same arguments, passed to the constructor
correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)
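To make the output layout concrete, here is a slow, pure-Python sketch of what a correlation with `kernel_size=1`, `stride=1`, and `padding=0` computes for a single sample. It is an illustration under stated assumptions, not the extension's code: `naive_correlation` is a hypothetical name, inputs are nested lists indexed `[c][h][w]`, out-of-bounds positions in the second input are assumed to contribute zero, and no normalisation by the channel count is assumed.

```python
def naive_correlation(input1, input2, patch_size=1, dilation_patch=1):
    """Correlate two C x H x W nested lists; returns PatchH x PatchW x H x W."""
    channels = len(input1)
    height, width = len(input1[0]), len(input1[0][0])
    radius = (patch_size - 1) // 2
    out = [[[[0.0] * width for _ in range(height)]
            for _ in range(patch_size)] for _ in range(patch_size)]
    for ph in range(patch_size):
        for pw in range(patch_size):
            # Shift applied to input2 for this patch position
            dy = (ph - radius) * dilation_patch
            dx = (pw - radius) * dilation_patch
            for h in range(height):
                for w in range(width):
                    h2, w2 = h + dy, w + dx
                    # Assumption: shifted positions outside input2 count as zero
                    if 0 <= h2 < height and 0 <= w2 < width:
                        out[ph][pw][h][w] = sum(
                            input1[c][h][w] * input2[c][h2][w2]
                            for c in range(channels))
    return out


# One channel, 2 x 2 maps, 3 x 3 patch
a = [[[1.0, 2.0], [3.0, 4.0]]]
b = [[[5.0, 6.0], [7.0, 8.0]]]
corr = naive_correlation(a, b, patch_size=3)
print(corr[1][1])  # centre of the patch: zero shift, element-wise product
```

Each `(ph, pw)` slice of the result is the per-pixel dot product between the first input and a shifted copy of the second, which is why the patch dimensions come before the spatial ones in the output shape.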
- Default parameters are those of `benchmark.py`; FlowNetC parameters are the same as used in FlowNetC with a batch size of 4, described in this paper, implemented here and here.
- Benchmarks are launched with the environment variable `CUDA_LAUNCH_BLOCKING` set to `1`.
- `float32` is the dtype benchmarked.
- The FlowNetC rows were obtained with the following commands:

`CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float`

`CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256`

implementation | Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|---|
ours | default | 980 GTX | forward | 5.745 ms | 5.851 ms |
ours | default | 980 GTX | backward | 77.694 ms | 77.957 ms |
NVIDIA | default | 980 GTX | forward | 13.779 ms | 13.853 ms |
NVIDIA | default | 980 GTX | backward | 73.383 ms | 73.708 ms |
ours | FlowNetC | 980 GTX | forward | 26.102 ms | 26.179 ms |
ours | FlowNetC | 980 GTX | backward | 208.091 ms | 208.510 ms |
NVIDIA | FlowNetC | 980 GTX | forward | 35.363 ms | 35.550 ms |
NVIDIA | FlowNetC | 980 GTX | backward | 283.748 ms | 284.346 ms |
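
As a reading aid, the average times in the table above can be turned into relative speeds. This is plain arithmetic on the reported numbers; `avg_ms` and `speedup` are names introduced here for illustration.

```python
# Average times in ms copied from the CUDA benchmark table:
# (parameters, pass) -> (ours, NVIDIA)
avg_ms = {
    ("default", "forward"): (5.851, 13.853),
    ("default", "backward"): (77.957, 73.708),
    ("FlowNetC", "forward"): (26.179, 35.550),
    ("FlowNetC", "backward"): (208.510, 284.346),
}

# A ratio above 1 means this implementation is faster than NVIDIA's for that row
speedup = {key: nvidia / ours for key, (ours, nvidia) in avg_ms.items()}
for (params, direction), ratio in speedup.items():
    print(f"{params:9s} {direction:9s} NVIDIA/ours = {ratio:.2f}")
```

On these numbers, only the backward pass with default parameters is slower than NVIDIA's.
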

The backward pass with `kernel_size` > 1 needs some investigation; feel free to dive into the code to improve it!

Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|
default | E5-2630 v3 @ 2.40GHz | forward | 159.616 ms | 188.727 ms |
default | E5-2630 v3 @ 2.40GHz | backward | 282.641 ms | 294.194 ms |
FlowNetC | E5-2630 v3 @ 2.40GHz | forward | 2.138 s | 2.144 s |
FlowNetC | E5-2630 v3 @ 2.40GHz | backward | 7.006 s | 7.075 s |