Pytorch Correlation Extension

Custom implementation of Correlation Module

Pytorch Correlation module

This is a custom C++/CUDA implementation of the Correlation module, used e.g. in FlowNetC.

The implementation is based on this tutorial and on NVIDIA's CUDA code.

  • Build and Install C++ and CUDA extensions by executing python install,
  • Benchmark C++ vs. CUDA by running python {cpu, cuda},
  • Run gradient checks on the code by running python --backend {cpu, cuda}.


This module is expected to compile for Pytorch 1.6.


This module is available on pip:

pip install spatial-correlation-sampler

For a CPU-only version, you can install from source with

python install

Known Problems

This module needs compatible versions of gcc and CUDA to compile. Namely, CUDA 9.1 and below need gcc5, while CUDA 9.2 and 10.0 need gcc7. See this issue for more information.


The API differs from NVIDIA's module in a few ways:

  • The output is now a 5D tensor, whose two extra dimensions hold the vertical and horizontal shifts:
input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
  • Output sizes oH and oW no longer depend on patch size, only on kernel size and padding.
  • patch_size is now the size of the whole patch, not just its radius.
  • stride1 is now stride, and stride2 is now dilation_patch, which behaves like the dilation of a dilated convolution.
  • The equivalent max_displacement is then dilation_patch * (patch_size - 1) / 2.
  • dilation is a new parameter; it acts on the correlation kernel the same way dilation does in a dilated convolution.
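  • To get the right parameters for FlowNetC, you would use kernel_size=1, patch_size=21, stride=1, padding=0 and dilation_patch=2, as in the FlowNetC benchmark command.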
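The shape rules in the list above can be checked with a few lines of plain Python. This is a minimal sketch, not the module's own code: the helper names are hypothetical, kernels and patches are assumed square, and the output size is assumed to follow the usual dilated-convolution formula.

```python
def correlation_output_shape(batch, height, width, kernel_size=1, patch_size=1,
                             stride=1, padding=0, dilation=1):
    """Return (B, PatchH, PatchW, oH, oW), assuming square kernels and patches."""
    # Extent of the dilated correlation kernel, as for a dilated convolution.
    dilated_kernel = (kernel_size - 1) * dilation + 1
    out_h = (height + 2 * padding - dilated_kernel) // stride + 1
    out_w = (width + 2 * padding - dilated_kernel) // stride + 1
    return (batch, patch_size, patch_size, out_h, out_w)


def max_displacement(patch_size, dilation_patch):
    """Largest shift explored on each side of the patch centre."""
    return dilation_patch * (patch_size - 1) // 2


# FlowNetC benchmark settings: batch 4, 48 x 64 maps, patch 21, patch dilation 2.
shape = correlation_output_shape(4, 48, 64, kernel_size=1, patch_size=21,
                                 stride=1, padding=0, dilation=1)
print(shape)                    # (4, 21, 21, 48, 64)
print(max_displacement(21, 2))  # 20
```

With these settings the patch covers displacements from -20 to +20 in steps of 2, which is why the patch dimension is 21.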


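import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, spatial_correlation_sample

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

# You can use either the function or the module.
# Note that the module doesn't contain any parameter tensor.

# Functional form. The keyword values below are illustrative.
out = spatial_correlation_sample(input1,
                                 input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

# Module form, with the same illustrative settings.
correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)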
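To make the 5D output layout concrete, here is a naive pure-Python reference for a single sample with kernel_size=1, stride=1 and padding=0. It is a sketch of the definition under stated assumptions, not the extension's code: the function name is hypothetical, the shift is assumed to be applied to the second input, out-of-bounds positions are assumed to contribute zero, and no normalization by channel count is applied.

```python
def naive_correlation(feat1, feat2, patch_size=3, dilation_patch=1):
    """Correlate two C x H x W nested lists; returns PatchH x PatchW x H x W."""
    channels, height, width = len(feat1), len(feat1[0]), len(feat1[0][0])
    radius = (patch_size - 1) // 2
    out = [[[[0.0] * width for _ in range(height)]
            for _ in range(patch_size)] for _ in range(patch_size)]
    for ph in range(patch_size):
        for pw in range(patch_size):
            # Shift applied to the second feature map for this patch entry.
            dy = (ph - radius) * dilation_patch
            dx = (pw - radius) * dilation_patch
            for y in range(height):
                for x in range(width):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < height and 0 <= x2 < width:
                        out[ph][pw][y][x] = sum(
                            feat1[c][y][x] * feat2[c][y2][x2]
                            for c in range(channels))
    return out


# One channel, 2 x 2 maps; the centre entry (zero shift) is the pixelwise product.
a = [[[1.0, 2.0], [3.0, 4.0]]]
b = [[[5.0, 6.0], [7.0, 8.0]]]
corr = naive_correlation(a, b, patch_size=3)
print(corr[1][1])  # [[5.0, 12.0], [21.0, 32.0]]
print(corr[1][2])  # shift of +1 in x: [[6.0, 0.0], [24.0, 0.0]]
```

Each (ph, pw) slice is a full H x W map for one displacement, which is what the PatchH x PatchW dimensions of the output index.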


  • "default" parameters are the benchmark script's defaults; "FlowNetC" parameters are the ones used in FlowNetC with a batch size of 4, described in this paper, implemented here and here.
  • Feel free to file an issue to add entries for your hardware!

CUDA Benchmark

  • See here for a benchmark script working with NVIDIA's code, and Pytorch.
  • Benchmarks are launched with the environment variable CUDA_LAUNCH_BLOCKING set to 1.
  • Only float32 is benchmarked.
  • FlowNetC correlation parameters were benchmarked with the following commands:
CUDA_LAUNCH_BLOCKING=1 python --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float

CUDA_LAUNCH_BLOCKING=1 python --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256

implementation  Correlation parameters  device   pass      min time    avg time
ours            default                 980 GTX  forward   5.745 ms    5.851 ms
ours            default                 980 GTX  backward  77.694 ms   77.957 ms
NVIDIA          default                 980 GTX  forward   13.779 ms   13.853 ms
NVIDIA          default                 980 GTX  backward  73.383 ms   73.708 ms
ours            FlowNetC                980 GTX  forward   26.102 ms   26.179 ms
ours            FlowNetC                980 GTX  backward  208.091 ms  208.510 ms
NVIDIA          FlowNetC                980 GTX  forward   35.363 ms   35.550 ms
NVIDIA          FlowNetC                980 GTX  backward  283.748 ms  284.346 ms


  • The overhead of our implementation during the backward pass when kernel_size > 1 needs some investigation; feel free to dive into the code to improve it!
  • NVIDIA's backward pass is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed; see here.
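The min and avg columns above are the kind of numbers a simple timing loop produces. The following is a generic sketch using only the standard library, not the project's benchmark script; the helper name and the stand-in workload are hypothetical. On a GPU, kernel launches are asynchronous, so the clock is only meaningful if launches block (CUDA_LAUNCH_BLOCKING=1, as used here) or if torch.cuda.synchronize() is called before reading it.

```python
import time


def time_call(fn, runs=10, warmup=2):
    """Return (min, avg) wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), sum(samples) / len(samples)


# Stand-in workload; replace with a forward or backward pass.
fastest, average = time_call(lambda: sum(i * i for i in range(10_000)))
print(f"min {fastest:.3f} ms, avg {average:.3f} ms")
```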

CPU Benchmark

  • No other implementation is available on CPU.
  • It is obviously not recommended to run it on CPU if you have a GPU.

Correlation parameters  device                pass      min time    avg time
default                 E5-2630 v3 @ 2.40GHz  forward   159.616 ms  188.727 ms
default                 E5-2630 v3 @ 2.40GHz  backward  282.641 ms  294.194 ms
FlowNetC                E5-2630 v3 @ 2.40GHz  forward   2.138 s     2.144 s
FlowNetC                E5-2630 v3 @ 2.40GHz  backward  7.006 s     7.075 s