Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Cccl | 523 | | | | 2 months ago | | | 566 | other | C++ |
CUDA C++ Core Libraries | ||||||||||
How_to_optimize_in_gpu | 346 | | | | 8 months ago | | | 4 | apache-2.0 | Cuda |
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm; the performance of these kernels is at or near the theoretical limit. | ||||||||||
Parallelreductionsbenchmark | 58 | | | | 4 months ago | | | 1 | | C++ |
Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SYCL - all it takes to sum a lot of numbers fast! | ||||||||||
Ramsesgpu | 50 | | | | a year ago | | | | other | C++ |
Astrophysics MHD simulation code optimized for large GPU clusters | ||||||||||
Slate | 49 | | | | 2 months ago | | | 24 | bsd-3-clause | C++ |
SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP). | ||||||||||
Self | 42 | | | | 3 months ago | | | 8 | other | Fortran |
Spectral Element Library in Fortran | ||||||||||
Spfft | 36 | | | | 2 years ago | 5 | February 18, 2022 | 2 | bsd-3-clause | C++ |
Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support | ||||||||||
Ptxprofiler | 30 | | | | 3 months ago | | | | other | C++ |
A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis. | ||||||||||
Neon | 28 | | | | 5 months ago | | | 5 | other | C++ |
Multi-GPU Framework for Voxel Grid Computations | ||||||||||
Care | 26 | | | | 5 months ago | | | 31 | bsd-3-clause | C++ |
CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code. | ||||||||||