How_to_optimize_in_gpu

This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
Alternatives To How_to_optimize_in_gpu
| Project Name | Stars | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|
| Cccl | 523 | 3 months ago | | | 566 | other | C++ | CUDA C++ Core Libraries |
| How_to_optimize_in_gpu | 346 | 9 months ago | | | 4 | apache-2.0 | Cuda | This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit. |
| Parallelreductionsbenchmark | 58 | 5 months ago | | | 1 | | C++ | Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SYCL - all it takes to sum a lot of numbers fast! |
| Ramsesgpu | 50 | a year ago | | | | other | C++ | Astrophysics MHD simulation code optimized for large clusters of GPUs |
| Slate | 49 | 3 months ago | | | 24 | bsd-3-clause | C++ | SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP). |
| Self | 42 | 4 months ago | | | 8 | other | Fortran | Spectral Element Library in Fortran |
| Spfft | 36 | 2 years ago | 5 | February 18, 2022 | 2 | bsd-3-clause | C++ | Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support |
| Ptxprofiler | 30 | 4 months ago | | | | other | C++ | A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. |
| Neon | 28 | 5 months ago | | | 5 | other | C++ | Multi-GPU Framework for Voxel Grid Computations |
| Care | 26 | 6 months ago | | | 31 | bsd-3-clause | C++ | CHAI and RAJA provide an excellent base on which to build portable codes. CARE expands that functionality, adding new features such as loop fusion capability and a portable interface for many numerical algorithms. It provides all the basics for anyone wanting to write portable code. |