| Project Name | Stars | Most Recent Commit | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|
| How_to_optimize_in_gpu | 346 | 9 months ago | 4 | apache-2.0 | Cuda | A series of GPU optimization topics introducing CUDA kernel optimization in detail, covering several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit. |
| Vuh | 329 | 6 months ago | 19 | mit | C++ | Vulkan compute for people. |
| Radar Electrooptical Simulation | 50 | 3 months ago | | mit | C++ | (REOS) Radar and Electro-Optical Simulation framework written in C++. |
| Rbcuda | 50 | 5 years ago | 4 | bsd-3-clause | C | CUDA bindings for Ruby. |
| Radar_electrooptical_simulation | 44 | 3 months ago | | lgpl-3.0 | Fortran | (REOS) Radar and Electro-Optical Simulation framework written in Fortran. |
| Parsec | 39 | 15 days ago | 113 | other | C | PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. It assigns computation threads to cores and GPU accelerators, overlaps communication with computation, and uses a dynamic, fully distributed scheduler driven by architectural features such as NUMA nodes and algorithmic features such as data reuse. |
| Tvm Lesson | 19 | 3 years ago | 2 | | Python | A hands-on tutorial on the core principles of TVM. |
| Gpu Cuda Self Organising Maps | 7 | a year ago | | mit | C++ | 🧠 💡 📈 A High Performance Computing project: a GPU-based implementation of the Self-Organising Map (SOM) algorithm for artificial neural networks, built with CUDA, C++, C, and CMake in JetBrains CLion. The goal was to test several GPU-based implementations of the algorithm against a given CPU-based implementation and to evaluate and compare overall performance (speedup, efficiency, and cost). |
| Gpu Normal Computation | 7 | 6 years ago | | lgpl-3.0 | C++ | Normal computation for large point clouds on the GPU using OpenCL. |
| Custen | 7 | 4 years ago | 1 | apache-2.0 | Cuda | CUDA finite-difference library. |