DPLASMA

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.
Alternatives to DPLASMA
Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language

How_to_optimize_in_gpu
Stars: 346 | Most recent commit: 9 months ago | Open issues: 4 | License: apache-2.0 | Language: Cuda
This is a series of GPU optimization topics that introduces in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

Vuh
Stars: 329 | Most recent commit: 6 months ago | Open issues: 19 | License: mit | Language: C++
Vulkan compute for people

Radar Electrooptical Simulation
Stars: 50 | Most recent commit: 3 months ago | License: mit | Language: C++
(REOS) Radar and Electro-Optical Simulation Framework written in C++.

Rbcuda
Stars: 50 | Most recent commit: 5 years ago | Open issues: 4 | License: bsd-3-clause | Language: C
CUDA bindings for Ruby

Radar_electrooptical_simulation
Stars: 44 | Most recent commit: 3 months ago | License: lgpl-3.0 | Language: Fortran
(REOS) Radar and Electro-Optical Simulation Framework written in Fortran.

Parsec
Stars: 39 | Most recent commit: 17 days ago | Open issues: 113 | License: other | Language: C
PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.

Tvm Lesson
Stars: 19 | Most recent commit: 3 years ago | Open issues: 2 | Language: Python
A hands-on tutorial for learning the core principles of TVM

Gpu Cuda Self Organising Maps
Stars: 7 | Most recent commit: a year ago | License: mit | Language: C++
🧠 💡 📈 A High Performance Computing project built using CUDA (Compute Unified Device Architecture), C++, C, CMake and JetBrains CLion. It is a GPU-based implementation of the Self-Organising Maps (S.O.M.) algorithm for Artificial Neural Networks (A.N.N.) that uses the parallel optimisations and tunings CUDA offers. The final goal of the project was to test several GPU-based implementations of the algorithm against a given CPU-based implementation of the same algorithm, and to evaluate and compare the overall performance (speedup, efficiency and cost).

Gpu Normal Computation
Stars: 7 | Most recent commit: 6 years ago | License: lgpl-3.0 | Language: C++
Normal computation for large point clouds on the GPU using OpenCL

Custen
Stars: 7 | Most recent commit: 4 years ago | Open issues: 1 | License: apache-2.0 | Language: Cuda
CUDA Finite Difference Library
Categories: C, High Performance Computing, GPU Acceleration, GPU Computing, Dataflow Programming