
Awesome machine learning for compilers and program optimisation


A curated list of awesome research papers, datasets, and tools for applying machine learning techniques to compilers and program optimisation.




Iterative Compilation and Compiler Option Tuning

Instruction-level Optimisation

Auto-tuning and Design Space Exploration

Parallelism Mapping and Task Scheduling

Domain-specific Optimisation

Languages and Compilation

Code Size Reduction

Cost and Performance Models

Learning Program Representation

Enabling ML in Compilers

Memory/Cache Modeling/Analysis

  • Learning Memory Access Patterns (10 pages) - Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan. ICML 2018.


Talks and Tutorials

Software


  • CompilerGym - Reinforcement learning environments for compiler optimizations.
  • CodeBERT - Pre-trained DNN models for programming languages (paper).
  • programl - LLVM and XLA IR program representation for machine learning (paper).
  • NeuroVectorizer - Using deep reinforcement learning (RL) to predict optimal vectorization compiler pragmas (paper).
  • TVM - Open Deep Learning Compiler Stack for CPUs, GPUs and specialized accelerators (paper; slides).
  • clgen - Benchmark generator using LSTMs (paper; slides).
  • COBAYN - Compiler autotuning using Bayesian networks (paper).
  • OpenTuner - Framework for building domain-specific multi-objective program autotuners (paper; slides).
  • ONNX-MLIR - Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure (paper).

Benchmarks and Datasets

  • The Alberta Workloads for the SPEC CPU® 2017 Benchmark Suite - Additional workloads for the SPEC CPU2017 Benchmark Suite.
  • Project CodeNet - Code samples written in 50+ programming languages, annotated with metadata such as code size, memory footprint, CPU run time, and status (acceptance/error types).
  • CodeXGLUE - A Machine Learning Benchmark Dataset for Code Understanding and Generation (paper).
  • AnghaBench - A suite with one million compilable C benchmarks (paper).
  • BHive - A Benchmark Suite and Measurement Framework for Validating x86-64 Basic Block Performance Models (paper).
  • cBench - 32 C benchmarks with datasets and driver scripts.
  • PolyBench - Dataset - Multiple datasets for PolyBench (paper).
  • PolyBench - 31 stencil and linear-algebra benchmarks with datasets and driver scripts.
  • PolyBench - Original - 30 stencil and linear-algebra benchmarks with datasets and driver scripts.
  • DeepDataFlow - 469k LLVM-IR files and 8.6B data-flow analysis labels for classification (paper).
  • devmap - 650 OpenCL benchmark features and CPU/GPU classification labels (paper; slides).



How to Contribute

See Contribution Guidelines. TL;DR: send one of the maintainers a pull request.
