Issue #15

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 1, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

New NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 matrix operations: wmma.load, wmma.store, wmma.mma, and mma. D104847
AMDGPU learned to optimize VGPR live-ranges in simple divergent if-else statements. D102212
New AMDGPU target: gfx1035. D104804
AMDGPU gfx90a memory model has been updated. D105137
New 224-bit vector types for AMDGPU. These map to v7i32/v7f32, while the existing 192-bit types now map to the newly added v3i64/v3f64/v6i32/v6f32. D104622
The ReplaceLDS AMDGPU pass is now disabled by default in preparation for its later removal. D104962

MLIR

Discussions

Commits

New NVPTX ops for warp-synchronous matrix operations for the GPU and NVVM dialects. D95330, D95331, D105175

OpenMP (Target Offloading)

Discussions

Commits

Multiple globalization improvements:
- GPU memory globalization got simplified. The old implementation in the frontend that emulated standard CPU stack sharing is now replaced with a single allocation command, mimicking an alloca instruction for variables that must be shared between threads. D97680, D97818
- OpenMP device routines will be internalized to facilitate interprocedural optimizations. D102824
- The number of Attributor iterations is doubled from 64 to 128 on the GPU target. D104920
- Remaining globalization optimizations will be reported as missed remarks instead of analysis remarks. D104735
clang-offload-bundler can now unbundle archives containing bundled object files into device-specific archives. D93525
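
To make the globalization items above more concrete, here is a minimal sketch of the kind of code they apply to. The function name `sumScaled` and its parameters are hypothetical and not taken from any of the linked revisions. The local variable `scale` is initialized by the thread that starts the target region and is then read by every thread of the nested parallel region, so on a GPU it cannot simply stay in one thread's private stack.

```cpp
#include <cassert>

// Hypothetical example, not code from the revisions above.
// `scale` is a local of the target region that is shared with the threads
// of the nested parallel region, which is the situation globalization handles.
int sumScaled(const int *data, int n, int factor) {
  assert(n >= 0);
  int total = 0;
#pragma omp target map(to : data[0:n]) map(tofrom : total)
  {
    int scale = 2 * factor; // written once, before the parallel region starts
#pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i)
      total += scale * data[i]; // every thread reads the shared `scale`
  }
  return total;
}
```

When compiled without OpenMP support the pragmas are ignored and the function runs serially on the host with the same result, for example `sumScaled(data, 4, 1)` over `{1, 2, 3, 4}` gives 20. Whether a given compiler version keeps `scale` in shared storage or optimizes the sharing away depends on the optimizations described above.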

External Compilers

LLPC

LLPC can now generate out-of-bounds checks for scratch accesses (stack variables). LLPC#1260
New utilities for iterating over enums using C++ iterators and ranges. LLPC#1273, LLPC#1299
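
As a rough illustration of what iterating over an enum with C++ iterators and ranges can look like, here is a small self-contained sketch. All names in it (`ShaderStage`, `EnumIterator`, `EnumRange`, `stageMask`) are hypothetical and are not LLPC's actual API. It assumes the enumerators are contiguous and that a trailing `Count` enumerator marks the end.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical enum; `Count` is a sentinel one past the last real value.
enum class ShaderStage : unsigned { Vertex, Geometry, Fragment, Compute, Count };

// Minimal iterator that steps through contiguous enumerator values.
template <typename E> class EnumIterator {
  using Underlying = std::underlying_type_t<E>;
  Underlying value;

public:
  explicit EnumIterator(E e) : value(static_cast<Underlying>(e)) {}
  E operator*() const { return static_cast<E>(value); }
  EnumIterator &operator++() {
    ++value;
    return *this;
  }
  bool operator!=(const EnumIterator &other) const {
    return value != other.value;
  }
};

// Half-open range [first, last) usable in a range-based for loop.
template <typename E> class EnumRange {
  E first;
  E last;

public:
  EnumRange(E firstValue, E lastValue) : first(firstValue), last(lastValue) {
    assert(firstValue <= lastValue);
  }
  EnumIterator<E> begin() const { return EnumIterator<E>(first); }
  EnumIterator<E> end() const { return EnumIterator<E>(last); }
};

// Builds a bitmask with one bit set per stage by walking the range.
inline unsigned stageMask() {
  unsigned mask = 0;
  for (ShaderStage stage :
       EnumRange<ShaderStage>(ShaderStage::Vertex, ShaderStage::Count))
    mask |= 1u << static_cast<unsigned>(stage);
  return mask;
}
```

The range-based for loop replaces a manual loop over the underlying integers with casts at each step, which keeps the casts in one place inside the iterator.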
