Issue #15
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 1, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
LLVM and Clang
Discussions
Commits
- New NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 matrix operations: wmma.load, wmma.store, wmma.mma, and mma. D104847
- AMDGPU learned to optimize VGPR live-ranges in simple divergent if-else statements. D102212
- New AMDGPU target: gfx1035. D104804
- AMDGPU gfx90a memory model has been updated. D105137
- New 224-bit vector types for AMDGPU. These map to v7i32/v7f32, while the existing 192-bit types now map to the newly added v3i64/v3f64/v6i32/v6f32. D104622
- The ReplaceLDS AMDGPU pass is now disabled by default in preparation for removing the code later. D104962
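For readers less familiar with the matrix operations mentioned in the NVPTX item above, here is a minimal CPU-side sketch of the arithmetic that a warp-level matrix multiply-accumulate performs: D = A * B + C on small tiles. This is not the NVPTX intrinsic or builtin interface added in D104847. The 16x16x16 tile shape, the `float` element type, and the names `TileA`, `TileB`, `TileC`, and `mma_reference` are assumptions made only for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tile shape chosen for illustration; the hardware instructions
// support several shapes and element types.
constexpr std::size_t M = 16, N = 16, K = 16;

using TileA = std::array<float, M * K>; // row-major M x K
using TileB = std::array<float, K * N>; // row-major K x N
using TileC = std::array<float, M * N>; // row-major M x N

// Scalar reference for the math of a matrix multiply-accumulate:
// D = A * B + C. On a GPU the tiles are distributed across the threads of a
// warp; here everything runs on one CPU thread.
TileC mma_reference(const TileA &a, const TileB &b, const TileC &c) {
  TileC d{};
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = c[i * N + j]; // start from the accumulator tile
      for (std::size_t k = 0; k < K; ++k)
        acc += a[i * K + k] * b[k * N + j];
      d[i * N + j] = acc;
    }
  }
  return d;
}
```

In this picture, a load operation would fill the input tiles from memory, the multiply-accumulate step would compute the result, and a store operation would write the result tile back.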
MLIR
Discussions
Commits
- New NVPTX ops for warp synchronous matrix operations for the GPU and NVVM dialects. D95330, D95331, D105175
OpenMP (Target Offloading)
Discussions
Commits
- Multiple globalization improvements:
  - GPU memory globalization got simplified. The old implementation in the frontend that emulated standard CPU stack sharing is now replaced with a single allocation command, mimicking an alloca instruction for variables that must be shared between threads. D97680, D97818
  - OpenMP device routines will be internalized to facilitate interprocedural optimizations. D102824
  - The number of Attributor iterations is doubled from 64 to 128 on the GPU target. D104920
  - Remaining globalization optimizations will be reported as missed remarks instead of analysis remarks. D104735
- clang-offload-bundler can now unbundle archives containing bundled object files into device-specific archives. D93525
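To make the globalization items above more concrete, the sketch below shows the kind of source pattern they concern: a local variable declared in a target region and then updated by every thread of a nested parallel region. On a GPU, an ordinary stack slot is private to one thread, so a variable like this has to be placed in memory that the other threads can reach. The function name `sum_on_device` and the variable `shared_total` are hypothetical, and the sketch makes no claim about the code Clang actually generates.

```cpp
// Sums 0..n-1 inside an OpenMP target region.
// Without OpenMP support the pragmas are ignored and the loop runs
// sequentially on the host, producing the same result.
int sum_on_device(int n) {
  int result = 0;
#pragma omp target map(tofrom : result)
  {
    // Declared by the thread that starts the target region, but shared with
    // all threads of the parallel region below. This is the kind of variable
    // that may need to be globalized on a GPU.
    int shared_total = 0;
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
#pragma omp atomic
      shared_total += i;
    }
    result = shared_total;
  }
  return result;
}
```

Calling `sum_on_device(100)` yields 4950 whether the region runs on a device, falls back to the host, or is compiled without OpenMP.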
External Compilers
LLPC
- LLPC can now generate out-of-bounds checks for scratch accesses (stack variables). LLPC#1260
- New utilities for iterating over enums using C++ iterators and ranges. LLPC#1273, LLPC#1299
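As a rough illustration of what iterating over an enum with C++ iterators and ranges can look like, here is a small self-contained sketch. It is not the LLPC implementation: the `ShaderStage` enum, the `Count` sentinel convention, and the `EnumRange` and `count_stages` names are all hypothetical, and the sketch assumes the enumerators are contiguous and start at zero.

```cpp
#include <type_traits>

// Hypothetical enum with a trailing Count sentinel marking the end.
enum class ShaderStage { Vertex, Fragment, Compute, Count };

// Minimal range over the enumerators [0, End) of a contiguous enum.
template <typename E, E End = E::Count>
class EnumRange {
  using U = std::underlying_type_t<E>;

public:
  class iterator {
    U value;

  public:
    explicit iterator(U v) : value(v) {}
    E operator*() const { return static_cast<E>(value); }
    iterator &operator++() {
      ++value;
      return *this;
    }
    bool operator!=(const iterator &other) const {
      return value != other.value;
    }
  };

  iterator begin() const { return iterator(0); }
  iterator end() const { return iterator(static_cast<U>(End)); }
};

// Counts the enumerators by walking the range with a range-based for loop.
inline int count_stages() {
  int n = 0;
  for (ShaderStage stage : EnumRange<ShaderStage>()) {
    (void)stage; // the value is unused here; only the count matters
    ++n;
  }
  return n;
}
```

The appeal of this style is that loops over enumerators no longer need manual casts between the enum and its underlying integer type at each use site.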