Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 1 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

  • New NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 matrix operations: wmma.load, wmma.store, wmma.mma, and mma. D104847
  • AMDGPU learned to optimize VGPR live-ranges in simple divergent if-else statements. D102212
  • New AMDGPU target: gfx1035. D104804
  • AMDGPU gfx90a memory model has been updated. D105137
  • New 224-bit vector types for AMDGPU. These map to v7i32/v7f32, while existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. D104622
  • The ReplaceLDS AMDGPU pass is now disabled by default in preparation to later remove the code. D104962

MLIR

Discussions

Commits

  • New NVPTX ops for warp synchronous matrix operations for the GPU and NNVM dialects. D95330, D95331, D105175

OpenMP (Target Offloading)

Discussions

Commits

  • Multiple globalization improvements:
    • GPU memory globalization got simplified. The old implementation in the frontend that emulated standard CPU stack sharing is now replaced with a single allocation command, mimicking an alloca instruction for variables that must be shared between threads. D97680, D97818
    • OpenMP device routines will be internalized to facilitate interprocedural optimizations. D102824
    • The number of Attributor iterations is doubled from 64 to 128 on the GPU target. D104920
    • Remaining globalization optimizations will be reported as missed remarks instead of analysis remarks. D104735
  • clang-offload-bundler can now unbundle archives containing bundled object files into device-specific archives. D93525

External Compilers

LLPC

  • LLPC can now generate out-of-bounds checks for scratch accesses (stack variables). LLPC#1260
  • New utilities for iterating over enums using C++ iterators and ranges. LLPC#1273, LLPC#1299

Mesa