Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from August 20 to September 9, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

  • Clang now reports CUDA 11.4 as fully supported. Not all features offered by NVCC are actually supported, but Clang is expected to handle CUDA headers and produce binaries for all GPUs supported by NVCC. The default GPU architecture is now sm_35. D108239, D108248, D108235
  • Various AMDGPU MIR peephole optimizations for comparison instructions.
  • Various AMDGPU attribute handling and propagation improvements.

MLIR

Discussions

  • Xuanhuo asked about the meaning of gpu.all_reduce. Alex Zinenko explained how this relates to collective operations and thread index linearization.
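
To make the two notions in that discussion concrete, here is a minimal host-side C++ sketch. It is not MLIR and not how the gpu dialect is lowered. The `Dim3` type, both function names, and the "x varies fastest" ordering are assumptions chosen for illustration. The first function flattens a 3-D thread id into a single index, and the second models the collective semantics of an all-reduce with addition: every participant contributes one value and every participant receives the same combined result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-D extent (or 3-D id) of a workgroup / thread block.
struct Dim3 {
  std::size_t x, y, z;
};

// Flatten a 3-D thread id into one linear index.
// Assumed convention: x varies fastest, then y, then z.
std::size_t linearize(Dim3 id, Dim3 dim) {
  assert(id.x < dim.x && id.y < dim.y && id.z < dim.z);
  return id.x + dim.x * (id.y + dim.y * id.z);
}

// Host-side model of an all-reduce with addition: perThread[i] is the value
// contributed by the thread with linear index i. Every thread gets the sum.
std::vector<int> allReduceAdd(const std::vector<int>& perThread) {
  int total = 0;
  for (int v : perThread) total += v;
  return std::vector<int>(perThread.size(), total);
}
```

On a real GPU the reduction is carried out cooperatively by the threads themselves; the loop above only reproduces the observable result.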

Commits

  • A GPU memset op is introduced for CUDA and ROCm.
  • Weiwei Li started to improve how image operands are represented in the SPIR-V dialect for graphics use cases.

OpenMP (Target Offloading)

Discussions

Commits

  • The SPMDzation optimization (introduced in D102307) has been extended with guarding, which enlarges the set of kernels amenable to the optimization; see D106892. Guarding now also batches multiple side-effect instructions into a single guarded region when they share a block and are separated only by instructions without side effects; see D109070. Finally, generic regions without any parallelism are no longer transformed by SPMDzation, which avoids unnecessary guarding; see D109438.
  • OpenMP assumes can now be used to provide information to the OpenMP-Opt pass. Which information is required to perform an optimization is communicated via optimization remarks.
  • OpenMP declare variant now works with functions that use reference types. This fixes a problem reported for certain C++ math functions.
  • Initial support for AMDGPU gfx10 offloading. D108708
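
As an illustration of the kind of kernel the SPMDzation guarding item is about, here is a small C++ sketch. The function name and data layout are hypothetical. The target region mixes sequential code that has side effects (two adjacent stores) with a parallel loop. In generic mode only the main thread executes the sequential part. If the region is converted to SPMD mode, all threads reach it, so such stores have to be guarded so that they still happen exactly once. Whether the optimization actually triggers, and whether the two stores end up in one guarded region, depends on the compiler version and flags, so treat the comments as a description of intent rather than of guaranteed behavior.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: writes a two-element header followed by the scaled
// input. Without OpenMP support the pragmas are ignored and the code runs
// sequentially on the host with the same result.
std::vector<int> scaleWithHeader(const std::vector<int>& in, int factor) {
  const int n = static_cast<int>(in.size());
  std::vector<int> out(in.size() + 2);
  const int* src = in.data();
  int* dst = out.data();

  #pragma omp target map(to: src[0:n]) map(from: dst[0:n + 2])
  {
    // Sequential part with side effects: two adjacent stores. These are the
    // kind of instructions that would need guarding in SPMD mode.
    dst[0] = n;
    dst[1] = factor;

    // Parallel part: the actual work shared among the threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
      dst[i + 2] = src[i] * factor;
  }
  return out;
}
```

When building with an offloading-enabled Clang, optimization remarks from the OpenMP-Opt pass (for example via `-Rpass=openmp-opt`) are the place to check whether a given kernel was transformed.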

External Compilers

LLPC

Mesa