Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from October 29 to November 11 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

  • Johannes Doerfert is gathering feedback on starting a ‘GPU working group’ which would focus on ‘development of GPU-specific optimizations, organization of driver and backend code, dealing with issues like convergent functions,’ etc. The thread received a number of positive replies with some ideas for additional topics of interest.

Commits

  • Two new architectures, spirv32 and spirv64, were ported from the LLVM SPIR-V backend Khronos repository. This initially reuses the target information and code generation from the existing SPIR support. D109144
  • The latest supported CUDA version was bumped to 11.5. D113249
  • Debug locations will not be added to code inside AMDGPU function prologue as it is entirely compiler-generated. D113100
  • The AMDGPU_Gfx calling convention now defines s[4:29] SGPR registers as callee-saved. D111637
  • CUDA landed support for assumptions predicates for pointer address spaces, e.g., __builtin_assume(__isGlobal(p)). These predicates are portable across Clang and NVCC. D112041
  • Special NVPTX registers were marked as reserved to avoid machine verifier errors. D113008

MLIR

Discussions

Commits

  • SPIR-V to LLVM conversion now supports shufflevector with static indices. D112161
  • arith.bitcast, std.br, and std.cond_br to SPIR-V conversions were added. D112819

OpenMP (Target Offloading)

Discussions

Commits

External Compilers

LLPC

  • Middle-end now supports the New Pass Manager but still default to the Legacy Pass Manager. LLPC#1459

oneAPI DPC++

CUDA/HIP support

  • Added initial support for the Tensorcore matrix extension.
  • CUDA and HIP plugins were renamed to align with the SYCL 2020 specification.
  • Fixed preprocessor-only compilation mode for the NVPTX target.

SYCL 2020 support

  • Implemented property traits.
  • Fixed interoperability kernel API returning invalid kernel bundle.
  • Improved performance of sycl::queue::submit() for commands without dependencies.

Non-standard extensions

  • Fixed a bug in SVM gather/scatter emulation on host for ESIMD extension.
  • Implemented no_offset compile-time accessor property.
  • Enhanced matrix extension specification with JIT support for AMX and DPAS ISA.
  • The SPV_INTEL_hw_thread_queries extension support was enabled by default.

Misc