Issue #23

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from October 29 to November 11 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

2021 LLVM Developers’ Meeting is on November 17-19. The agenda features a number of GPU-related sessions:
- LTO and JIT Support in LLVM OpenMP Target Offloading (Thursday)
- OpenCL support in clang status: OpenCL C 3.0, improvements and future directions (Thursday)
- Late Divergence Analysis (Thursday)
- SPIR-V support in LLVM and Clang (Friday)
- Optimizing OpenMP GPU Execution in LLVM (Friday)

LLVM and Clang

Discussions

Johannes Doerfert is gathering feedback on starting a ‘GPU working group’ which would focus on ‘development of GPU-specific optimizations, organization of driver and backend code, dealing with issues like convergent functions,’ etc. The thread received a number of positive replies with some ideas for additional topics of interest.

Commits

Two new architectures, spirv32 and spirv64, were ported from the LLVM SPIR-V backend Khronos repository. This initially reuses the target information and code generation from the existing SPIR support. D109144
The latest supported CUDA version was bumped to 11.5. D113249
Debug locations will not be added to code inside AMDGPU function prologue as it is entirely compiler-generated. D113100
The AMDGPU_Gfx calling convention now defines s[4:29] SGPR registers as callee-saved. D111637
CUDA landed support for assumptions predicates for pointer address spaces, e.g., __builtin_assume(__isGlobal(p)). These predicates are portable across Clang and NVCC. D112041
Special NVPTX registers were marked as reserved to avoid machine verifier errors. D113008

MLIR

Discussions

Commits

SPIR-V to LLVM conversion now supports shufflevector with static indices. D112161
arith.bitcast, std.br, and std.cond_br to SPIR-V conversions were added. D112819

OpenMP (Target Offloading)

Discussions

Commits

External Compilers

LLPC

Middle-end now supports the New Pass Manager but still default to the Legacy Pass Manager. LLPC#1459

oneAPI DPC++

CUDA/HIP support

Added initial support for the Tensorcore matrix extension.
CUDA and HIP plugins were renamed to align with the SYCL 2020 specification.
Fixed preprocessor-only compilation mode for the NVPTX target.

SYCL 2020 support

Implemented property traits.
Fixed interoperability kernel API returning invalid kernel bundle.
Improved performance of sycl::queue::submit() for commands without dependencies.

Non-standard extensions

Fixed a bug in SVM gather/scatter emulation on host for ESIMD extension.
Implemented no_offset compile-time accessor property.
Enhanced matrix extension specification with JIT support for AMX and DPAS ISA.
The SPV_INTEL_hw_thread_queries extension support was enabled by default.

Misc

Release notes for September were published.