Issue #8

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from March 5 to March 18, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

LLVM and Clang

Discussions

Johannes Doerfert asked about NVPTX support for LLVM math intrinsics (e.g., llvm.sin). NVPTX does not provide libc or libm, though some math functions are implemented in the libdevice bitcode module. One possible solution is to teach Clang or the LLVM middle end how to match the __nv_* functions to the corresponding LLVM intrinsics. Johannes implemented a prototype that adds such function mapping support through LLVM IR attributes.
Jay Foad expressed interest in using llvm-mca for AMDGPU and asked about the difference between MicroOpBufferSize=0/1. Based on the response from Andrew Trick, Jay implemented a patch that adds llvm-mca support for in-order CPUs.
Konrad Trifunovic summarized the discussion on upstreaming a SPIR-V backend and shared a rough plan with short and long-term objectives.
Anastasia Stulova summarized the discussion on a new file extension for C++ OpenCL sources. The default would be .clcpp now, matching the compiler option -cl-std=clc++. The Phabricator Clang patch is awaiting any last feedback before committing.
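To make the NVPTX math function mapping discussed above more concrete, here is a minimal C++ sketch of a name lookup from LLVM intrinsics to libdevice functions. It is not Johannes's prototype, which expresses the mapping through LLVM IR attributes. The table name kLibdeviceMap and the helper libdeviceNameFor are hypothetical, and the table covers only a small illustrative subset of functions.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Illustrative subset only (an assumption of this sketch): libdevice names
// follow the libm naming scheme with an __nv_ prefix.
constexpr std::array<std::pair<std::string_view, std::string_view>, 4>
    kLibdeviceMap{{
        {"llvm.sin.f32", "__nv_sinf"},
        {"llvm.sin.f64", "__nv_sin"},
        {"llvm.cos.f32", "__nv_cosf"},
        {"llvm.cos.f64", "__nv_cos"},
    }};

// Hypothetical helper: returns the libdevice function name for an LLVM math
// intrinsic, or std::nullopt if the sketch has no mapping for it.
constexpr std::optional<std::string_view>
libdeviceNameFor(std::string_view intrinsic) {
  for (const auto &entry : kLibdeviceMap)
    if (entry.first == intrinsic)
      return entry.second;
  return std::nullopt;
}

// Compile-time checks of the lookup.
static_assert(*libdeviceNameFor("llvm.sin.f32") == "__nv_sinf",
              "float sine maps to the f-suffixed libdevice function");
static_assert(!libdeviceNameFor("llvm.unknown").has_value(),
              "unmapped intrinsics yield no result");
```

An intrinsic without an entry returns no result, mirroring the case where libdevice has no implementation to call.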

Commits

AMDGPU switched from using individual cache operands (GLC, SLC, DLC) to a single cache_policy bitmask operand. This reduces the amount of Machine IR code.
Fixes for the GFX90a AMDGPU target:
- disable lds_direct,
- SCC support on buffer atomics.
Some of the AMDGPU instructions predicated on the dot2-insts target feature were split into a new dot7-insts feature, in preparation for subtargets that have some but not all of these instructions.
SYCL driver options were reworked. A new language option (SYCLIsHost) is used to identify host executions. -fsycl and -fno-sycl became driver-only options rejected when passed to -cc1.
(In-review) HIP diagnostic for aggregate arguments containing half-precision types. GCC and Clang do not have a consistent ABI for half-precision types, so passing these between code built by the two compilers may result in undefined behavior.
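The move from separate GLC, SLC, and DLC operands to a single cache_policy bitmask, mentioned in the first commit above, can be pictured with a small C++ sketch. The enum CachePolicy, the bit positions, and the helpers makeCachePolicy and hasBit are assumptions made for illustration; they are not copied from the AMDGPU backend.

```cpp
#include <cstdint>

// Bit positions are illustrative assumptions, not taken from the backend.
enum CachePolicy : std::uint32_t {
  GLC = 1u << 0,
  SLC = 1u << 1,
  DLC = 1u << 2,
};

// Hypothetical helper: packs three former boolean operands into one mask.
constexpr std::uint32_t makeCachePolicy(bool glc, bool slc, bool dlc) {
  return (glc ? GLC : 0u) | (slc ? SLC : 0u) | (dlc ? DLC : 0u);
}

// Hypothetical helper: tests whether a given cache bit is set in the mask.
constexpr bool hasBit(std::uint32_t policy, CachePolicy bit) {
  return (policy & bit) != 0;
}

// Compile-time checks: one operand now carries all three flags.
static_assert(makeCachePolicy(true, false, true) == (GLC | DLC),
              "GLC and DLC are packed into one value");
static_assert(!hasBit(makeCachePolicy(true, false, true), SLC),
              "SLC stays clear when not requested");
```

With this layout an instruction needs one immediate operand instead of three, which is the kind of reduction in Machine IR the commit describes.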

MLIR

Discussions

Commits

CUDA/ROCDL kernel to blob conversion is now in a pass registered to mlir-opt.
mlir-cuda-runner and mlir-rocm-runner are gone; integration tests now use mlir-opt and mlir-cpu-runner.
The SPIR-V dialect sees more ops for Vulkan graphics: spv.Image.
A few more patches landed into the SPIR-V dialect to improve op naming consistency.

OpenMP (Target Offloading)

Discussions

The redesign of memory globalization for GPUs is making progress. The original patch has been refined and has entered the testing stage. On its own it will regress performance significantly, but it opens up the possibility of optimizing the code further. The first optimization has been approved.
A redesign of the device runtime has been started, based on earlier, smaller patches ([1], [2]). The overall process is not finished, but various bugs in our OpenMP handling have already been found: [3], [4], [5].

Commits

Initial support for the OpenMP 5.1 interop directive has been committed. This adds basic parsing/sema/serialization support for #pragma omp interop.
Only build one bitcode library for each SM on NVPTX targets.
The AMDGPU host plugin is now built by default.
The AMDGPU device runtime was briefly built by default, but there are issues when the AMDGPU target is not available, so the patch has been reverted until those are resolved.
As an intermediate step in the device runtime redesign, we removed 20% of the memory allocated to support dynamic scheduling in favor of dynamic allocations. If you do not run dynamic schedules on the device (which you probably should not), the only change you will notice is the memory savings.
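For readers unsure what "dynamic schedules on the device" refers to in the last item, the following C++ sketch contrasts a static and a dynamic loop schedule in an offloaded region. The function names are hypothetical. If the code is built without OpenMP offloading, the pragmas are ignored or the region falls back to the host, and both functions still return the same value.

```cpp
#include <cassert>

// Hypothetical kernels, for illustration only.

// Static schedule: iterations are divided among threads up front.
long sumOfSquaresStatic(int n) {
  assert(n >= 0 && "iteration count must be non-negative");
  long total = 0;
#pragma omp target teams distribute parallel for schedule(static) \
    reduction(+ : total) map(tofrom : total)
  for (int i = 0; i < n; ++i)
    total += static_cast<long>(i) * i;
  return total;
}

// Dynamic schedule: threads request chunks of iterations at run time.
long sumOfSquaresDynamic(int n) {
  assert(n >= 0 && "iteration count must be non-negative");
  long total = 0;
#pragma omp target teams distribute parallel for schedule(dynamic) \
    reduction(+ : total) map(tofrom : total)
  for (int i = 0; i < n; ++i)
    total += static_cast<long>(i) * i;
  return total;
}
```

Only the second variant uses a dynamic schedule, which is the case the runtime's scheduling memory exists to support.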

External Compilers

LLPC

LLPC switched to using the upstream LLVM implementation of demote to helper. This is used by the discard-to-demote transformation, which lets invocations that execute the SPIR-V OpKill instruction continue as helper invocations (see OpDemoteToHelperInvocationEXT) instead of terminating the thread.

Mesa

Initial support for GFX90a AMDGPU landed.
