Issue #8
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from March 5 to March 18, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
LLVM and Clang
Discussions
- Johannes Doerfert asks about NVPTX support for LLVM math functions (e.g., `llvm.sin`). NVPTX does not provide `libc` and `libm`, though some math functions are implemented through the `libdevice` bitcode module. A solution would be to teach Clang or the LLVM middleend how to match `__nv_*` functions to the LLVM ones. Johannes implemented a prototype that adds such function mapping support through LLVM IR attributes.
- Jay Foad expressed interest in using llvm-mca for AMDGPU and asked about the difference between `MicroOpBufferSize=0/1`. Based on the response from Andrew Trick, Jay implemented a patch that adds llvm-mca support for in-order CPUs.
- Konrad Trifunovic summarized the discussion on upstreaming a SPIR-V backend and shared a rough plan with short and long-term objectives.
- Anastasia Stulova summarized the discussion on a new file extension for C++ OpenCL sources. The default would be `.clcpp` now, matching the compiler option `-cl-std=clc++`. The Phabricator Clang patch is awaiting any last feedback before committing.
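The function mapping idea from the first discussion item can be pictured as a simple name translation. The following is a minimal sketch only, not the prototype's actual design: the helper `libdevice_name`, the table `LIBDEVICE_SUFFIX`, and the set `SUPPORTED_OPS` are hypothetical names introduced for illustration. It assumes the usual libdevice naming convention, where the double-precision function is `__nv_<op>` and the single-precision one is `__nv_<op>f`.

```python
from typing import Optional

# Hypothetical table: LLVM type suffix -> libdevice name suffix.
# Assumes the libdevice convention __nv_<op> (double) / __nv_<op>f (float).
LIBDEVICE_SUFFIX = {"f32": "f", "f64": ""}

# A small, illustrative subset of math operations; not an exhaustive list.
SUPPORTED_OPS = {"sin", "cos", "exp", "log", "sqrt", "pow"}


def libdevice_name(intrinsic: str) -> Optional[str]:
    """Map e.g. 'llvm.sin.f32' to '__nv_sinf'; return None if unrecognized."""
    parts = intrinsic.split(".")
    if len(parts) != 3 or parts[0] != "llvm":
        return None
    _, op, ty = parts
    if op not in SUPPORTED_OPS or ty not in LIBDEVICE_SUFFIX:
        return None
    return "__nv_" + op + LIBDEVICE_SUFFIX[ty]


if __name__ == "__main__":
    for name in ("llvm.sin.f32", "llvm.sin.f64", "llvm.memcpy.p0"):
        print(name, "->", libdevice_name(name))
```

A real implementation would operate on LLVM IR rather than on strings, and, as described above, the prototype expresses the mapping through LLVM IR attributes rather than a hard-coded table.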
Commits
- AMDGPU switched from using individual cache operands (GLC, SLC, DLC) to a single `cache_policy` bitmask operand. This reduces the amount of Machine IR code.
- Fixes for the GFX90a AMDGPU target:
- Split some of the AMDGPU instructions predicated on the `dot2-insts` target feature into a new `dot7-insts`, in preparation for subtargets that have some but not all of these instructions.
- SYCL driver options were reworked. A new language option (`SYCLIsHost`) is used to identify host executions. `-fsycl` and `-fno-sycl` became driver-only options rejected when passed to `-cc1`.
- (In-review) HIP diagnostic for aggregate arguments containing half-precision types. GCC and Clang do not have a consistent ABI for half-precision types, so passing these between the two compilers may result in Undefined Behavior.
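To illustrate why a single `cache_policy` bitmask is more compact than separate GLC, SLC, and DLC operands, here is a minimal sketch of packing three boolean flags into one integer. The bit values (GLC = 1, SLC = 2, DLC = 4) and the helper names are assumptions made for this example and should not be read as the backend's definitive encoding.

```python
# Assumed bit assignments for illustration only.
GLC = 1 << 0
SLC = 1 << 1
DLC = 1 << 2


def pack_cache_policy(glc: bool, slc: bool, dlc: bool) -> int:
    """Combine three separate flag operands into a single bitmask."""
    bits = 0
    if glc:
        bits |= GLC
    if slc:
        bits |= SLC
    if dlc:
        bits |= DLC
    return bits


def unpack_cache_policy(bits: int) -> tuple:
    """Recover the (glc, slc, dlc) flags from the bitmask."""
    return (bool(bits & GLC), bool(bits & SLC), bool(bits & DLC))


if __name__ == "__main__":
    mask = pack_cache_policy(glc=True, slc=False, dlc=True)
    print(mask, unpack_cache_policy(mask))
```

With this layout an instruction carries one immediate operand instead of three, and a new cache bit can be added without changing the operand count.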
MLIR
Discussions
Commits
- CUDA/ROCDL kernel to blob conversion is now in a pass registered to `mlir-opt`. `mlir-cuda-runner` and `mlir-rocm-runner` are gone; integration tests now use `mlir-opt` and `mlir-cpu-runner`.
- The SPIR-V dialect sees more ops for Vulkan graphics: `spv.Image`.
- A few more patches landed into the SPIR-V dialect to improve op naming consistency.
OpenMP (Target Offloading)
Discussions
- The redesign of the memory globalization for GPUs is making progress. The original patch has been refined and has entered the testing stage. On its own it will regress performance significantly, but it opens the possibility of optimizing the code further. The first optimization has been approved.
- A redesign of the device runtime has been started, based on earlier, smaller patches ([1], [2]). The overall effort is not finished, but it has already uncovered various bugs in our OpenMP handling: [3], [4], [5].
Commits
- Initial support for the OpenMP 5.1 `interop` directive has been committed. This adds basic parsing/sema/serialization support for `#pragma omp interop`.
- Only build one bitcode library for each SM on NVPTX targets.
- The AMDGPU host plugin is now built by default.
- The AMDGPU device runtime was briefly built by default, but there are issues if the AMDGPU target is not available, so the patch has been reverted until those are cleared.
- As a middle step in the device runtime redesign, we removed 20% of the memory allocated to support dynamic scheduling in favor of dynamic allocations. You will only notice the memory savings if you do not run dynamic schedules on the device (which you probably should not).
External Compilers
LLPC
- LLPC switched to using the upstream LLVM implementation of demote to helper. This is used by the discard-to-demote transformation that allows shaders with the `OpKill` SPIR-V instructions to behave like a helper invocation (see `OpDemoteToHelperInvocationEXT`) instead of terminating the thread.
Mesa
- Initial support for GFX90a AMDGPU landed.