Issue #34

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 13, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Artem Belevich started a thread on conditional dependencies on proprietary tools in LIT tests. The discussion was kicked off by the recent addition of NVPTX tests that utilized the proprietary assembler ptxas from Nvidia.
MoZeWei reported issues with compiling CUDA 11.5.0 variadic functions and other definitions like thread_scope. Artem Belevich explained that this flow is not fully supported and some compiler builtins used by libcudacxx have not been implemented in Clang yet.

Commits

Clang can now compile CUDA code in RDC mode and in LTO mode with the new driver, enabled via the --offload-new-driver flag. D123812
Added SPIR-V-specific intrinsics that must be preserved during translation from LLVM IR to MIR. D124416
HLSL and DXIL support:
- The half type is now enabled in HLSL. D124790
- The -fcgl flag was added to allow checking codegen output for HLSL. D124983
- A pass to lower LLVM intrinsics into DXIL op functions landed. D124805
- A pass to emit DXIL metadata based on LLVM IR metadata landed. D125158
- Added DXBC file magic identification for the DirectX container file format. 966c40a
A new string attribute, amdgpu-requires-module-lds, was introduced to allow eliding the module.lds block from kernels. D122091
TableGen definitions of AMDGPU gfx11 subtarget features were added. D125261

MLIR

Discussions

Commits

An AMDGPU dialect was added. The dialect contains AMDGPU-specific wrappers for raw buffer intrinsics. D122765
Async copy ops were moved to the NVGPU dialect. D125244
A new canonicalizer for gpu.memcpy was added. D124257

OpenMP (Target Offloading)

Discussions

Added a new clang-offload-packager tool to bundle offloading binaries together similar to CUDA’s fatbinary. D125165
Improved static library support for offloading, which now only includes architectures that are used by the application. D125092
OpenMP can now be compiled with multiple architectures at once via --offload-arch. D124721
OpenMP can now infer the target triple from --offload-arch so -fopenmp-targets= is not always necessary. D125050
Fixed an alignment bug when extracting archive members for offloading compilation. 42a1fb5ca
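
To make the --offload-arch items above more concrete, here is a minimal sketch of an OpenMP target region that could be built for more than one GPU architecture in a single invocation. The function name saxpy is hypothetical, and the command lines in the comments are assumptions about how the flags combine: the exact invocation, the architecture names, and any extra driver flags depend on the Clang version and the installed toolchains.

```cpp
// Assumed example invocations (not taken from the commits themselves):
//   clang++ -fopenmp --offload-arch=sm_70 saxpy.cpp
//   clang++ -fopenmp --offload-arch=sm_70 --offload-arch=gfx908 saxpy.cpp
// With --offload-arch given, -fopenmp-targets= may not need to be spelled out.

// Hypothetical helper: computes y = a * x + y for n elements.
// The loop is offloaded when a device is available; without OpenMP support
// the pragma is ignored and the loop runs serially on the host.
void saxpy(float a, const float *x, float *y, int n) {
  #pragma omp target teams distribute parallel for \
      map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
  }
}
```

The map clauses copy x to the device and copy y in both directions, so the same source works whether the region runs on a GPU or falls back to the host.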

Commits

An atomicrmw instruction can now be emitted for atomic update on floating-point variables (FP32 and FP64). D124724
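
As a rough illustration of the kind of source this change concerns, the sketch below accumulates FP64 values with an OpenMP atomic update inside a target region. The function name atomic_sum is hypothetical, and whether a particular build actually lowers the update to an atomicrmw instruction depends on the compiler version and the target.

```cpp
// Hypothetical helper: sums n doubles using an atomic update on a shared
// accumulator. Without OpenMP support the pragmas are ignored and the loop
// runs serially on the host, producing the same sum.
double atomic_sum(const double *data, int n) {
  double total = 0.0;
  #pragma omp target teams distribute parallel for \
      map(to: data[0:n]) map(tofrom: total)
  for (int i = 0; i < n; ++i) {
    // Floating-point read-modify-write; a candidate for atomicrmw fadd.
    #pragma omp atomic update
    total += data[i];
  }
  return total;
}
```

A reduction clause would usually be the faster way to compute a sum; the atomic form is used here only because it is the construct the commit affects.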

External Compilers

LLPC

The standalone amdllpc compiler tool now links statically with spvgen. LLPC#1774