Issue #34
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 13 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Community Events
LLVM and Clang
Discussions
- Artem Belevich started a thread on conditional dependencies on proprietary tools in LIT tests. The discussion was kicked off by the recent addition of NVPTX tests that utilized the proprietary assembler
ptxas
from Nvidia. - MoZeWei reported issues with compiling CUDA 11.5.0 variadic functions and other definitions like
thread_scope
. Artem Belevich explained that this flow is not fully supported and some compiler builtins used bylibcudacxx
have not been implemented in Clang yet.
Commits
- The new driver can now compile CUDA code in RDC-mode and in LTO-mode using the new driver via
--offload-new-driver
flag. D123812 - Added SPIR-V-specific intrinsics required to keep during translation from llvm IR to MIR. D124416
- HLSL and DXIL support:
- The
half
type is now enabled in HLSL. D124790 - The
-fcgl
flag was added to allow checking codegen output for HSLS. D124983 - A pass to lower llvm intrinsics into DXIL op functions landed. D124805
- A pass to emit DXIL metadata based on llvm IR metadata landed. D125158
- Added DXBC file magic identification for the DirectX container file format. 966c40a
- The
- A new string attribute,
amdgpu-requires-module-lds
, was introduced to allow eliding themodule.lds
block from kernels. D122091 - Tablegen definitions of AMDGPU gfx11 subtarget features were added. D125261
MLIR
Discussions
Commits
- An AMDGPU dialect was added. The dialect contains AMDGPU-specific wrappers for raw buffer intrinsics. D122765
- Async copy ops were moved to the NVGPU dialect. D125244
- A new canonicalizer for
gpu.memcpy
was added. D124257
OpenMP (Target Offloading)
Discussions
- Added a new
clang-offload-packager
tool to bundle offloading binaries together similar to CUDA’s fatbinary. D125165 - Improved static library support for offloading, now only includes architectures that are used by the application. D125092
- OpenMP can now be compiled with multiple architectures at once via
--offload-arch
. D124721 - OpenMP can now infer the target triple from
--offload-arch
so-fopenmp-targets=
is not always necessary. D125050 - Fixed an alignment bug when extracting archive members for offloading compilation. 42a1fb5ca
Commits
atomicrmw
can be emitted when usingatomic update
with floating-point variables (FP32 and FP64). D124724