Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 13, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Commits

  • The new offloading driver, enabled with the --offload-new-driver flag, can now compile CUDA code in RDC mode and in LTO mode. D123812
  • Added SPIR-V-specific intrinsics that preserve information needed during the translation from LLVM IR to MIR. D124416
  • HLSL and DXIL support:
    • The half type is now enabled in HLSL. D124790
    • The -fcgl flag was added to allow checking the codegen output for HLSL. D124983
    • A pass to lower LLVM intrinsics into DXIL op functions landed. D124805
    • A pass to emit DXIL metadata based on LLVM IR metadata landed. D125158
    • Added DXBC file magic identification for the DirectX container file format. 966c40a
  • A new string attribute, amdgpu-requires-module-lds, was introduced to allow eliding the module.lds block from kernels. D122091
  • TableGen definitions of AMDGPU gfx11 subtarget features were added. D125261

MLIR

Discussions

Commits

  • An AMDGPU dialect was added. The dialect contains AMDGPU-specific wrappers for raw buffer intrinsics. D122765
  • Async copy ops were moved to the NVGPU dialect. D125244
  • A new canonicalizer for gpu.memcpy was added. D124257

OpenMP (Target Offloading)

Discussions

  • Added a new clang-offload-packager tool that bundles offloading binaries together, similar to CUDA’s fatbinary. D125165
  • Improved static library support for offloading: only the architectures actually used by the application are now included. D125092
  • OpenMP code can now be compiled for multiple architectures at once via --offload-arch. D124721
  • OpenMP offloading can now infer the target triple from --offload-arch, so -fopenmp-targets= is no longer always necessary. D125050
  • Fixed an alignment bug when extracting archive members for offloading compilation. 42a1fb5ca

Commits

  • An atomicrmw instruction can now be emitted for OpenMP atomic update on floating-point variables (FP32 and FP64). D124724
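For readers less familiar with the construct, here is a minimal sketch of the kind of source code D124724 concerns: a floating-point accumulator updated under `#pragma omp atomic update` inside a target region. This is our own illustration, not code from the revision. The function name `atomic_sum` is hypothetical, and whether Clang actually lowers the update to atomicrmw depends on the compiler version and the offload target. Compiled without -fopenmp, the pragmas are ignored and the loop simply runs serially on the host.

```c
#include <stddef.h>

/* Hypothetical helper: sums n doubles inside an OpenMP target region.
   If no offload device is available (or OpenMP is disabled), the loop
   runs on the host. Each iteration adds to the shared accumulator via a
   floating-point "atomic update", the construct that D124724 allows
   Clang to lower to an atomicrmw instruction. */
double atomic_sum(const double *x, size_t n) {
  double sum = 0.0;
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: sum)
  for (size_t i = 0; i < n; ++i) {
#pragma omp atomic update
    sum += x[i];
  }
  return sum;
}
```

Summing small integers stored as doubles keeps the result exact regardless of the order in which the atomic updates happen, which makes it a convenient sanity check.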

External Compilers

LLPC

  • The standalone compiler tool, amdllpc, now links statically with spvgen. LLPC#1774