Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 8, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Commits

  • A large number of patches landed for the in-development gfx11 AMDGPU target.
  • --offload-arch= now supports multiple comma-separated values. D128206
  • Improved the binary handling of the offloading section by adding a new ELF section. D129052
  • Introduced !exclude metadata to make globals use the SHF_EXCLUDE section flag to better support the offloading section. D129151
  • Added the --offloading option to llvm-objdump to display embedded device code in the offloading section, similar to cuobjdump. D126904
  • Introduced SPIR-V global entity tracking and deduplication infrastructure. D128471
  • Added thread/group ID DXIL operations. D127990
  • Added support for opaque pointers for ValueAsMetadata in DXILBitcodeWriter. D127705
  • Added a new HIP option: -fhip-kernel-arg-name. D128022

MLIR

Discussions

Commits

  • Added a shared memory access optimization pass. D127457
  • Defined MLIR wrappers around new MFMA intrinsics. D128079
  • Added --chipset option to AMDGPUToROCDL. D129228
  • Added conversion from math.round to SPIR-V GLSL/OpenCL ops. D129236
  • Added more comparison directions in arith.cmpi to SPIR-V conversion. D128692
  • Added InferIntRangeInterface to gpu.launch. D129036

OpenMP (Target Offloading)

Discussions

Commits

  • atomic compare and atomic compare capture now support floating-point variables. D127041, D127042
  • Fixed an issue where peer-to-peer memory copy did not work on NVIDIA GPUs. D122764
  • Heap2Stack (also used to remove globalized locals) is now loop-aware, which often allows placing new allocas in the entry block. This can improve performance and avoid issues with stacksave/restore intrinsics introduced by the inliner. commit 1, commit 2
  • Implemented a unified interface for kernel launches in libomptarget. D128549, D128817
  • Improved link times and temporary file handling in the linker wrapper by writing to disk only when necessary. D127246
  • Added an extension to omp variant begin that mangles function declarations as well. D124624
  • Reworked argument handling in the linker wrapper to make adding new arguments easier. commit
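
For readers unfamiliar with the atomic compare item above, here is a minimal host-side sketch of an OpenMP 5.1 conditional update on a floating-point variable. It is an illustration only, not code from the linked revisions: the function name atomic_max and the HAVE_ATOMIC_COMPARE macro are hypothetical, and the sketch runs on the host rather than in a target region to stay short. The pragmas are guarded on the _OPENMP version macro, on the assumption that the compiler reports 202011 when OpenMP 5.1 is enabled (with Clang this may require -fopenmp -fopenmp-version=51); otherwise the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Assumption: _OPENMP >= 202011 means OpenMP 5.1 (and thus `atomic compare`)
// is available. HAVE_ATOMIC_COMPARE is a hypothetical helper macro.
#if defined(_OPENMP) && _OPENMP >= 202011
#define HAVE_ATOMIC_COMPARE 1
#else
#define HAVE_ATOMIC_COMPARE 0
#endif

// Hypothetical helper: returns the largest element of `values`.
// With OpenMP 5.1 enabled, the shared double `best` is updated by many
// threads through an atomic conditional update; otherwise the loop is serial.
double atomic_max(const std::vector<double>& values) {
  assert(!values.empty());
  double best = values[0];
  const int n = static_cast<int>(values.size());
#if HAVE_ATOMIC_COMPARE
  #pragma omp parallel for
#endif
  for (int i = 0; i < n; ++i) {
#if HAVE_ATOMIC_COMPARE
    // Conditional-update form: if (x < expr) { x = expr; }
    #pragma omp atomic compare
#endif
    if (best < values[i]) { best = values[i]; }
  }
  return best;
}
```

Calling atomic_max on a vector such as {1.5, -2.0, 7.25, 3.0} returns its largest element either way; the guard only decides whether the update is performed in parallel.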

External Compilers

LLPC

  • Transition to opaque pointers continues. LLPC#1839