Issue #37

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 8, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

The call for proposals for the 2022 LLVM Developers’ Meeting is open until August 30.

LLVM and Clang

Discussions

The discussion on representing pointers to special resource types continued on discourse thanks to Joshua Cranmer who put together an RFC. Alex Bradbury noticed similarities to that the problem of supporting WebAssembly GC type support. The discussion is scheduled to continue during the upcoming LLVM GPU Working Group meeting on July 15.
Nicolai Hähnle proposed to ‘make LLVM play nice(r) when used as a shared library in a plugin setting’. The two primary use cases are plugin dynamic loading/unloading and dynamic linking of the LLVM shared object used by one or more plugins. The initial change stack includes the removal of MagicStatics and unregistering cl::Options upon destruction.

Commits

A large number of patches for the gfx11 AMDGPU in-development target landed.
--offload-arch= now supports multiple comma separated values. D128206
Improved the binary handling of the offloading section by adding a new ELF section. D129052
Introduced !exclude metadata to make globals use the SHF_EXCLUDE section flag to better support the offloading section. D129151
Added the --offloading option to llvm-objdump to display embedded device code in the offloading section, similar to cuobjdump. D126904
Introduced SPIR-V global entity tracking and deduplication infrastructure. D128471
Added thread/group ID DXIL operations. D127990
Added support for opaque pointers for ValueAsMetadata in DXILBitcodeWriter. D127705
Added a new HIP option: -fhip-kernel-arg-name. D128022

MLIR

Discussions

Bruno Cardoso Lopes and Nathan Lanza proposed an MLIR-based Clang IR. The discussion summary clarified the initial target is being a part of the regular compilation pipeline, both for codegen and static analysis. A new MLIR C/C++ Frontend Working Group is being formed.

Commits

Added a shared memory access optimization pass. D127457
Defined MLIR wrappers around new MFMA intrinsics. D128079
Added --chipset option to AMDGPUToROCDL. D129228
Added conversion from math.round to SPIR-V GLSL/OpenCL ops. D129236
Added more comparison directions in arith.cmpi to SPIR-V conversion. D128692
Added InferIntRangeInterface to gpu.launch. D129036

OpenMP (Target Offloading)

Discussions

Commits

atomic compare and atomic compare capture now support floating-point variables. D127041, D127042
Fixed the issue that peer-to-peer memory copy on Nvidia GPU doesn’t work D122764.
Heap2Stack (also used to remove globalized locals) is now loop-aware which often allows placing new allocas in the entry block. This can improve performance and avoid issues with stacksave/restore intrinsics introduced by the inliner. commit 1, commit2
Implemented a unified interface for kernel launches in libomptarget. D128549, D128817
Improved link times and temporary file handling in the linker wrapper by writing to disk only when necessary. D127246
Added an extension to omp variant begin that mangles function declarations as well. D124624
Reworked argument handling in the linker wrapper to make adding new arguments easier. commit

External Compilers

LLPC

Transition to opaque pointers continues. LLPC#1839