Issue #27

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from January 14 to January 27 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

Alyssa Rosenzweig blogged about ‘Writing an open source GPU driver – without the hardware’. The post describes various testing techniques that allowed them to implement driver/compiler support for ‘Valhall’ ARM GPUs before they were able to test it on a real Linux system: ‘It only took me a few days after getting the hardware and a serial cable to pass hundreds of tests on the new architecture. Months of speculatively developing the driver came with a huge pay off.’

LLVM and Clang

Discussions

Artem Belevich proposed to change the NVPTX calling convention for aggregate function arguments passed by value. This would make Clang pass aggregate values directly as opposed to via byval pointers. The motivation is to allow SROA (the Scalar Replacement of Aggregates pass) to eliminate more local copies.
Vasileios Porpodas posted an RFC on ‘Spill2Reg: Selectively replace spills to stack with spills to vector registers’. Although the optimization was primarily evaluated on x86 (Skylake), Madhur Amilkanthwar noted that the AMDGPU backend has a similar pass.

Commits

Added sycl_special_class attribute for SYCL types that need additional processing as kernel parameters. D114483

MLIR

Discussions

Commits

Replaced reference to LLVMFuncOp with FuncOpInterface in the GPU dialect. D118172
Added support for lowering ExpM1 to both SPIR-V GLSL/OpenCL instructions. D118081
Added support for lowering math.fma to SPIR-V. D117704
Added support for size-1 vector inserts to SPIR-V conversion. D115517

OpenMP (Target Offloading)

Discussions

Commits

The annoying incompatible triple warnings have been fixed. D118495, D117634
The new OpenMP offloading driver and linker will be upstreamed and available in LLVM 14 with support for Nvidia and AMD offloading. D116541
An optimization to remove redundant barriers will be upstreamed. D118002
The old OpenMP device offloading runtime is not built by default anymore in preparation for it being removed. D118268
Linked-in math libraries (libm) are now optimized on AMDGPU. D116906

External Compilers

LLPC

oneAPI DPC++

CUDA/HIP support

Improved atomics support on the CUDA backend:
- Added atomic loads and stores with various memory orders and scopes. DPCPP#5191, DPCPP#4853
- Added remaining atomics. DPCPP#5192
Added support for bf16, (u)int8, and half data types in matrix operations on the CUDA backend. DPCPP#5009
Disabled by default the image support on the CUDA backend. DPCPP#5256
Fixed PI_MEM_ALLOC_DEVICE USM pointer query for CUDA and HIP backends. DPCPP#5325
CUDA_PATH is used as a candidate for path discovery. DPCPP#5194
Fixed up and down shuffles and reductions on AMD GPU. DPCPP#5359
Enabled part of hierarchical parallelism functionality on AMD GPU by applying an LLVM transformation pass. DPCPP#5149

SYCL 2020 support

Added declaration of sycl::decorated enum class. DPCPP#5399

Non-standard extensions

Updated property_list specification to revision 2 (DPCPP#5338 and DPCPP#5410) and added an implementation design document (DPCPP#4937 and DPCPP#5341).
Updated the KernelProperties extension. DPCPP#5343
Clarified the interaction between InvokeSIMD and SYCL_EXTERNAL in the InvokeSIMD extension specification. DPCPP#5353
Updated documentation for Explicit SIMD. DPCPP#5383, DPCPP#5368
Fixed sampler/stream parameter handling for ESIMD kernels. DPCPP#5165
Added the esimd::merge free function. DPCPP#5308

Misc

Migrate a number of SYCL specific LLVM passes to the New Pass Manager. DPCPP#5324, DPCPP#5300, DPCPP#5337, DPCPP#5301, DPCPP#5291, DPCPP#5292, DPCPP#5323
Added experimental support for L0 host ptr import to USM buffer. DPCPP#4891