Issue #17

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from July 23 to August 5, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

The talk schedule for the 2021 X.Org Developers Conference is available. This year, the conference is virtual and takes place on September 15-17. The compiler/GPU-related talks include:

LLVM and Clang

Discussions

Luke Kenneth Casson Leighton posted an RFC: “Vector/SIMD ISA Context Abstraction”. Luke is working on SVP64 Cray-like Vector Extensions for the Power ISA, which is being designed for hybrid CPU, VPU, and 3D GPU workloads. One of the problems mentioned is that some ISA designs may lead to a combinatorial explosion in the number of intrinsics, which can be avoided by “separating out ‘scalar base’ from ‘augmentation’ throughout the IR”. Renato Golin replied that, historically, LLVM has tried to express as many instructions as possible in native IR to avoid an explosion of intrinsics. However, intrinsics tend to reduce the number of program instructions, so there needs to be some balance.
席致寧 asked about a quick way to add a new instruction to generated PTX files, without having to implement full support for the new instruction in the backend. There are no replies as of writing.

Commits

HIP switched to using DWARF version 5 by default. D107190
It is now possible to force-enable MemCpyOpt with a new LLVM flag -enable-memcpyopt-without-libcalls. For now, only the CUDA frontend opts into it, to better exercise this optimization. D106401
A new Attributor pass for deducing AMDGPU-specific attributes was added. D104997
The NVPTX matrix operation intrinsics were extended with the ldmatrix.sync.aligned warp-level matrix load instructions introduced in PTX 6.5. D107046
Clang learned to preserve ASAN library functions when targeting HIP. D106315
A number of GlobalISel enhancements for AMDGPU.

MLIR

Discussions

Commits

A populateGpuToLLVMConversionPatterns entry point was added for collecting all GPU to LLVM conversion patterns. D107218
Two boolean loading/storing issues were fixed in SPIR-V conversion.
A few issues in the SPIR-V module combiner were fixed. D106886
MemRef/Math to SPIR-V conversions are split into their own directories and files.

OpenMP (Target Offloading)

Discussions

Andrew Marshall is having issues with building LLVM 12 for OpenMP. There are no replies as of writing.

Commits

Users can now enable the new experimental device runtime library by passing the -fopenmp-target-new-runtime flag. D106793
Linking of math libraries is now supported for AMDGPU when -lm is specified. D104904, D105981
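
To make the two items above more concrete, here is a minimal sketch of a target region that calls a math function, which is the kind of code that relies on -lm being handled for the device. The function name sqrt_all and the compile command in the comment are illustrative assumptions, not taken from the linked revisions; the target triple and any architecture flags depend on your hardware and toolchain.

```cpp
#include <cmath>
#include <vector>

// Hypothetical example. One possible build line, assuming an
// offloading-enabled Clang and an AMDGPU device (adjust to your setup):
//   clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
//           -fopenmp-target-new-runtime example.cpp -lm
// Without -fopenmp the pragma is ignored and the loop runs on the host.
std::vector<double> sqrt_all(const std::vector<double> &in) {
  std::vector<double> out(in.size());
  const double *src = in.data();
  double *dst = out.data();
  const int n = static_cast<int>(in.size());

  // Copy the input to the device, run the loop there, copy the result back.
#pragma omp target teams distribute parallel for map(to: src[0:n]) map(from: dst[0:n])
  for (int i = 0; i < n; ++i)
    dst[i] = std::sqrt(src[i]); // math call inside the target region

  return out;
}
```

Calling sqrt_all({4.0, 9.0}) should produce the same values whether the loop is offloaded or falls back to the host, which makes it a quick sanity check when trying out the new runtime flag.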

External Compilers

LLPC

Mesa

LLVMpipe gained a linear rasterizer optimized for 2D rendering. The changes yield a 2x to 3x performance improvement for 2D workloads.