Issue #1

Welcome to the first issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from November 27 to December 10, 2020.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

AMD published the RDNA 2 Instruction Set Architecture manual. Some notable changes from the previous GCN ISA are:
- ray tracing support,
- new dot product ALU operations for accelerated inference and deep learning,
- VGPR and LDS allocation-unit sizes were doubled,
- legacy multiply-add instructions were removed (superseded by fused-multiply-add).
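
To illustrate the last point, here is a minimal C++ sketch of the numerical difference between an unfused multiply-add (two roundings) and a fused multiply-add (one rounding). It runs on the host with doubles and `std::fma`, so it only demonstrates the arithmetic and does not model the RDNA 2 instructions themselves. The function names are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cmath>

// Unfused multiply-add: the product is rounded to double before the addition.
// The volatile store keeps the compiler from contracting the two operations
// into a single fused instruction.
double unfused_mad(double a, double b, double c) {
    volatile double product = a * b;  // first rounding
    return product + c;               // second rounding
}

// Fused multiply-add: a * b + c is computed with a single rounding at the end.
double fused_mad(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Checks one input on which the two variants disagree.
void check_rounding_difference() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    const double a = 1.0 + eps;
    const double c = -(1.0 + 2.0 * eps);      // -(1 + 2^-26)
    // Exactly, a * a = 1 + 2^-26 + 2^-54. The 2^-54 term is lost when the
    // product is rounded to double, but survives in the fused variant.
    assert(unfused_mad(a, a, c) == 0.0);
    assert(fused_mad(a, a, c) == std::ldexp(1.0, -54));
}
```

Calling `check_rounding_difference()` exercises both paths. The fused variant keeps the low-order term that the unfused one discards, so code ported from multiply-add to fused multiply-add can produce slightly different results.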

LLVM and Clang

Discussions

Jay Foad ran into issues with preserved and required transitive analyses in the Legacy Pass Manager in AMDGPU. Jay proposes to add a new pass preservation rule, but some existing passes currently violate it. There are no replies as of writing.
Arthur Eubanks is working towards enabling the New Pass Manager (NPM). Arthur looked into AMDGPU support for the NPM and points out that passes that depend on TargetMachine::adjustPassManager need to be tweaked to work with the NPM.
João Paulo L. de Carvalho asked about modeling address space casts in the Scalar Evolution analysis. The lack of such modeling prevents simple SYCL loops from being vectorized. There are no replies as of writing.
Nichols A. Romero proposed to add Fortran tests to the LLVM Test Suite. The tests will focus on language features, high-performance proxy programs, and OpenMP multi-threading and GPU offloading. The response seems overwhelmingly positive so far.

Commits

(In-review) Ongoing work and discussion on Adding convergence control operand bundle and intrinsics to LLVM IR.
Clang Offload Bundler gained AMDGPU code object V4 ABI documentation.
Various fixes to AMDGPU assembler diagnostics: [1], [2], [3].
(In-review) Don’t sink ptrtoint/inttoptr sequences into non-noop address space casts. This resolves a JuliaGPU bug that caused an illegal memory access with atomics on shared memory.
CUDA/HIP hostness function overloading fixes. A new -fgpu-exclude-wrong-side-overloads Clang flag controls the related behavior.

MLIR

Discussions

Commits

gpu.allocate and gpu.deallocate ops were added, together with their lowering to runtime function calls.
The GpuAsyncRegionPass learned to move gpu.wait ops out of async.execute regions and into their dependencies. This prevents unnecessary host synchronization.

External Compilers

Please submit pointers to your mailing lists, forums, or newsletters if you want your LLVM- or MLIR-based GPU compiler project to be covered in future LLVM GPU News issues.
