Issue #1
Welcome to the first issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from November 27 to December 10, 2020.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- AMD published the RDNA 2 Instruction Set Architecture manual.
Some notable changes from the previous GCN ISA are:
- ray tracing support,
- new dot product ALU operations for accelerated inference and deep learning,
- the VGPR and LDS allocation-unit sizes were doubled,
- legacy multiply-add instructions were removed (superseded by fused-multiply-add).
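To illustrate why the switch from multiply-add to fused-multiply-add can change results, here is a minimal host-side C++ sketch. It does not use any RDNA 2 instructions; it only assumes IEEE-754 double arithmetic, and the helper names `unfused_mul_add` and `fused_mul_add` are hypothetical. An unfused multiply-add rounds the product before adding, while a fused one rounds only once at the end.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded to double first, then the
// sum is rounded again. The volatile store keeps the compiler from
// contracting the two operations into a single fused instruction.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding happens here
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Inputs chosen so the exact product 1 - 2^-54 is not representable as a
// double and rounds to 1.0 when the multiplication is rounded on its own.
inline double example_a() { return 1.0 + std::ldexp(1.0, -27); }
inline double example_b() { return 1.0 - std::ldexp(1.0, -27); }
```

With `c = -1.0`, the unfused version loses the low-order bits of the product and returns zero, whereas the fused version keeps them and returns `-2^-54`. Code that relied on the rounding of the legacy instructions may therefore see slightly different results.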
LLVM and Clang
Discussions
- Jay Foad ran into issues with preserved and required transitive analyses in the Legacy Pass Manager in AMDGPU. Jay proposes to add a new pass preservation rule, but some existing passes currently violate it. There are no replies as of writing.
- Arthur Eubanks is working towards enabling the New Pass Manager (NPM). Arthur looked into AMDGPU support for the NPM and points out that passes that depend on TargetMachine::adjustPassManager need to be tweaked to work with the NPM.
- João Paulo L. de Carvalho asked about modeling address space casts in the Scalar Evolution analysis. The lack of such modeling prevents simple SYCL loops from being vectorized. There are no replies as of writing.
- Nichols A. Romero proposed to add Fortran tests to the LLVM Test Suite. The tests will focus on language features, high-performance proxy programs, and OpenMP multi-threading and GPU offloading. The response seems overwhelmingly positive so far.
Commits
- (In-review) Ongoing work and discussion on Adding convergence control operand bundle and intrinsics to LLVM IR.
- Clang Offload Bundler gained AMDGPU code object V4 ABI documentation.
- Various fixes to AMDGPU assembler diagnostics: [1], [2], [3].
- (In-review) Don’t sink ptrtoint/inttoptr sequences into non-noop address space casts. This resolves an illegal memory access reported as a JuliaGPU bug involving atomics on shared memory.
- CUDA/HIP hostness function overloading fixes. A new -fgpu-exclude-wrong-side-overloads Clang flag controls the related behavior.
MLIR
Discussions
Commits
- gpu.allocate and gpu.deallocate ops were added to runtime function calls.
- The GpuAsyncRegionPass learned to move gpu.wait ops from async.execute regions to its dependencies. This prevents unnecessary host synchronization.
External Compilers
Please submit pointers to your mailing lists, forums, or newsletters if you want your LLVM- or MLIR-based GPU compiler project to be covered in future LLVM GPU News issues.