Issue #18
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from August 6 to August 19, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
LLVM and Clang
Discussions
- Henry Linjamäki posted an RFC on ‘Emitting HIP device code as SPIR-V’. The initial focus is to use SPIR-V binaries for OpenCL-like environments: HIPCL and HIPLZ (Intel Level Zero API). Others point out that this proposal overlaps heavily with the changes that would be necessary for OpenCL and OpenMP offloading, and that it would be preferable to keep non-HIP parts generic. Initially, the project may use the SPIRV-LLVM translator as an external tool until the SPIR-V backend (SPIR-V BE) is available. Anastasia Stulova suggests waiting for the SPIR-V BE, to help with its adoption and avoid having to redesign parts of the codebase.
- Frank Winter is working on a JIT compiler targeting AMDGPU GCN and asked for help with using LLD to link inputs passed through memory instead of through files on disk. Fangrui Song responded that LLD is generally not ready to be used as a library and gave a few suggestions.
- Dustin Favorite suspects that `__nvvm_redux_sync_*` CUDA intrinsics are not working as expected and result in ‘Illegal instruction’ errors. There are no replies at the time of writing.
Commits
- New Clang language defines for C++ for OpenCL version 2021. The language standard is `lang_openclcpp2021`, enabled with the `-cl-std=clc++2021` flag.
- Rematerialization of virtual register uses is not allowed. The change affected AMDGPU, ARM/Thumb, MIPS, RISC-V, and x86. D106408
- Machine LICM (Loop Invariant Code Motion) will not rematerialize instructions with virtual register uses. Such an instruction will not be hoisted out of the loop in the belief that the register allocator will sink it back if needed. This impacts AMDGPU. D107677
- `alias.scope` metadata is added to lowered AMDGPU LDS struct to improve alias analysis. D108315
- GlobalISel gained a combine for `PTR_ADD` with register banks. D103326
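
For readers less familiar with the Machine LICM item above, here is a rough source-level analogy of what hoisting a loop-invariant computation means. This is only a sketch: the real pass operates on machine instructions after instruction selection, not on C++ source, and the function names `scale_in_loop` and `scale_hoisted` are made up for illustration. The trade-off it hints at is that a hoisted value stays live across the whole loop, which can raise register pressure, whereas a cheap value can instead be recomputed (rematerialized) where it is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: the invariant value k is recomputed on every iteration.
long scale_in_loop(const std::vector<long>& v, long a, long b) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        long k = a * b + 1;  // does not depend on i, so it is loop-invariant
        sum += v[i] * k;
    }
    return sum;
}

// Same computation with the invariant hoisted out of the loop by hand.
long scale_hoisted(const std::vector<long>& v, long a, long b) {
    const long k = a * b + 1;  // computed once; k is now live across the whole loop
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i] * k;
    }
    return sum;
}
```

Both functions return the same result. The compiler's choice between the two shapes is purely a question of cost: fewer recomputations versus a longer live range.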
MLIR
Discussions
- There are questions about marking `gpu::LaunchFuncOp` async and how to lower and run `linalg.matmul` using CUDA.
Commits
- `spv.(InBounds)PtrAccessChain` ops are defined.
OpenMP (Target Offloading)
Discussions
Commits
- A bunch of refactoring patches by Jon Chesterfield: