Issue #32
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 2 to April 15, 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Community Events
LLVM and Clang
Discussions
- A new document describing the plans for Clang HLSL Support has been posted on Clang’s documentation site.
- The HLSL effort will use a GitHub project to track progress publicly.
- Rama reported issues with OpenMP offloading (sm_75) installation with LLVM 14. There are no replies at the time of writing.
Commits
- Added DXILPrepare CodeGen pass. D122081
- Added HLSL Semantic parsing. D122699
- HLSL does not support pointers or references. D123167
- (In-review) DXIL CodeGen:
- (In-review) HLSL:
- Add DXC driver mode and target profile flag. D122865
- The patch necessary to compile CUDA with the new driver is up for review: D123812. This enables the following features:
  - Full RDC-compilation with Clang.
  - LTO when using CUDA.
  - Static library support for CUDA.
  - Interoperability with OpenMP Offloading.
MLIR
Discussions
- Thomas started a discussion to land a new dialect to host NVIDIA specific GPU operations.
- Thomas revived the discussion to improve vector dialect for GPU SIMT programming.
- Runxin Zhong noticed that running MLIR on GPUs without a JIT comes at the cost of runtime calls to load and unload cubin modules. They are interested in avoiding this runtime call overhead, especially for host loopy code invoking small kernels. There are no replies at the time of writing.
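The overhead pattern in the last item can be illustrated with a small C++ sketch. It does not use the real CUDA driver API or MLIR's runtime wrappers: `FakeDriver`, `launchNaive`, and `ModuleCache` are hypothetical names, and the driver only counts calls. Caching the loaded module is one possible mitigation, assumed here for illustration; it is not something proposed in the thread.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a GPU driver. It performs no real work and
// only counts how often each entry point is called.
struct FakeDriver {
  std::size_t loads = 0;
  std::size_t unloads = 0;
  std::size_t launches = 0;
  int nextHandle = 1;

  int loadModule(const std::string & /*cubin*/) {
    ++loads;
    return nextHandle++;
  }
  void unloadModule(int /*handle*/) { ++unloads; }
  void launch(int /*handle*/) { ++launches; }
};

// The pattern described in the discussion: every kernel invocation
// loads the module, launches, and unloads again.
void launchNaive(FakeDriver &drv, const std::string &cubin) {
  int handle = drv.loadModule(cubin);
  drv.launch(handle);
  drv.unloadModule(handle);
}

// Assumed mitigation: load each module once, reuse the handle for later
// launches, and unload when the cache is destroyed.
class ModuleCache {
public:
  explicit ModuleCache(FakeDriver &drv) : drv_(drv) {}
  ModuleCache(const ModuleCache &) = delete;
  ModuleCache &operator=(const ModuleCache &) = delete;
  ~ModuleCache() {
    for (auto &entry : handles_)
      drv_.unloadModule(entry.second);
  }

  void launch(const std::string &cubin) {
    auto it = handles_.find(cubin);
    if (it == handles_.end())
      it = handles_.emplace(cubin, drv_.loadModule(cubin)).first;
    drv_.launch(it->second);
  }

private:
  FakeDriver &drv_;
  std::unordered_map<std::string, int> handles_;
};
```

In a host loop that invokes the same small kernel N times, `launchNaive` issues N loads and N unloads, while `ModuleCache::launch` issues one of each. Whether such caching fits the actual MLIR lowering is an open question in the thread.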
Commits
- gpu.lane_id, gpu.mma.sync and gpu.mma.ldmatrix ops are added into the GPU dialect to target low-level NVVM intrinsics. D123647
- A new nvgpu dialect was created to host NVIDIA GPU specific operations. D123266
- gpu.mma.sync and gpu.mma.ldmatrix ops are moved into the new nvgpu dialect. D123824
OpenMP (Target Offloading)
Discussions
- A presentation on the new driver was given at the LLVM performance workshop at CGO 2022.
Commits
- A new binary format for the new driver was added. This is conceptually similar to CUDA’s fatbinary format. D122069
- The offloading sections used by the new driver are now SHF_EXCLUDE. D122987
- Fixed a crash on AMDGPU involving multiple registers with DWARF information. D123717
- Fixed the vectorizer’s dependence on a bitcast instruction following opaque pointers. D123694
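For readers unfamiliar with the SHF_EXCLUDE flag mentioned above: it is an ELF section flag (value 0x80000000 in `sh_flags`) that asks the linker to leave the section out of the executable or shared library it produces. The C++ sketch below shows the flag test in isolation. `SectionInfo`, `isExcludedFromLink`, and `keptSections` are hypothetical names, the struct is a simplified stand-in for a real ELF section header, and nothing here reflects how the new driver itself is implemented.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Value of SHF_EXCLUDE as defined in the generic ELF headers (e.g. <elf.h>).
// Defined locally so the sketch builds on platforms without that header.
constexpr std::uint64_t kShfExclude = 0x80000000ULL;

// Hypothetical, simplified stand-in for an ELF section header entry.
struct SectionInfo {
  std::string name;
  std::uint64_t flags; // corresponds to sh_flags
};

// True if the section is marked for exclusion from the linked output.
bool isExcludedFromLink(const SectionInfo &section) {
  return (section.flags & kShfExclude) != 0;
}

// Names of the sections a linker honoring SHF_EXCLUDE would keep.
std::vector<std::string>
keptSections(const std::vector<SectionInfo> &sections) {
  std::vector<std::string> kept;
  for (const SectionInfo &section : sections)
    if (!isExcludedFromLink(section))
      kept.push_back(section.name);
  return kept;
}
```

Given sections `.text`, `.data`, and an illustrative `.offload.example` section carrying the flag, `keptSections` returns only `.text` and `.data`. The practical effect is that embedded device code can travel inside relocatable objects without ending up in the final host binary.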
External Compilers
LLPC
- Added support for inline assembly in GLSL’s debugPrintfEXT instructions. LLPC#1769
- The part-pipeline compilation scheme has been merged. This scheme provides separate fragment shader compilation and input packing. LLPC#1704