Issue #32

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 2 to April 15, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

A new document describing the plans for Clang HLSL Support has been posted on Clang’s documentation site.
The HLSL effort will use a GitHub project to track progress publicly.
Rama reported issues with OpenMP offloading (sm_75) installation with LLVM 14. There are no replies at the time of writing.

Commits

Added DXILPrepare CodeGen pass. D122081
Added HLSL Semantic parsing. D122699
HLSL does not support pointers or references. D123167
(In-review) DXIL CodeGen:
- Add DXIL Bitcode Writer and DXIL testing. D122082
- Three additional patches add support for opaque pointers:
  - Add pointer type analysis. D122268
  - Update DXIL Prepare to emit no-op bitcasts. D122269
  - Convert opaque to typed pointers in DXIL emission. D122270
(In-review) HLSL:
- Add DXC driver mode and target profile flag. D122865
The patch necessary to compile CUDA with the new driver is up for review (D123812). It enables the following features:
- Full RDC-compilation with Clang.
- LTO when using CUDA.
- Static library support for CUDA.
- Interoperability with OpenMP Offloading.

MLIR

Discussions

Thomas started a discussion to land a new dialect to host NVIDIA specific GPU operations.
Thomas revived the discussion to improve vector dialect for GPU SIMT programming.
Runxin Zhong noticed that running MLIR on GPUs without a JIT comes at the cost of runtime calls to load and unload cubin modules. They are interested in avoiding this runtime call overhead, especially for host code with loops that invoke small kernels. There are no replies at the time of writing.
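
To make the overhead in the last discussion item concrete, here is a minimal simulation. It is only a sketch: `FakeRuntime`, `load_module`, `unload_module`, `launch`, and `small_kernel` are hypothetical names, not MLIR's runtime wrappers or the CUDA driver API, and nothing runs on a GPU. Hoisting the load out of the loop is shown as one possible way to cut the call count; it is not taken from the thread.

```python
class FakeRuntime:
    """Stand-in for a GPU runtime (hypothetical); it only counts calls."""

    def __init__(self):
        self.loads = 0
        self.unloads = 0
        self.launches = 0

    def load_module(self, blob):
        # A real runtime would parse and upload the cubin here.
        self.loads += 1
        return ("module", blob)

    def unload_module(self, module):
        self.unloads += 1

    def launch(self, module, kernel_name):
        self.launches += 1


def run_naive(rt, blob, iterations):
    """Load and unload the module around every kernel launch."""
    for _ in range(iterations):
        module = rt.load_module(blob)
        rt.launch(module, "small_kernel")
        rt.unload_module(module)


def run_hoisted(rt, blob, iterations):
    """Load the module once, launch many times, unload once."""
    module = rt.load_module(blob)
    try:
        for _ in range(iterations):
            rt.launch(module, "small_kernel")
    finally:
        rt.unload_module(module)


naive, hoisted = FakeRuntime(), FakeRuntime()
run_naive(naive, b"cubin-bytes", 1000)
run_hoisted(hoisted, b"cubin-bytes", 1000)
print(naive.loads, hoisted.loads)
```

Both variants issue the same number of launches; only the number of load and unload calls differs, which is the cost that matters when each kernel is small.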

Commits

gpu.lane_id, gpu.mma.sync and gpu.mma.ldmatrix ops were added to the GPU dialect to target low-level NVVM intrinsics. D123647
A new nvgpu dialect was created to host NVIDIA GPU specific operations. D123266
gpu.mma.sync and gpu.mma.ldmatrix ops were moved into the new nvgpu dialect. D123824

OpenMP (Target Offloading)

Discussions

A presentation on the new driver was given at the LLVM performance workshop at CGO 2022.

Commits

A new binary format for the new driver was added. This is conceptually similar to CUDA’s fatbinary format. D122069
The offloading sections used by the new driver are now SHF_EXCLUDE. D122987
Fixed a crash on AMDGPU involving multiple registers with DWARF information. D123717
Fixed the vectorizer’s dependence on a bitcast instruction following opaque pointers. D123694
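
For readers unfamiliar with the SHF_EXCLUDE flag mentioned in the commits above: it is an ELF section flag asking the linker to leave the section out of the final executable or shared object. The sketch below shows how the flag bit could be checked. The flag values are the standard ELF ones; the section table and the `.example.offloading` name are made up for illustration and are not the names the new driver uses, and the filter is a simplified model of linker behavior.

```python
# ELF section flag constants (standard values, as found in elf.h).
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_EXCLUDE = 0x80000000


def is_excluded(sh_flags: int) -> bool:
    """Return True if the section flags carry SHF_EXCLUDE."""
    return bool(sh_flags & SHF_EXCLUDE)


def sections_kept_by_final_link(sections):
    """Filter (name, sh_flags) pairs down to those a final link would keep.

    Simplified model: only SHF_EXCLUDE is considered. Real linkers apply
    many more rules (section garbage collection, linker scripts, and so on).
    """
    return [name for name, flags in sections if not is_excluded(flags)]


# Hypothetical section table; the offloading section name is invented.
example_sections = [
    (".text", SHF_ALLOC | SHF_EXECINSTR),
    (".data", SHF_ALLOC | SHF_WRITE),
    (".example.offloading", SHF_EXCLUDE),
]

print(sections_kept_by_final_link(example_sections))
```

In this model the embedded device code travels inside the object file but does not end up in the linked output.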

External Compilers

LLPC

Added support for inline assembly in GLSL’s debugPrintfEXT instructions. LLPC#1769
The part-pipeline compilation scheme has been merged. This scheme provides separate fragment shader compilation and input packing. LLPC#1704