Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 2 to April 15, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

Commits

  • Added DXILPrepare CodeGen pass. D122081
  • Added HLSL Semantic parsing. D122699
  • HLSL does not support pointers or references. D123167
  • (In-review) DXIL CodeGen:
    • Add DXIL Bitcode Writer and DXIL testing. D122082
    • Three additional patches add support for opaque pointers:
      • Add pointer type analysis. D122268
      • Update DXIL Prepare to emit no-op bitcasts. D122269
      • Convert opaque to typed pointers in DXIL emission. D122270
  • (In-review) HLSL:
    • Add DXC driver mode and target profile flag. D122865
  • (In-review) The patch necessary to compile CUDA with the new driver is up for review: D123812. It enables the following features:
    • Full RDC-compilation with Clang.
    • LTO when using CUDA.
    • Static library support for CUDA.
    • Interoperability with OpenMP Offloading.

MLIR

Discussions

  • Thomas started a discussion to land a new dialect to host NVIDIA specific GPU operations.
  • Thomas revived the discussion to improve vector dialect for GPU SIMT programming.
  • Runxin Zhong noticed that running MLIR on GPUs without a JIT comes at the cost of runtime calls to load and unload cubin modules. They are interested in avoiding this runtime call overhead, especially for host code that invokes small kernels inside loops. There are no replies at the time of writing.
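
To make the module load/unload overhead concrete, here is a minimal C++ sketch of one possible direction: load each module once and reuse it across launches. This is not taken from the thread and is not MLIR's runtime API. `Module`, `loadModule`, `loadCount`, and `ModuleCache` are hypothetical stand-ins, and the "load" only bumps a counter so the effect of caching is visible.

```cpp
#include <unordered_map>

// Hypothetical stand-in for a loaded GPU module handle.
struct Module {
  const void *blob;
};

// Counts how many times the (notionally expensive) load is performed.
inline int &loadCount() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a driver-level module load of a cubin blob.
inline Module loadModule(const void *blob) {
  ++loadCount();
  return Module{blob};
}

// Caches loaded modules by the address of their binary blob, so a host loop
// that launches the same small kernel many times pays for one load only.
class ModuleCache {
public:
  const Module &get(const void *blob) {
    auto it = cache_.find(blob);
    if (it == cache_.end())
      it = cache_.emplace(blob, loadModule(blob)).first;
    return it->second;
  }

private:
  std::unordered_map<const void *, Module> cache_;
};
```

Calling `cache.get(blob)` inside a host loop performs the load on the first iteration only. A real implementation would also have to decide when to unload modules, which this sketch leaves out.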

Commits

  • gpu.lane_id, gpu.mma.sync and gpu.mma.ldmatrix ops were added to the GPU dialect to target low-level NVVM intrinsics. D123647
  • A new nvgpu dialect was created to host NVIDIA GPU specific operations. D123266
  • gpu.mma.sync and gpu.mma.ldmatrix ops were moved into the new nvgpu dialect. D123824

OpenMP (Target Offloading)

Discussions

Commits

  • A new binary format for the new driver was added. This is conceptually similar to CUDA’s fatbinary format. D122069
  • The offloading sections used by the new driver are now SHF_EXCLUDE. D122987
  • Fixed a crash on AMDGPU involving multiple registers with DWARF information. D123717
  • Fixed the vectorizer’s dependence on a bitcast instruction following opaque pointers. D123694
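
For readers unfamiliar with the SHF_EXCLUDE flag mentioned above: it is an ELF section flag (value 0x80000000) that tells the linker to leave the section out of the final executable or shared library. The sketch below shows that filtering rule in C++. The `Section` struct, the function name, and the section name used in the usage note are hypothetical simplifications, not the driver's actual data structures or naming.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// ELF section flag: the section is excluded from the output when linking an
// executable or shared library (same value as SHF_EXCLUDE in <elf.h>).
constexpr std::uint64_t kShfExclude = 0x80000000ULL;

// Hypothetical, simplified view of a section header: a name and its flags.
struct Section {
  std::string name;
  std::uint64_t flags;
};

// Returns the names of the sections a final link would keep, i.e. those
// that do not have SHF_EXCLUDE set.
std::vector<std::string>
sectionsKeptByLinker(const std::vector<Section> &sections) {
  std::vector<std::string> kept;
  for (const Section &section : sections)
    if ((section.flags & kShfExclude) == 0)
      kept.push_back(section.name);
  return kept;
}
```

Given the sections `.text`, `.offload.example` (flagged with `kShfExclude`), and `.data`, the function returns only `.text` and `.data`. In this way embedded device code can travel inside object files without ending up in the linked host binary.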

External Compilers

LLPC