Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from January 14 to January 27 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

  • Alyssa Rosenzweig blogged about ‘Writing an open source GPU driver – without the hardware’. The post describes various testing techniques that allowed them to implement driver/compiler support for ‘Valhall’ ARM GPUs before they were able to test it on a real Linux system: ‘It only took me a few days after getting the hardware and a serial cable to pass hundreds of tests on the new architecture. Months of speculatively developing the driver came with a huge pay off.’

LLVM and Clang

Discussions

Commits

  • Added sycl_special_class attribute for SYCL types that need additional processing as kernel parameters. D114483

MLIR

Discussions

Commits

  • Replaced reference to LLVMFuncOp with FuncOpInterface in the GPU dialect. D118172
  • Added support for lowering ExpM1 to both SPIR-V GLSL/OpenCL instructions. D118081
  • Added support for lowering math.fma to SPIR-V. D117704
  • Added support for size-1 vector inserts to SPIR-V conversion. D115517

OpenMP (Target Offloading)

Discussions

Commits

  • The annoying incompatible triple warnings have been fixed. D118495, D117634
  • The new OpenMP offloading driver and linker will be upstreamed and available in LLVM 14 with support for Nvidia and AMD offloading. D116541
  • An optimization to remove redundant barriers will be upstreamed. D118002
  • The old OpenMP device offloading runtime is not built by default anymore in preparation for it being removed. D118268
  • Linked-in math libraries (libm) are now optimized on AMDGPU. D116906

External Compilers

LLPC

oneAPI DPC++

CUDA/HIP support

  • Improved atomics support on the CUDA backend:
  • Added support for bf16, (u)int8, and half data types in matrix operations on the CUDA backend. DPCPP#5009
  • Disabled by default the image support on the CUDA backend. DPCPP#5256
  • Fixed PI_MEM_ALLOC_DEVICE USM pointer query for CUDA and HIP backends. DPCPP#5325
  • CUDA_PATH is used as a candidate for path discovery. DPCPP#5194
  • Fixed up and down shuffles and reductions on AMD GPU. DPCPP#5359
  • Enabled part of hierarchical parallelism functionality on AMD GPU by applying an LLVM transformation pass. DPCPP#5149

SYCL 2020 support

  • Added declaration of sycl::decorated enum class. DPCPP#5399

Non-standard extensions

  • Updated property_list specification to revision 2 (DPCPP#5338 and DPCPP#5410) and added an implementation design document (DPCPP#4937 and DPCPP#5341).
  • Updated the KernelProperties extension. DPCPP#5343
  • Clarified the interaction between InvokeSIMD and SYCL_EXTERNAL in the InvokeSIMD extension specification. DPCPP#5353
  • Updated documentation for Explicit SIMD. DPCPP#5383, DPCPP#5368
  • Fixed sampler/stream parameter handling for ESIMD kernels. DPCPP#5165
  • Added the esimd::merge free function. DPCPP#5308

Misc