Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from December 17 to January 13 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

  • A SPIR-V toolchain was added to Clang. SPIR-V code is generated by the external SPIRV-LLVM translator tool llvm-spirv, as a temporary solution until a SPIR-V backend lands in LLVM. D112410
  • The OpenCL documentation was updated with C++ for OpenCL 2021 support in Clang. D116271
  • CUDA/HIP now allow __int128 on the host side, even if not supported by the target device. D111047
  • NVPTX intrinsics and builtins for CUDA PTX cvt instructions were added for sm80 architectures and above. D116673
  • Enabled AMDGPU divergence predicates for not, min/max, and ctlz/cttz. D115884, D115954, D116044

MLIR

Discussions

Commits

  • The gpu.printf op is defined to support debugging. D110448
  • GPU kernel outlining supports the datalayout spec attribute (dlspec) now. D115722
  • A few SPIR-V serialization bugs regarding nested control flows were fixed. D115560, D115582
  • SPIR-V serialization allows explicit control over debug information emission. D115531

OpenMP (Target Offloading)

Discussions

  • The implementation of the new OpenMP offloading driver was discussed, slides.

Commits

  • 100% of OvO tests pass on AMDGPU after applying D116906 with optimizations.
  • The new OpenMP offloading driver is up for review at D116541 (click ‘Stack’ to see the full list of revisions) and can be pulled from here. It currently supports the following features:
    • Support for AMDGPU and NVPTX offloading targets.
    • New offloading object files are compatible with the host linker.
    • Functional static linking using archive libraries.
    • Device-side LTO for offloading applications.
    • Embedding LLVM IR for future JIT functionality.

External Compilers

LLPC

  • Added multi-threaded compilation support to the standalone compiler tool amdllpc. The main goal is to exercise threading in the compiler in the CI, without having to run the full AMDVLK Vulkan driver on a machine with a GPU. LLPC#1601
  • A new GitHub Actions workflow was added to automatically produce code coverage reports. Links to coverage reports are posted as Pull Request comments. See a sample report. LLPC#1627, LLPC#1629

oneAPI DPC++

CUDA/HIP support

  • Added group collective functions (reduce, scans, broadcast) for HIP. #5202
  • Added HIP backend support to filter selector extension. #5171, #5176
  • Improved queue barrier support on HIP backend. #4975
  • Made a number of small functional fixes improving device information, support for stream, hierarchical parallelism, etc. #4951, #5168, #5293, #5115
  • Added -fcuda-prec-sqrt flag enabling correctly rounded results of the sqrt function on CUDA backend (equivalent to the nvcc -prec-sqrt). #5141
  • Enabled in-kernel asserts support for the CUDA backend. #5174
  • Fixed the nan() builtin for double types on the CUDA backend. #5173
  • Fixed out-of-bound behavior for read_image in none addressing mode on the CUDA backend. #5204

SYCL 2020 support

  • Added basic support for the generic_space address space. #5148
  • Added property list support to the stream class. #4898

Non-standard extensions

  • ESIMD: Added infrastructure to support non-standard C++ types sycl::half, sycl::bfloat, etc., and basic operations support for sycl::half. #5123
  • ESIMD: Enabled a number of math and conversion intrinsics for sycl::half. #5271
  • Matrix: Enabled joint_matrix_fill (#4994 and #5277) and wi_slice (#4979) operations for joint_matrix.
  • Added new SPIR-V specification extensions documentation to support “module” debug information (#3976) and composite types as joint matrix elements (#5228).
  • Added new SYCL specification extensions documentation to support “device global” objects (#4686), compile time properties (#4937), and std::complex data type in group collective algorithms (#5108).
  • Added initial support for format strings in non-constant address space for the printf function. #5069
  • Moved group sort extension to experimental namespace. #5169

Misc

  • Enhanced SYCL accessor (#5249) and buffer (#5161) instrumentation with XPTI and reduced XPTI instrumentation overhead #5158.
  • Turned on the -fsycl-dead-args-optimization flag by default. #3004
  • Enabled SPIR-V device image format for fat objects in the driver. #4608, #5251, #4683
  • Made a number of performance improvements in the runtime library and Level Zero plug-in like caching command lists (#5197), batching of copy commands (#5155).
  • Added static linking of device code to the Level Zero backend. #5266, #5267