Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from November 12 to December 2, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

Commits

  • (In-review) Anastasia Stulova submitted a patch adding a toolchain for SPIR-V in Clang. The toolchain is incomplete but functional enough to produce SPIR-V assembly and object code directly via Clang. D112410
  • The AMDGPU GFX10 memory model was updated to account for the MALL (memory attached last level) cache added in GFX10.3. D114076

MLIR

Discussions

Commits

  • A chain of CLs has landed to better support GPU to NVVM MMA conversion. D112969, D113383, D113618
  • GPU to ROCm now supports target chipset during conversion. D114107
  • ROCm integration tests are runnable now. D114184
  • SPIR-V dialect definitions were refreshed to catch up with the latest spec. D113667
  • scf.while to SPIR-V conversion is now supported. D113007
  • Math ops to SPIR-V conversion can now generate OpenCL extended instructions. D113780
  • spv.AtomicFAddEXTOp is defined and capability bugs for atomics were fixed. D113764, D114551

OpenMP (Target Offloading)

Discussions

  • Rewriting the target offloading driver and requiring LLD for OpenMP offloading were discussed. Slides.

Commits

  • The new device runtime is now enabled by default for Nvidia offloading. To use the old runtime, -fno-openmp-target-new-runtime must be passed. D114890
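For readers who have not used target offloading, the following is a minimal sketch of the kind of code this change affects. The function name vector_add and the file name used below are hypothetical, and the sketch assumes a Clang build with NVPTX offloading support; without OpenMP enabled, the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: element-wise vector addition with an OpenMP target region.
// When built with offloading enabled, the loop body runs on the device;
// otherwise the pragma is ignored and the loop runs on the host.
std::vector<float> vector_add(const std::vector<float> &a,
                              const std::vector<float> &b) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(a.size());
  std::vector<float> c(n);

  // Raw pointers are used so the map clauses can describe array sections.
  const float *pa = a.data();
  const float *pb = b.data();
  float *pc = c.data();

#pragma omp target teams distribute parallel for \
    map(to : pa[0 : n], pb[0 : n]) map(from : pc[0 : n])
  for (int i = 0; i < n; ++i)
    pc[i] = pa[i] + pb[i];

  return c;
}
```

Assuming a typical setup, this could be built with something like clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda vector_add.cpp, which would now pick up the new device runtime; adding -fno-openmp-target-new-runtime to that command selects the old one. The exact target triple and any architecture flags depend on the installation.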

External Compilers

LLPC

  • The standalone compiler tool switched to handling recoverable errors with llvm::Error, using the standard LLVM error handling utilities. LLPC#1545, LLPC#1553
  • Continued work towards supporting the New Pass Manager. LLPC#1519
  • The pre-merge checks will now warn about typos using the typos CLI tool. LLPC#1516

oneAPI DPC++

CUDA/HIP support

  • Enabled the cuda-gpu-arch, cuda-path, nocudalib, and fno-sycl-libspirv options in the MSVC-compatible driver (clang-cl).
  • Improved diagnostics for using an unsupported work-group size with the HIP backend.
  • Added half precision floating point data type support for the nextafter function.
  • Added atomics with scopes and memory orders to the CUDA backend. A patch adding the NVPTX intrinsics required for this implementation is uploaded for review. D112718
  • Added HIP backend implementations for 40+ math functions.

SYCL 2020 support

  • Improved diagnostics for using non-forward declarable kernel name types and unsupported data types in device code (added is_device_copyable type trait check for SYCL buffers).
  • Added support for user-defined and pointer types to the group_broadcast operation.
  • Made the sycl::marray implementation trivially copyable.
  • Added info::device::built_in_kernel_ids information query support.
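As background for the trivially-copyable items above, here is a small standard C++ sketch of what the property means in practice. It does not use SYCL: small_array and byte_copy are hypothetical names introduced only for illustration, and this is not how DPC++ implements sycl::marray or is_device_copyable.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a fixed-size math array (not sycl::marray).
// It has no user-provided copy/move operations or destructor, so it is
// trivially copyable whenever T is.
template <typename T, std::size_t N>
struct small_array {
  T data[N];
};

static_assert(std::is_trivially_copyable<small_array<float, 4>>::value,
              "small_array<float, 4> should be trivially copyable");

// Copies an object through its raw bytes. This is only well-defined for
// trivially copyable types, which is why the check is enforced here.
template <typename T>
T byte_copy(const T &src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "byte_copy requires a trivially copyable type");
  T dst;
  std::memcpy(&dst, &src, sizeof(T));
  return dst;
}
```

A type with this property can be moved between host and device memory as plain bytes, which is roughly the guarantee that a device-copyability check is meant to establish.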

Non-standard extensions

  • [ESIMD] Added support for align flags to simd::copy_from/to operations.
  • Added ESIMD-specific IR verification pass.
  • Added specification for discard_events queue property.

Misc

  • Fixed a number of memory leaks in the DPC++ runtime library.
  • Added stripped PDB files for the DPC++ runtime library and plugins when building with MSVC.
  • Reduced compiler memory consumption during the link step.