Issue #24

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from November 12 to December 2 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conferences

LLVM and Clang

Discussions

A group of contributors from UIUC, Intel, and AWS posted a proposal for TLX: Tensor LLVM eXtensions. The extension introduces target-agnostic intrinsics using ‘flat vectors’ that would require target-specific lowering. As noted by Florian Hahn, the proposal is large in both the text length and implementation effort. Chris Lattner expressed a concern that the RFC proposes a single tensor model for LLVM, which may be inappropriate for some frameworks and not general enough.

Commits

(In-review) Anastasia Stulova submitted a patch adding a toolchain for SPIR-V in Clang. The toolchain is incomplete but functional enough to produce SPIR-V assembly and object code directly via Clang. D112410
AMDGPU GX10 memory model was updated to account for MALL (memory attached last-level) cache added in GXF10.3. D114076

MLIR

Discussions

Commits

A chain of CLs have landed to better support GPU to NVVM MMA conversion. D112969, D113383, D113618
GPU to ROCm now supports target chipset during conversion. D114107
ROCm integration tests are runnable now. D114184
SPIR-V dialect definitions were refreshed to catch up with the latest spec. D113667
scf.while to SPIR-V conversion is now supported. D113007
Math ops to SPIR-V conversion now can generate OpenCL extended instructions. D113780
spv.AtomicFAddEXTOp is defined and capability bugs for atomics were fixed. D113764, D114551

OpenMP (Target Offloading)

Discussions

Rewriting the target offloading driver and requiring LLD for OpenMP offloading was discussed, Slides.

Commits

The new device runtime is now enabled by default for Nvidia offloading. To use the old runtime, -fno-openmp-target-new-runtime must be passed. D114890

External Compilers

LLPC

The standalone compiler tool switched to handling recoverable errors with llvm::Error, using the standard LLVM error handling utilities. LLPC#1545, LLPC#1553
Continued worked towards supporting the New Pass Manager. LLPC#1519
The pre-merge checks will now warn about typos using the typos CLI tool. LLPC#1516

oneAPI DPC++

CUDA/HIP support

Enabled cuda-gpu-arch, cuda-path, nocudalib, and fno-sycl-libspirv options in MSVC compatible driver (clang-cl).
Improved diagnostics for using unsupported work-group size with HIP backend.
Added half precision floating point data type support for the nextafter function.
Added atomics with scopes and memory orders to CUDA backend. Patch adding NVPTX intrinsics required for this implementation is uploaded for review. D112718
Added HIP backed implementation for 40+ math functions.

SYCL 2020 support

Improved diagnostics for using non-forward declarable kernel name types and unsupported data types in device code (added is_device_copyable type trait check for SYCL buffers).
Added user-defined and pointer types support to group_broadcast operation.
Made sycl::marray implementation trivially copyable.
Added info::device::built_in_kernel_ids information query support.

Non-standard extensions

[ESIMD] Added support for align flags to simd::copy_from/to operations.
Added ESIMD-specific IR verification pass.
Added specification for discard_events queue property.

Misc

Fixed a number of memory leaks in the DPC++ runtime library.
Added stripped PDB files for the DPC++ runtime library and plugins when building with MSVC.
Reduced compiler memory consumption during link step.