Issue #6

Welcome to the sixth issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from February 5 to February 18, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

Vulkan, a cross-platform graphics API, is five years old now.
In another Apple M1 GPU tinkering effort, Dougall Johnson published an in-progress document that attempts to explain the M1 GPU architecture. The project repository contains various tools, including an assembler, a disassembler, an emulator, and a test suite.

LLVM and Clang

Discussions

David Blaikie is looking for volunteers with a GPU and/or LLVM middle-end background to help review the “Abstracting over SSA form IRs to implement generic analyses” proposal. Divergence Analysis is expected to be one of the main users of the proposed abstractions.
Sameer Sahasrabuddhe continues his effort to enable Divergence Analysis with the New Pass Manager. Alina Sbirlea pointed out that there are two feasible ways to make the SimpleLoopUnswitch pass work: either disable non-trivial unswitching for targets with divergence, or compute Divergence Analysis results within the pass.

Commits

Fixes to AMDGPU maximum memory scope for scratch, LDS, and GDS address spaces.
Support for the AMDGPU gfx90a target was posted, but may have been committed prematurely.
CUDA/HIP option for specifying compilation unit ID, -fuse-cuid.
(In-review) HIP option to enable sanitizer support for the AMDGPU target, -fgpu-sanitize. This is experimental and off by default.
(In-review) A new clspv target for libclc. clspv is an open-source OpenCL C to Vulkan SPIR-V compiler.

MLIR

Discussions

Commits

NVVM/ROCDL kernel function conversions now rely on target-specific attributes for better control.
NVVM/ROCDL to LLVM IR conversions now adopt the interface-based LLVM translation.
In the SPIR-V dialect, more types and ops were defined to support graphics use cases.
More patterns were added to convert vector ops to SPIR-V ops.

OpenMP (Target Offloading)

Discussions

Konstantin Sidorov is looking for Google Summer of Code project ideas related to machine-learning-assisted compiler optimizations. Johannes Doerfert suggested a predictor of grid and thread block sizes for OpenMP GPU kernels.

Commits

Offloading to NVIDIA devices now requires CUDA 9.0 or higher.
CUDA 11.1 and 11.2 are now natively supported.
All target directives, not only target regions, now use asynchronous actions if the plugin supports them, as the CUDA plugin does.
The NVIDIA and AMDGPU device runtimes are now built as C++ with OpenMP, no longer as CUDA/HIP.
The CUDA plugin can now be built without CUDA installed on the system (or known to Clang). This should make it easier to distribute LLVM with OpenMP offload support.
Various bugs have been fixed, including but not limited to:
- PR49158 fixed by allowing unused functions in declare target regions if they are not emitted,
- PR49207 fixed by avoiding stack locations in asynchronous actions.

External Compilers

LLPC

Mesa

llvmpipe, a CPU OpenGL implementation, landed support for more SPIR-V extensions, bringing it closer to full GL4.6 support.

SYCL

Khronos released the final version of the SYCL 2020 spec. SYCL 2020 is based on C++17 and contains over 40 new features, including Unified Shared Memory, built-in parallel reduction operations, and atomic operations with C++ atomics semantics.