Issue #11

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 16 to April 29, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

CuPy v9 has been released. CuPy is a NumPy-compatible array library accelerated by CUDA. The main highlights are:
- New JIT API for defining CUDA kernels with Python code.
- NVIDIA cuSPARSELt Python bindings to accelerate sparse matrix multiplication on NVIDIA Ampere GPUs.
- AMD ROCm platform improvements, including a binary package for ROCm 4.0.

The IWOCL and SYCLcon 2021 conferences happened this week. These conferences focus on OpenCL and SYCL, respectively. Video recordings and presentation slides are already available publicly.

LLVM and Clang

Discussions

The discussion on how to allow math functions and intrinsics (and friends) when compiling for GPUs has been revived.

Commits

New SYCL documentation has been added: “SYCL Compiler and Runtime architecture design”. The initial version of the document covers address space handling.
Global Dead Code Elimination is now scheduled to run before the Internalization pass in the AMDGPU pass pipeline. This is so that unused global variables, whose only users are dead, can be internalized.
HIP gained a new option, -fgpu-inline-threshold, that controls the inlining threshold for device compilation only.

MLIR

Discussions

Commits

Some basic Python support was added to the GPU dialect and passes.
Boolean std.xor to SPIR-V conversion and vector<1xT> vector.extract to SPIR-V conversion were added.

OpenMP (Target Offloading)

Discussions

Pierre Kestener is facing issues with building NVPTX targets after upgrading to OpenMP 12. The suggested solution is to install gcc-multilib.
The amdgpu device runtime builds by default and no longer requires LLVM to have the amdgpu target enabled.
Simple amdgpu offloading (i.e., code that does not use any libc) now works out of the box on systems with ROCr (the runtime for ROCm) installed.
Various initial bugs have been found in the Clang driver; early adopters beware.
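
To illustrate what "simple" offloading means here, the following is a minimal sketch of a target region that performs only arithmetic and therefore needs no libc on the device. The function name saxpy_offload is hypothetical, and the compile line in the comment is an assumption that depends on the local toolchain and GPU. Without an offloading-capable compiler, the loop simply runs on the host.

```c
/* Hypothetical example: y[i] = a * x[i] + y[i], offloaded via OpenMP.
 * The target region contains only arithmetic, so no libc is needed on the device.
 *
 * Assumed compile line (adjust -march to the installed GPU):
 *   clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa
 *         -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -c saxpy.c
 */
void saxpy_offload(int n, float a, const float *x, float *y) {
  /* Copy x to the device, copy y both ways, and distribute the loop. */
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
  }
}
```

A caller passes ordinary host arrays; the map clauses describe which data moves to and from the device.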

Commits

A new Clang tool, amdgpu-arch, that lists the installed AMD GPUs was committed. Its output is used to fill in -march when the latter is not explicitly provided in -Xopenmp-target. The tool is built only if HSA is installed.
Clang codegen for parallel regions in OpenMP GPU target offloading was simplified, with corresponding changes in libomptarget: SPMD and non-SPMD parallel calls are now unified under a single runtime entry point for parallel regions, kmpc_parallel_51.
A new runtime function, __tgt_set_info_flag, allows the user to set the information level at runtime without using the environment variable.
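
To make the SPMD/non-SPMD distinction in the parallel-region item above more concrete, here is a hedged sketch of the two source shapes involved. The function names are hypothetical, and the comments describe the execution mode a compiler would typically choose, not a guarantee for any particular Clang version. Both functions compute the same result.

```c
/* Combined construct: the whole region is parallel from the start, which is
 * the shape a compiler can typically execute in SPMD mode. */
void scale_spmd(int n, float a, float *v) {
#pragma omp target teams distribute parallel for map(tofrom: v[0:n])
  for (int i = 0; i < n; ++i) {
    v[i] = a * v[i];
  }
}

/* Sequential code followed by a nested parallel region: the shape that
 * typically requires the generic (non-SPMD) execution mode. */
void scale_generic(int n, float a, float *v) {
#pragma omp target map(tofrom: v[0:n])
  {
    float factor = a; /* executed by the initial thread of the target region */
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      v[i] = factor * v[i];
    }
  }
}
```

With the unification described above, the parallel region in both shapes would be expected to go through the same runtime entry point, so the difference lies in how the surrounding target region is executed.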

External Compilers

LLPC

It is now possible to build the amdllpc compiler as a standalone tool, i.e., without the rest of the AMDVLK driver.
