Issue #11
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 16 to April 29 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- CuPy v9 has been released. CuPy is a NumPy-compatible array library accelerated by CUDA. The main highlights are:
- New JIT API for defining CUDA kernels with Python code.
- NVIDIA cuSPARSELt Python bindings to accelerate sparse matrix multiplication on Nvidia Ampere GPUs.
- AMD ROCm platform improvements, including a binary package for ROCm 4.0.
- The IWOCL and SYCLcon 2021 conferences happened this week. These conferences focus on OpenCL and SYCL, respectively. Video recordings and presentation slides are already available publicly.
LLVM and Clang
Discussions
- The discussion on how to allow math functions and intrinsics (and friends) when compiling for GPUs has been revived.
Commits
- New SYCL documentation has been added: “SYCL Compiler and Runtime architecture design”. The initial version of the document covers address space handling.
- Global Dead Code Elimination is now scheduled to run before the Internalization pass in the AMDGPU pass pipeline. This is so that unused global variables, whose only users are dead, can be internalized.
- HIP gained a new option,
-fgpu-inline-threshold
, that controls the inlining threshold for device compilation only.
MLIR
Discussions
Commits
- Some basic Python support was added to the GPU dialect and passes.
- Boolean
std.xor
to SPIR-V conversion andvector<1xT>
vector.extract
to SPIR-V conversion were added.
OpenMP (Target Offloading)
Discussions
- Pierre Kestener is facing issues with building NVPTX targets after upgrading to OpenMP 12. The suggested solution is to install
gcc-multilib
. - The amdgpu device runtime builds by default and no longer requires LLVM to have the amdgpu target enabled.
- Simple amdgpu offloading (i.e. if it does not use any libc) now works out of the box on systems with ROCr (runtime for ROCm) installed.
- Various initial bugs found in the Clang driver, early adopters beware.
Commits
- A new clang tool to list AMD GPUs installed,
amdgpu-arch
, was committed. The output is used to fill-march
when the latter is not explicitly provided in-Xopenmp-target
. This tool is built only if HSA is installed. - Simplified clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single
kmpc_parallel_51
runtime entry point for parallel regions. - A new runtime function
__tgt_set_info_flag
that allows the user to set the information level at runtime without using the environment variable.
External Compilers
LLPC
- It is now possible to build the
amdllpc
compiler as a standalone tool, i.e., without the rest of the AMDLVK driver.