Issue #12

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 20, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

The X.Org Developers’ Conference 2021 is now accepting submissions and is open for registration. The conference will happen virtually on September 15-17. There is no registration fee.
Portable Computing Language (PoCL) v1.7, a portable open-source OpenCL implementation, has been released. The new release features Clang/LLVM 12.0 support and can execute SPIR-V binaries on CPUs. The project is looking for people interested in taking the roles of ARM and RISC-V CPU maintainers.
ROCm 4.2 has been released. The new HIP enhancements include target platform macros for AMD and Nvidia, platform-specific include directories, and extended support for Stream Memory Operations that enable direct synchronization between network nodes and GPUs.
Nvidia proposed a new Vulkan extension that allows applications to import CUDA binaries (cubin ELF files) and execute them.

LLVM and Clang

Discussions

Reshabh Kumar Sharma revived the RFC on implementing the sanitizer runtimes for heterogeneous devices. The idea is to isolate the changes using a new macro, e.g., SANITIZER_AMDGPU. Reshabh is requesting feedback on this idea. There are no replies as of this writing.
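To make the macro idea concrete, here is a minimal sketch of how a guard like SANITIZER_AMDGPU could isolate device-specific code. It assumes the macro follows a defined-as-0-or-1 convention so that it can be tested with a plain `#if`; the `Backend` enum and the `ActiveBackend` helper are hypothetical names used only for illustration and are not taken from the RFC.

```cpp
// Assumption: the new macro is always defined and evaluates to 0 or 1,
// so code can test it with a plain #if instead of #ifdef.
#ifndef SANITIZER_AMDGPU
#define SANITIZER_AMDGPU 0  // default: ordinary host build
#endif

// Hypothetical enum and helper, for illustration only.
enum class Backend { Host, AmdGpu };

constexpr Backend ActiveBackend() {
#if SANITIZER_AMDGPU
  return Backend::AmdGpu;  // device-specific path, compiled only when enabled
#else
  return Backend::Host;    // existing host behavior stays untouched
#endif
}
```

With this shape, builds that never define the macro compile exactly the code they compiled before, which is the kind of isolation the RFC appears to be after.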

Commits

NVPTX intrinsics for CUDA’s redux.sync and cp.async were added as Clang target builtins.
A few optimization passes were disabled when compiling for AMDGPU with -O1 to reduce compilation times.
AMDGPU gained support for architected flat scratch: a read-only flat scratch register initialized by the SPI.
A new AMDGPU gfx1034 target was added to Clang/LLVM. The list of AMDGPU processors lists gfx1034 as an unreleased discrete RDNA2 GPU.

MLIR

Discussions

Commits

Warp-synchronous matrix-multiply-accumulate ops landed in the GPU dialect.
spv.BranchConditional’s (de)serialization is properly implemented now.
More progress on supporting graphics in SPIR-V: spv.ImageQuerySize is defined now.
A few corner cases in vector/std to SPIR-V conversion are addressed.

OpenMP (Target Offloading)

Discussions

Jon Chesterfield proposed Johannes Doerfert as the code owner for OpenMP offloading. All replies are in favor so far.

Commits

unified_shared_memory is now supported for Pascal-generation Nvidia GPUs.
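For readers unfamiliar with the feature, the sketch below shows what user code relying on it might look like, assuming the entry refers to the OpenMP 5.0 `requires unified_shared_memory` directive. The `Scale` helper is a hypothetical example, not code from the commit. Built without OpenMP support, the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Declares that this translation unit relies on host and device sharing one
// address space. It must appear at file scope, before any target region.
#pragma omp requires unified_shared_memory

// Hypothetical helper: multiplies every element of `values` by `factor`.
void Scale(std::vector<double> &values, double factor) {
  double *data = values.data();
  const int n = static_cast<int>(values.size());
  assert(n == 0 || data != nullptr);

  // No map clause: under unified shared memory the device is expected to be
  // able to dereference the host pointer directly.
  #pragma omp target teams distribute parallel for
  for (int i = 0; i < n; ++i) {
    data[i] *= factor;
  }
}
```

A call such as `Scale(v, 2.0)` doubles the vector in place; whether the loop actually runs on a GPU depends on the compiler flags and the offloading runtime in use.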

External Compilers

LLPC

A new ELF disassembler, lgcdis, has been added. The disassembler can be used either through a standalone tool or as a library, potentially replacing existing non-LLVM-based ELF dumpers. Compared to llvm-objdump, the new disassembler can decode PAL metadata.
