Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 20, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- The X.Org Developers’ Conference 2021 is now accepting submissions and is open for registration. The conference will happen virtually on September 15-17. There is no registration fee.
- Portable Computing Language (PoCL) v1.7, a portable open-source OpenCL implementation, has been released. The new release features Clang/LLVM 12.0 support and can execute SPIR-V binaries on CPUs. The project is looking for people interested in taking the roles of ARM and RISC-V CPU maintainers.
- ROCm 4.2 has been released. The new HIP enhancements include target platform macros for AMD and Nvidia, platform-specific include directories, and extended support for Stream Memory Operations that enable direct synchronization between network nodes and GPU.
- Nvidia proposed a new Vulkan extension to allow applications to import CUDA binaries (cubin ELF files) and execute them.
LLVM and Clang
- Reshabh Kumar Sharma revived the RFC on implementing the sanitizer runtimes for heterogeneous devices. The idea is to isolate the changes using a new macro, e.g., SANITIZER_AMDGPU. Reshabh is requesting feedback on this idea. There are no replies as of writing.
- NVPTX intrinsics for CUDA’s cp.async were added as Clang target builtins.
- A few optimization passes got disabled when compiling for AMDGPU with -O1 to reduce compilation times.
- AMDGPU gained support for architected flat scratch: a read-only flat scratch register initialized by the SPI.
- New AMDGPU gfx1034 target was added to Clang/LLVM. The list of AMDGPU processors lists gfx1034 as an unreleased discrete RDNA2 GPU.
- Warp-synchronous matrix-multiply accumulate ops landed in the MLIR GPU dialect.
- spv.BranchConditional’s (de)serialization is properly implemented now.
- More progress on supporting graphics in SPIR-V: spv.ImageQuerySize is defined now.
- A few corner cases in vector/std to SPIR-V conversion are addressed.
OpenMP (Target Offloading)
- Jon Chesterfield proposed Johannes Doerfert as the code owner for OpenMP offloading. All replies are in favor so far.
- unified_shared_memory is now supported for Pascal-generation Nvidia GPUs.
- A new ELF disassembler, lgcdis, has been added. The disassembler can be used either through a standalone tool or as a library, potentially replacing existing non-LLVM-based ELF dumpers. Compared to llvm-objdump, the new disassembler can decode PAL metadata.