Issue #12
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from April 30 to May 20 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- The X.Org Developers’ Conference 2021 is now accepting submissions and is open for registration. The conference will happen virtually on September 15-17. There is no registration fee.
- Portable Computing Language (PoCL) v1.7, a portable open-source OpenCL implementation, has been released. The new release features Clang/LLVM 12.0 support and can execute SPIR-V binaries on CPUs. The project is looking for people interested in taking the roles of ARM and RISC-V CPU maintainers.
- ROCm 4.2 has been released. The new HIP enhancements include target platform macros for AMD and Nvidia, platform-specific include directories, and extended support for Stream Memory Operations that enable direct synchronization between network nodes and GPU.
- Nvidia proposed a new Vulkan extension that allows applications to import CUDA binaries (cubin ELF files) and execute them.
LLVM and Clang
Discussions
- Reshabh Kumar Sharma revived the RFC on implementing the sanitizer runtimes for heterogeneous devices. The idea is to isolate the changes using a new macro, e.g., `SANITIZER_AMDGPU`. Reshabh is requesting feedback on this idea. There are no replies as of writing.
Commits
- NVPTX intrinsics for CUDA’s `redux.sync` and `cp.async` were added as Clang target builtins.
- A few optimization passes got disabled when compiling for AMDGPU with `-O1` to reduce compilation times.
- AMDGPU support for architected flat scratch: readonly flat scratch register initialized by the SPI.
- A new AMDGPU gfx1034 target was added to Clang/LLVM. The list of AMDGPU processors describes gfx1034 as an unreleased discrete RDNA2 GPU.
MLIR
Discussions
Commits
- Warp-synchronous matrix-multiply accumulate ops landed in the GPU dialect.
- `spv.BranchConditional`’s (de)serialization is properly implemented now.
- More progress on supporting graphics in SPIR-V: `spv.ImageQuerySize` is defined now.
- A few corner cases in vector/std to SPIR-V conversion are addressed.
OpenMP (Target Offloading)
Discussions
- Jon Chesterfield proposed Johannes Doerfert as the code owner for OpenMP offloading. All replies are in favor so far.
Commits
- `unified_shared_memory` is now supported for Pascal-generation Nvidia GPUs.
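
For readers unfamiliar with the feature, here is a minimal sketch of what `unified_shared_memory` permits. It is not taken from the commit. It assumes the standard OpenMP 5.0 `requires` directive, and the helper name `scale` is hypothetical. With unified shared memory, a target region may dereference a host pointer directly, so no `map` clause is needed.

```cpp
#include <vector>

// Request unified shared memory (OpenMP 5.0 `requires` directive).
// It must appear at file scope, before any target region in this file.
// When built without OpenMP support the pragmas are ignored and the
// loop simply runs on the host.
#pragma omp requires unified_shared_memory

// Hypothetical helper: scales `n` elements of `data` in place.
void scale(double *data, int n, double factor) {
  // No map clause: under unified shared memory the device is assumed
  // to be able to access the host allocation through the same pointer.
  #pragma omp target teams distribute parallel for
  for (int i = 0; i < n; ++i)
    data[i] *= factor;
}
```

Calling `scale(v.data(), 3, 2.0)` on a `std::vector<double>` holding `1, 2, 3` leaves it holding `2, 4, 6`, whether the loop is offloaded or falls back to the host.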
External Compilers
LLPC
- A new ELF disassembler, `lgcdis`, has been added. The disassembler can be used either through a standalone tool or as a library, potentially replacing existing non-LLVM-based ELF dumpers. Compared to `llvm-objdump`, the new disassembler can decode PAL metadata.