Issue #17
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from July 23 to August 5, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conferences
- The talk schedule for the 2021 X.Org Developers Conference is available. This year, the conference is virtual and happens on September 15-17. The compiler/GPU-related talks include:
  - SSA-based Register Allocation for GPU Architectures. The talk will be followed by an in-depth workshop on GPU register allocation.
  - The Occult and the Apple GPU
  - Compiling Vulkan shaders in the browser: A tale of control flow graphs and WebAssembly
  - Ray-tracing in Vulkan pt. 2: Implementation
  - Redefining the Future of Accelerator Computing with Level Zero
LLVM and Clang
Discussions
- Luke Kenneth Casson Leighton posted an RFC: “Vector/SIMD ISA Context Abstraction”. Luke is working on SVP64 Cray-like Vector Extensions for the Power ISA, which is being designed for Hybrid CPU, VPU and 3D GPU workloads. One of the problems mentioned is that some ISA designs may lead to combinatorial explosion in the number of intrinsics, which can be avoided by “separating out ‘scalar base’ from ‘augmentation’ throughout the IR”. Renato Golin replied that, historically, LLVM tried to keep as many instructions as native IR as possible to avoid the explosion of intrinsics. However, intrinsics tend to reduce the number of program instructions, so there needs to be some balance.
- 席致寧 asked about a quick way to add a new instruction to generated PTX files, without having to implement full support for the new instruction in the backend. There are no replies as of writing.
Commits
- HIP switched to using DWARF version 5 by default. D107190
- It is now possible to force-enable `MemCpyOpt` with a new LLVM flag, `-enable-memcpyopt-without-libcalls`. For now, only the CUDA frontend opts into it, to better exercise this optimization. D106401
- A new Attributor pass for deducing AMDGPU-specific attributes was added. D104997
- The NVPTX matrix operation intrinsics were extended with the `ldmatrix.sync.aligned` warp-level matrix load instructions introduced in PTX 6.5. D107046
- Clang learned to preserve ASAN library functions when targeting HIP. D106315
- A number of `GlobalISel` enhancements landed for AMDGPU.
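For readers unfamiliar with `MemCpyOpt`, here is a minimal sketch of the kind of source pattern the pass is generally meant to simplify: a copy routed through a temporary. The `Packet` type and `forward` function are hypothetical names made up for illustration and do not come from D106401. Whether the optimization actually fires depends on the target, the optimization level, and the surrounding pipeline.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type used only for illustration.
struct Packet {
  unsigned char bytes[256];
};

// Copies *src to *dst through a temporary. Each aggregate copy is typically
// lowered to a memcpy in LLVM IR; MemCpyOpt may forward the source of the
// first copy into the second, which can leave the temporary unused.
void forward(Packet *dst, const Packet *src) {
  assert(dst != nullptr && src != nullptr);
  Packet tmp = *src;  // first copy: src -> tmp
  *dst = tmp;         // second copy: tmp -> dst
}
```

The observable behavior is the same with or without the optimization: after the call, `*dst` holds the same bytes as `*src`.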
MLIR
Discussions
Commits
- A `populateGpuToLLVMConversionPatterns` entry point was added for collecting all GPU to LLVM conversion patterns. D107218
- Two boolean loading/storing issues were fixed in SPIR-V conversion.
- A few issues in the SPIR-V module combiner were fixed. D106886
- MemRef/Math to SPIR-V conversions are split into their own directories and files.
OpenMP (Target Offloading)
Discussions
- Andrew Marshall is having issues with building LLVM 12 for OpenMP. There are no replies as of writing.
Commits
- Users can now enable the new experimental device runtime library by passing the `-fopenmp-target-new-runtime` flag. D106793
- Linking of math libraries is now supported for AMDGPU when `-lm` is specified. D104904, D105981
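To make the math library item concrete, here is a minimal sketch of an OpenMP target region that calls a math function on the device. The function name `norms_on_device` is hypothetical and not taken from the patches. Whether a given call lowers to a hardware instruction or to a device math library routine depends on the target and toolchain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes sqrt(x*x + y*y) for each pair of inputs
// inside an OpenMP target region. If no offload device is available, or the
// code is built without OpenMP, the loop simply runs on the host.
std::vector<double> norms_on_device(const std::vector<double> &xs,
                                    const std::vector<double> &ys) {
  assert(xs.size() == ys.size());
  const std::size_t n = xs.size();
  std::vector<double> out(n);
  const double *x = xs.data();
  const double *y = ys.data();
  double *o = out.data();

  // Map the inputs to the device and copy the results back afterwards.
  #pragma omp target teams distribute parallel for \
      map(to: x[0:n], y[0:n]) map(from: o[0:n])
  for (std::size_t i = 0; i < n; ++i)
    o[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);

  return out;
}
```

A build for AMDGPU might look something like `clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 example.cpp -lm`, where the `gfx906` architecture and the file name are assumptions; the exact flags depend on the installed toolchain and GPU.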
External Compilers
LLPC
Mesa
- LLVMpipe gained a linear rasterizer optimized for 2D rendering. The changes yield a 2x to 3x performance improvement for 2D workloads.