Issue #3
Happy New Year! Welcome to the third issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from December 25 to January 7, 2021.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- Alyssa Rosenzweig started a blog post series on dissecting the Apple M1 GPU, which has no public documentation or open source drivers as of this writing. The goal is to understand the new architecture and accelerate the development of an open source driver stack. Alyssa has already committed an early work-in-progress disassembler and described the methodology and workflow used to develop it in the blog post.
LLVM and Clang
Discussions
- Madhur Amilkanthwar asked about using the Attributor framework to propagate the amdgpu-flat-work-group-size attribute in the AMDGPU backend.
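To illustrate what such interprocedural propagation could look like, here is a minimal sketch over a toy call graph. It is not the Attributor API. It assumes that kernels carry an explicit "min,max" amdgpu-flat-work-group-size value, that every non-kernel function is internal (only called from within the module), and that a function with no known callers falls back to a hypothetical conservative default of 1,1024. Each callee receives the union of its callers' ranges, iterated to a fixpoint so that call chains and recursion are handled.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only: function names and the call graph are plain strings/dicts.
Range = Tuple[int, int]

# Assumption: conservative fallback for functions with no known callers.
DEFAULT_RANGE: Range = (1, 1024)


def parse_attr(value: str) -> Range:
    """Parse an attribute value such as "64,256" into (64, 256)."""
    lo, hi = (int(part) for part in value.split(","))
    return lo, hi


def format_attr(r: Range) -> str:
    """Format a range back into the "min,max" attribute form."""
    return f"{r[0]},{r[1]}"


def propagate(kernels: Dict[str, str], calls: Dict[str, List[str]]) -> Dict[str, str]:
    """Give every non-kernel function the union of its callers' ranges."""
    # Collect all functions and the reverse edges (callee -> callers).
    functions = set(kernels) | set(calls)
    callers: Dict[str, List[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            functions.add(callee)
            callers.setdefault(callee, []).append(caller)

    # None means "no information yet" (the optimistic starting point).
    state: Dict[str, Optional[Range]] = {f: None for f in functions}
    for name, value in kernels.items():
        state[name] = parse_attr(value)  # kernel ranges are fixed

    changed = True
    while changed:  # ranges only widen and are bounded, so this terminates
        changed = False
        for f in functions:
            if f in kernels:
                continue
            new = state[f]
            for c in callers.get(f, []):
                r = state[c]
                if r is None:
                    continue
                new = r if new is None else (min(new[0], r[0]), max(new[1], r[1]))
            if new != state[f]:
                state[f] = new
                changed = True

    return {f: format_attr(r if r is not None else DEFAULT_RANGE) for f, r in state.items()}
```

For example, with kernels k1 ("64,128") and k2 ("256,256") both calling a helper, the helper ends up with "64,256": the narrowest range that is still valid for every kernel it can run in.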
Commits
- (In review) Remove a custom amdgpu-inline pass and replace it with new Target Transform Info hooks. As explained, this is because the custom inliner doesn’t fit well into the New Pass Manager’s pipeline and has few differences compared to the main LLVM inlining pass.
- Clang won’t add debugging information to the NVPTX target if optimization remarks are enabled. This is because ptxas supports either debug builds with no optimizations or optimized builds without debug info.
- Always print error messages in the libomptarget CUDA plugin, even with debugging disabled.
- Make the AMDGPU OpenMP target call into deviceRTL instead of ockl. This allows simple OpenMP code to run without the ROCm device libraries installed.
MLIR
Discussions
- Lenny Guo asked for help with generating SPIR-V binaries from SPIR-V MLIR dialect kernels in order to run them with an OpenCL runtime. There are no replies as of this writing.
Commits
External Compilers
Please submit pointers to your mailing lists, forums, or newsletters if you want your LLVM- or MLIR-based GPU compiler project to be covered in future LLVM GPU News issues.
JuliaGPU
LLPC
- The graphics API-agnostic LGC peephole optimizer learned to fold inttoptr(add x, const) into gep(inttoptr x, const). This improves value tracking and facilitates load/store vectorization. LLVM’s instruction combining pass cannot generally perform the same optimization, because on some systems const itself may be a valid pointer.
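The shape of this rewrite can be sketched on a toy expression tree. The representation below (variable names as strings, integer constants as ints, operations as tuples) is a hypothetical stand-in, not LGC's or LLVM's data structures. The sketch bakes in the assumption that makes the fold legal for LGC: x is the base address and the constant is only an offset. In real LLVM IR, the resulting gep would typically be a byte-offset GEP.

```python
from typing import Tuple, Union

# Toy IR: a variable name (str), an integer constant (int),
# or a tuple ("opcode", operand, ...).
Expr = Union[str, int, Tuple]


def fold(e: Expr) -> Expr:
    """Rewrite inttoptr(add(x, C)) into gep(inttoptr(x), C), bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [fold(a) for a in args]  # fold operands first
    if op == "inttoptr" and isinstance(args[0], tuple) and args[0][0] == "add":
        _, lhs, rhs = args[0]
        # add is commutative, so accept the constant on either side.
        # Assumption (valid for LGC, not for LLVM in general): the
        # non-constant operand is the base pointer, the constant an offset.
        if isinstance(rhs, int) and not isinstance(lhs, int):
            return ("gep", ("inttoptr", lhs), rhs)
        if isinstance(lhs, int) and not isinstance(rhs, int):
            return ("gep", ("inttoptr", rhs), lhs)
    return (op, *args)
```

Once the address is expressed as base plus constant offset, two loads from the same base at adjacent offsets become easy to recognize as vectorization candidates, which is the benefit described above.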
Mesa
- Always split typed vertex buffer loads on AMD GFX6 and GFX10+ in RADV/LLVM. This fixes hangs in Zink (an OpenGL over Vulkan implementation) tests.
SYCL
- Intel’s oneAPI DPC++ Compiler 2020-12 was released. The release notes contain a long list of SYCL compiler and library improvements.