Issue #3

Happy New Year! Welcome to the third issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from December 25, 2020 to January 7, 2021.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Conference Talks

Alyssa Rosenzweig started a blog post series on dissecting the Apple M1 GPU, which doesn’t have any public documentation or open source drivers as of this writing. The goal is to understand the new architecture and accelerate the development of an open source driver stack. Alyssa has already committed an early work-in-progress disassembler implementation and described the methodology and workflow used to develop it in the blog post.

LLVM and Clang

Discussions

Madhur Amilkanthwar asked about using the Attributor framework to propagate the amdgpu-flat-work-group-size attribute in the AMDGPU backend.

Commits

(In-review) Remove a custom amdgpu-inline pass and replace it with new Target Transform Info hooks. As explained, this is because the custom inliner doesn’t fit well into the New Pass Manager’s pipeline and has few differences compared to the main LLVM inlining pass.
Clang won’t add debugging information for the NVPTX target if optimization remarks are enabled. This is because ptxas supports either debug builds with no optimizations or optimized builds without debug info.
Always print error messages in the libomptarget CUDA plugin, even with debugging disabled.
Make the AMDGPU OpenMP target call into deviceRTL instead of ockl. This allows simple OpenMP code to run without ROCm device libraries installed.

MLIR

Discussions

Lenny Guo asked for help with generating SPIR-V binaries from SPIR-V MLIR dialect kernels in order to run them with the OpenCL runtime. There are no replies as of this writing.

Commits

External Compilers

Please submit pointers to your mailing lists, forums, or newsletters if you want your LLVM- or MLIR-based GPU compiler project to be covered in future LLVM GPU News issues.

JuliaGPU

LLPC

The graphics API-agnostic LGC peephole optimizer learned to fold inttoptr ( add x, const ) into gep ( inttoptr x, const ). This improves value tracking and facilitates load/store vectorization. LLVM’s instruction combining pass cannot generally perform the same optimization, because on some systems const itself may be a valid pointer.
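To make the shape of this fold concrete, here is a minimal sketch in Python on a toy expression tree. It is not LGC's implementation: the node classes (`Value`, `Const`, `Add`, `IntToPtr`, `Gep`) and the helpers `fold_inttoptr_add` and `evaluate` are hypothetical names made up for illustration. Pointers are modelled as plain integers, and the constant is assumed to be the right-hand operand of the add.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical toy IR nodes; these do not correspond to LLVM or LGC classes.
@dataclass(frozen=True)
class Value:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class IntToPtr:
    operand: "Expr"


@dataclass(frozen=True)
class Gep:
    base: "Expr"
    offset: "Expr"


Expr = Union[Value, Const, Add, IntToPtr, Gep]


def fold_inttoptr_add(expr: Expr) -> Expr:
    """Rewrite inttoptr(add(x, const)) into gep(inttoptr(x), const).

    Any expression that does not match the pattern is returned unchanged.
    Only a constant on the right-hand side of the add is handled here.
    """
    if isinstance(expr, IntToPtr) and isinstance(expr.operand, Add):
        add = expr.operand
        if isinstance(add.rhs, Const):
            return Gep(IntToPtr(add.lhs), add.rhs)
    return expr


def evaluate(expr: Expr, env: Dict[str, int]) -> int:
    """Compute the address an expression denotes, treating pointers as ints."""
    if isinstance(expr, Value):
        return env[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.lhs, env) + evaluate(expr.rhs, env)
    if isinstance(expr, IntToPtr):
        return evaluate(expr.operand, env)
    if isinstance(expr, Gep):
        # Byte-granular offset from the base pointer.
        return evaluate(expr.base, env) + evaluate(expr.offset, env)
    raise TypeError(f"unexpected node: {expr!r}")


before = IntToPtr(Add(Value("x"), Const(16)))
after = fold_inttoptr_add(before)
```

In this flat-integer model both forms compute the same address. The rewritten form differs in that it names `x` as the base pointer, which is the extra information that value tracking and vectorization can use, and also the assumption that does not hold in general when the constant could itself be a valid pointer.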

Mesa

Always split typed vertex buffer loads on AMD GFX6 and GFX10+ in RADV/LLVM. This fixes hangs in Zink (an OpenGL over Vulkan implementation) tests.

SYCL

Intel’s oneAPI DPC++ Compiler 2020-12 was released. The release notes contain a long list of SYCL compiler and library improvements.