Issue #31
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from March 19 to April 1, 2022.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Community Events
LLVM and Clang
Discussions
- A new HLSL Working Group was formed to coordinate efforts for adding HLSL and related code generation support to LLVM & Clang.
- A living Agenda/Meeting Notes document is available.
- Meetings are planned to be 30 minutes long bi-weekly and will adjust as appropriate.
- Issue tracking will be done on LLVM’s GitHub issues.
- Tom Stellard created an `#hlsl` channel on the LLVM Discord server.
- Ben Wibking reported CUDA compilation failures caused by `__nv_is_extended_device_lambda_closure_type` not being recognized by Clang. After applying a workaround, Ben discovered that missing `__float128` support was another blocker. Artem Belevich explained that in Clang’s CUDA compilation model, the same source must be ‘reasonably valid’ for both CPU and GPU targets, but existing Nvidia GPUs have no FP128 support, and suggested a soft-float approach.
- Frank Winter noticed slow NVPTX code generation on some functions and narrowed it down to the ‘GPU Load and Store Vectorizer’. Matt Arsenault confirmed that the pass does have some quadratic behavior.
Commits
- NVPTX vectorization was improved for `ld.param` and `st.param`. D120129
- DirectX Backend stub has landed. D112080
- Added a DXIL target triple. D122031
- (In-review) DXIL CodeGen:
- Landed HLSL changes:
- (In-review) HLSL Semantic parsing. D122699
- Continued work on the AMDGPU gfx940 target.
MLIR
Discussions
- Md Abdullah Shahneous Bari asked about calling external functions in the SPIR-V dialect using
LinkageAttributes
. There are no replies at the time of writing.
Commits
- `gpu.mma_*` ops were relaxed to support a more flexible layout. D122452
- `func.call` and `math.copysign` to SPIR-V conversions are now supported. D122368, D122910
OpenMP (Target Offloading)
Discussions
Commits
- Fixed an issue that could cause a segmentation fault in some applications (such as OpenMC, MiniFMM). D122014
- Fixed static or hidden variables causing AMDGPU offloading to fail. D122352
- Fixed global constructors and destructors not being found on AMDGPU. D122515
- Fixed a race condition when deleting entries from the device map. D121058
- Device LTO now uses the default optimization pipeline to address performance regressions when using LTO. D122133
- The new driver will be made the default very soon; users will then be able to use static libraries and LTO without manually enabling it. D122831
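To make the static-variable fix (D122352) more concrete, the sketch below shows the general shape of code where a file-local variable is also needed on the device. It is a hypothetical example, not the reproducer from the patch; the names `scale` and `scaled_sum` are made up for illustration.

```cpp
// A file-local (static) variable that is also referenced from device code.
// Hypothetical example; not taken from the patch.
#pragma omp declare target
static int scale = 3;
#pragma omp end declare target

// Sums scale * data[i] over n elements, offloading the loop when a
// device is available.
int scaled_sum(const int *data, int n) {
  int sum = 0;
#pragma omp target teams distribute parallel for reduction(+ : sum) \
    map(to : data[0 : n])
  for (int i = 0; i < n; ++i)
    sum += scale * data[i];
  return sum;
}
```

When built without OpenMP support, the pragmas are ignored and the loop runs on the host, so the function returns the same result either way. The offload flags needed to target AMDGPU depend on the toolchain and are not shown here.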
External Compilers
LLPC
- (In-review) Final patch to switch middle-end passes to the New Pass Manager. This reduces compilation times by 1.2% on average. LLPC#1754
- Added a new class responsible for task/mesh shader lowering. LLPC#1735