Issue #31

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from March 19 to April 1, 2022.

We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.

Industry News and Community Events

LLVM and Clang

Discussions

A new HLSL Working Group was formed to coordinate efforts for adding HLSL and related code generation support to LLVM & Clang.
- A living Agenda/Meeting Notes document is available.
- Meetings are planned as 30-minute bi-weekly sessions, with the format to be adjusted as appropriate.
- Issue tracking will be done on LLVM’s GitHub issues.
Tom Stellard created an #hlsl channel on the LLVM Discord server.
Ben Wibking reported CUDA compilation failures caused by Clang not recognizing __nv_is_extended_device_lambda_closure_type. After applying a workaround, Ben found that missing __float128 support was another blocker. Artem Belevich explained that in Clang’s CUDA compilation model, the same source must be ‘reasonably valid’ for both CPU and GPU targets, but existing Nvidia GPUs have no FP128 support. He suggested a soft-float approach.
Frank Winter noticed NVPTX code generation slowness on some functions and narrowed it down to the ‘GPU Load and Store Vectorizer’. Matt Arsenault confirmed that the pass does have some quadratic behavior.
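
To make the single-source constraint from the __float128 discussion concrete: the host and device compilations parse the same file, so a host-only type has to be hidden from the device pass. The sketch below is plain C++ written for this summary, not code from the thread, and it is not the soft-float approach Artem suggested; it simply falls back to double where FP128 is unavailable. `__CUDA_ARCH__` is the macro CUDA compilers define during device compilation, `__SIZEOF_FLOAT128__` is assumed to be predefined by GCC and Clang on targets that provide __float128, and `wide_real` and `sum_wide` are hypothetical names.

```cpp
#include <cstddef>

// Pick a wide accumulator type that is valid in whichever pass is running.
// wide_real is a hypothetical alias introduced for this illustration.
#if !defined(__CUDA_ARCH__) && defined(__SIZEOF_FLOAT128__)
using wide_real = __float128;  // host pass on a target with FP128
#else
using wide_real = double;      // device pass, or host without FP128
#endif

// Sums n doubles in the wider accumulator type and rounds the result
// back to double. Intended for host-side use only.
inline double sum_wide(const double* values, std::size_t n) {
  wide_real acc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    acc += values[i];
  }
  return static_cast<double>(acc);
}
```

Because the alias names a different type in each pass, it should not appear in kernel parameters or in data structures shared between host and device.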

Commits

NVPTX vectorization was improved for ld.param and st.param. D120129
DirectX Backend stub has landed. D112080
Added a DXIL target triple. D122031
(In-review) DXIL CodeGen:
- Add DXILPrepare CodeGen pass. D122081
- Add DXIL Bitcode Writer and DXIL testing. D122082
- Three additional patches add support for opaque pointers:
  - Add pointer type analysis. D122268
  - Update DXIL Prepare to emit no-op bitcasts. D122269
  - Convert opaque to typed pointers in DXIL emission. D122270
Landed HLSL changes:
- HLSL Language and version standards. D122087
- Initial support for HLSL attribute parsing. D112627
(In-review) HLSL Semantic parsing. D122699
Continued work on the AMDGPU gfx940 target.

MLIR

Discussions

Md Abdullah Shahneous Bari asked about calling external functions in the SPIR-V dialect using LinkageAttributes. There are no replies at the time of writing.

Commits

gpu.mma_* ops are relaxed to support a more flexible layout. D122452
Conversions of func.call and math.copysign to SPIR-V are now supported. D122368, D122910

OpenMP (Target Offloading)

Discussions

Commits

Fixed an issue that could cause a segmentation fault in some applications (such as OpenMC and MiniFMM). D122014
Fixed static or hidden variables causing AMDGPU offloading to fail. D122352
Fixed global constructors and destructors not being found on AMDGPU. D122515
Fixed a race condition when deleting entries from the device map. D121058
Device LTO now uses the default optimization pipeline to address performance regressions when using LTO. D122133
The new driver will be made the default very soon. Users will then be able to use static libraries and LTO without manually enabling it. D122831
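
For context on the static-variable fix listed above (D122352), the sketch below shows the general shape of code that combines a file-local variable with `declare target`. It is an illustration written for this summary, not a reproducer taken from the patch, and `scale_factor` and `scale_on_device` are hypothetical names. When OpenMP is not enabled the pragmas are ignored and the function simply runs on the host.

```cpp
#pragma omp declare target
// Internal linkage: the symbol is not visible outside this translation unit.
static int scale_factor = 3;
#pragma omp end declare target

// Multiplies x by the file-local variable inside a target region and
// copies the result back to the host.
int scale_on_device(int x) {
  int result = 0;
#pragma omp target map(from : result)
  {
    result = x * scale_factor;
  }
  return result;
}
```

Calling `scale_on_device(14)` should give the same answer whether the region is offloaded or falls back to the host, since the variable is never modified after initialization.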

External Compilers

LLPC

(In-review) Final patch to switch middle-end passes to the New Pass Manager. This reduces compilation times by 1.2% on average. LLPC#1754
Added a new class responsible for task/mesh shader lowering. LLPC#1735
