Welcome to the fourth issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from January 8 to January 21, 2021.
LLVM GPU News gained a new section: OpenMP Target Offloading, maintained by Johannes Doerfert.
We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.
Industry News and Conference Talks
- Dmitrii Tolmachev published a blog post on real-time image registration on the GPU with VkFFT, a Vulkan Fast Fourier Transform library written by Dmitrii. Image registration is the problem of determining which coordinate-system transformation to apply to an image so that it matches a different image of the same object. Using a highly optimized FFT implementation on a commodity GPU (Nvidia 1660 Ti) allowed them to run the image registration algorithm in real time: matching a pair of 1024x1024 screenshots from Cyberpunk 2077 took around 3 ms. The readme on Dmitrii’s GitHub mentions that they are looking for a PhD position or a job.
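The exact algorithm from the blog post is not reproduced here. To illustrate why a fast FFT is the key ingredient, below is a minimal sketch of phase correlation, a standard FFT-based way to recover a pure translation between two images: the inverse transform of the normalized cross-power spectrum peaks at the shift. It assumes translation-only, cyclic shifts, and it uses a naive O(N^4) DFT in pure Python for readability, whereas a real-time implementation would run a fast FFT on the GPU, as VkFFT does. All function names are hypothetical.

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform, O(N^4): only for tiny demo images."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = sign * 2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += img[y][x] * cmath.exp(1j * phase)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def cyclic_shift(img, dy, dx):
    """Return b with b[y][x] = img[y - dy][x - dx], indices taken modulo the size."""
    rows, cols = len(img), len(img[0])
    return [[img[(y - dy) % rows][(x - dx) % cols] for x in range(cols)]
            for y in range(rows)]


def phase_correlation(a, b):
    """Estimate (dy, dx) such that b is (approximately) cyclic_shift(a, dy, dx)."""
    rows, cols = len(a), len(a[0])
    fa, fb = dft2(a), dft2(b)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            prod = fb[u][v] * fa[u][v].conjugate()
            mag = abs(prod)
            cross[u][v] = prod / mag if mag > 1e-12 else 0j
    # Its inverse transform is (ideally) a delta located at the translation.
    corr = dft2(cross, inverse=True)
    _, dy, dx = max((corr[y][x].real, y, x)
                    for y in range(rows) for x in range(cols))
    return dy, dx


if __name__ == "__main__":
    rng = random.Random(0)
    a = [[rng.random() for _ in range(8)] for _ in range(8)]
    b = cyclic_shift(a, 3, 5)
    print(phase_correlation(a, b))  # (3, 5)
```

For real screenshots, where shifts are not cyclic, windowing and sub-pixel peak refinement are usually added, and recovering rotation and scale needs further steps such as a log-polar transform.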
LLVM and Clang
- The AMDGPU backend is no longer a blocker for switching to the New Pass Manager. The last failing test was pinned to the Legacy Pass Manager, while work on porting Divergence Analysis to the New Pass Manager is still in progress.
- Burlen Loring asked about Clang/LLVM and CUDA version compatibility on Fedora. There were no replies as of this writing.
- (In-review) Add AMDGPU lower function LDS pass, which handles LDS (Local Data Share) variables accessed from non-kernel functions. The strategy is to create a new struct with one field per LDS variable and allocate that struct at the same address in every kernel. This allows some OpenMP kernels for AMDGPU to work with the deviceRTL runtime library, which accesses CUDA shared variables from functions that cannot be inlined.
- AMDGPUSubtargets.h was split into separate subtarget headers, such as GCNSubtarget.h. This reduces include dependencies and improves LLVM build times.
- (In-review) Implement HIP codegen support for the __managed__ attribute. This attribute can be applied to global variables, and managed variables can be used by both device and host code. The ROCm programming guide lists managed variables as not supported and does not yet describe their semantics.
- Lenny Guo expressed the intention to work on OpenCL conversions via SPIR-V and to bring up an mlir-opencl-runner.
- The SPIR-V dialect now has traits such as UsableInSpecConstantOp, which let all ops carrying a given trait be processed uniformly.
- spv.SpecConstantOperation is now fully supported, including serialization and deserialization.
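The layout idea behind the in-review AMDGPU lower function LDS pass mentioned above can be illustrated with a minimal Python sketch that packs a set of hypothetical LDS variables into one struct and computes each field's byte offset. Because every kernel places this struct at the same LDS address, a non-inlined function can reach a given variable at a fixed offset from that address no matter which kernel called it. The variable names and the decreasing-alignment field order are assumptions for illustration; the actual pass builds an LLVM IR struct whose layout follows the target's data layout and may order fields differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdsVar:
    """A hypothetical LDS variable: name, size in bytes, alignment in bytes."""
    name: str
    size: int
    align: int  # assumed to be a power of two


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def pack_lds(variables):
    """Lay out all variables as fields of one struct.

    Returns (offsets, total_size), where offsets maps each variable name to
    its byte offset from the start of the struct. Since every kernel would
    place the struct at the same LDS address, the offsets are the same for
    every kernel.
    """
    # Assumed policy: largest alignment first to reduce padding; ties are
    # broken by name so the layout is deterministic.
    ordered = sorted(variables, key=lambda v: (-v.align, v.name))
    offsets = {}
    offset = 0
    max_align = 1
    for var in ordered:
        offset = align_up(offset, var.align)
        offsets[var.name] = offset
        offset += var.size
        max_align = max(max_align, var.align)
    # Pad the struct so its size is a multiple of its alignment.
    return offsets, align_up(offset, max_align)


if __name__ == "__main__":
    lds = [LdsVar("counter", 4, 4), LdsVar("buffer", 256, 16), LdsVar("flag", 1, 1)]
    offsets, size = pack_lds(lds)
    print(offsets, size)  # {'buffer': 0, 'counter': 256, 'flag': 260} 272
```

Sorting by alignment is only one reasonable policy; declaration order would also produce a valid layout, just with potentially more padding.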
OpenMP (Target Offloading)
- Discussions usually happen on the openmp-dev mailing list or in our weekly “OpenMP in LLVM” meeting; feel free to join in!
- The LLVM/OpenMP documentation has been online for a few weeks. Initial content is in place, but the FAQ and other sections are still largely empty; we are looking for volunteers and topics.
- The memory management APIs proposed for OpenMP 6.0, including ones to allocate managed memory, are being discussed for (potentially opt-in) inclusion in the LLVM runtime very soon.
- The declare mapper API was the last part of data mapping that did not pass source information (the location and name of the mapped objects) to the runtime. This has now changed, so LIBOMPTARGET_INFO will display plenty of useful information about mapped objects.
- The second patch for OpenMP target profiling allows tracing multiple threads that offload concurrently. See the LLVM/OpenMP documentation for details.
- Support for a PTX device runtime has been dropped in favor of the superior approach: an LLVM IR device runtime. The latter is now easy to build: simply move openmp from LLVM_ENABLE_PROJECTS to LLVM_ENABLE_RUNTIMES in the CMake configuration (see how to build an OpenMP offloading capable compiler).
- Support for asynchronous (nowait) omp target directives via “hidden helper threads” was upstreamed. Because of problems encountered afterwards, it might need slight refinement and might not make it into LLVM 12 after all.
- (In-review) Driver support for OpenMP offloading onto AMD GPUs.
- (In-review) A series of patches is underway to allow building the device runtime in pure OpenMP + C++. An overview of the effort can be found here.
- (In-review) A patch to build the CUDA plugin without CUDA installed on the build machine. Together with a CUDA-free device runtime and a prebuilt selection of device runtimes for various architectures, this will allow us to enable OpenMP offloading in LLVM releases, e.g., in packages shipped by Linux distributions.
- Lu Jiao added support for compiling SPIR-V shaders with the linkage capability. Such library shaders have no entry function, so a dummy entry function has to be created.