path: root/libclc
Age | Commit message | Author | Files | Lines
2025-11-26[libclc] Use __scoped_atomic_udec/uinc_wrap to implement _clc_atomic_dec/inc ↵Wenju He3-3/+3
(#168327)
2025-11-25[libclc] Add atomic_init, atomic_flag_clear and atomic_flag_test_and_set ↵Wenju He16-0/+541
(#168329)
2025-11-19[libclc] Use CLC atomic functions for legacy OpenCL atom/atomic builtins ↵Wenju He26-265/+190
(#168325)
Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions (which use Clang __scoped_atomic_*), replacing the previous Clang __sync_* functions.
* Change memory order from seq_cst to relaxed; keep device scope (the spec permits a scope broader than workgroup).
  LLVM IR for _Z8atom_decPU3AS1Vi in amdgcn--amdhsa.bc:
  Before: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") seq_cst
  After:  %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") monotonic
* Also add OpenCL 1.0 atom_* variants without volatile on the pointer, for backward compatibility.
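As a rough host-side illustration of the ordering change, the sketch below decrements a counter with relaxed ordering using standard C11 `<stdatomic.h>`. It is only an analogue: libclc itself uses the Clang `__scoped_atomic_*` builtins, and the helper name `legacy_atom_dec` is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical host-side analogue of the legacy atom_dec builtin:
 * atomically subtract 1 and return the previous value.
 * Relaxed ordering keeps the read-modify-write itself atomic but places
 * no ordering constraint on surrounding memory accesses (LLVM IR spells
 * this ordering "monotonic"). */
static int legacy_atom_dec(_Atomic int *p) {
    return atomic_fetch_sub_explicit(p, 1, memory_order_relaxed);
}
```

Code that relied on the old seq_cst behaviour to order other memory accesses would need an explicit fence in this model.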
2025-11-17[libclc] Fix link to source in index.html (#167494)Konrad Kleine1-1/+1
2025-11-13[libclc] Fix floating-point __clc_atomic_store/exchange cast mismatch (#167625)Wenju He4-17/+24
When the pointer element type is cast to an integer type, the stored value should be cast to the same integer type to avoid a type mismatch. LLVM IR change in function _Z18__clc_atomic_storePU3AS1Vffii: > %5 = bitcast float %1 to i32 (New) < %5 = fptosi float %1 to i32 (Old)
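The difference between the two IR instructions can be sketched in plain C. The helper names below are hypothetical; `float_bits` mirrors what `bitcast` does (same bits, new type) and `float_value_as_int` mirrors `fptosi` (numeric conversion).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of a float as an integer, like LLVM `bitcast`.
 * memcpy is the portable way to do this without aliasing problems. */
static int32_t float_bits(float f) {
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Convert the numeric value to an integer, like LLVM `fptosi`.
 * The fraction is discarded, so the original float cannot be recovered. */
static int32_t float_value_as_int(float f) {
    return (int32_t)f;
}
```

Storing `float_value_as_int(1.5f)` through an integer atomic would write the integer 1, which reads back as a denormal float rather than 1.5; storing `float_bits(1.5f)` preserves the value.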
2025-11-05[NFC][libclc] Rename clc_cbrt.inc to clc_cbrt.h (#166330)Wenju He3-1/+2
__clc_cbrt functions are declared in clc_cbrt.inc. Rename to .h for consistency with other headers.
2025-10-27[libclc] Implement integer __clc_abs using __builtin_elementwise_abs (#164957)Wenju He2-2/+26
Previous implementation was cmp, select and @llvm.smax sequence in LLVM IR. __CLC_GEN_U/__CLC_GEN_S is upstreamed from intel/llvm repo.
2025-10-22[NFC][libclc] Improve empty builtins error: include ARCH_SUFFIX in message ↵Wenju He1-1/+1
(#164527) To clarify which builtin set has no bytecode files.
2025-10-21[libclc] Fix double NAN_MASK in __clc_nan (#163522)Wenju He1-1/+1
0x7ff0000000000000 is +inf. Change it to quiet nan 0x7ff8000000000000.
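A small C check of the two bit patterns, assuming IEEE 754 binary64 doubles. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build a double from its raw 64-bit pattern. */
static double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* NaN is the only value that compares unequal to itself. */
static int is_nan(double d) {
    return d != d;
}
```

With an all-ones exponent, a zero mantissa (0x7ff0000000000000) encodes infinity, while setting the top mantissa bit (0x7ff8000000000000) gives a quiet NaN.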
2025-10-20[NFC][libclc] Simplify degrees, radians and smoothstep macros (#164203)Wenju He9-145/+97
Implementation doesn't change.
2025-10-20[libclc] Move functions definition from header clc_sincos_piby4.inc into ↵Wenju He12-194/+174
clc_sincos_helpers.cl (#164028) Inline functions defined in clc_sincos_piby4.inc lack the static specifier and are deleted by EliminateAvailableExternallyPass when not inlined. This PR fixes the problem by removing inline and moving the function definitions into clc/lib/generic/math/clc_sincos_helpers.cl. It makes sense to put all sin/cos helper definitions in one file, clc_sincos_helpers.cl.
2025-10-20[libclc] Change libclc install dir to ${clang_resource_dir}/lib/libclc in ↵Wenju He2-3/+6
in-tree build (#163896) Commit df7473673214b placed libclc libraries into clang resource dir <resource-dir>/lib/libclc at build stage. This PR does it at install stage as well. Note that in standalone (not in-tree) build, libclc is still installed to old ${CMAKE_INSTALL_DATADIR}/clc dir.
2025-10-16[NFC][libclc] Add missing evaluation for variable ${tool}_target (#163540)Wenju He1-1/+1
The variable should be evaluated before checking whether it is empty.
2025-10-07[libclc] Remove -fno-builtin from compile options (#162075)Wenju He1-1/+1
The flag was added in 8ef48d07efa3 to suppress a build warning and is no longer needed. It adds the "no-builtins" attribute, which prevents libclc functions from being inlined into callers that don't have the attribute. The flag is meant to prevent folding standard library calls into optimized implementations. For libclc device targets, however, such target‑driven folding is desirable. llvm-diff shows no change to amdgcn--amdhsa.bc and nvptx--nvidiacl.bc. Co-authored-by: Mészáros Gergely <gergely.meszaros@intel.com>
2025-10-06libclc: Add gfx1250 and gfx1251 to amdgpu target list (#162034)Matt Arsenault1-1/+1
2025-09-29[libclc] Propose new libclc maintainer (#161141)Fraser Cormack1-0/+3
Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.
2025-09-29[libclc] Move myself to the list of inactive maintainersFraser Cormack1-3/+9
Change my email address in the process. I will not be able to keep up maintainership duties on this project in the future. Adding the wording on the inactive maintainers section myself like this feels self-aggrandizing but was copied from other LLVM projects.
2025-09-22[libclc] Fix ctest failures after 7f3661128b1e: adjust external check and ↵Wenju He4-10/+10
make shuffle helpers static (#160036) * Replace call-site check with external declaration scan (grep declare) to avoid false positives for not-inlined __clc_* functions. * _clc_get_el* helpers are defined as inline in clc_shuffle2.cl, so they have available_externally attribute. When they fail to inline they are deleted by EliminateAvailableExternallyPass and become unresolved in cedar-r600--.bc. Mark them static to resolve the issue. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
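The linkage issue is not specific to OpenCL C. A minimal C sketch, with a hypothetical helper name, of the `static inline` form that the fix moves to:

```c
/* Under C99 rules, a plain `inline` definition does not by itself emit an
 * external symbol, so a call the optimizer declines to inline can end up
 * unresolved. `static inline` gives the translation unit its own local
 * copy, which survives whether or not the call is inlined. */
static inline int get_element4(const int *v, unsigned idx) {
    return v[idx & 3u]; /* wrap the index into a 4-element vector */
}
```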
2025-09-19[Clang] Rename elementwise builtins to `clzg` and `ctzg` (#157128)Joseph Huber2-2/+2
Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with other builtins that these are the only bit counting intrinsics named differently. This includes the additional operation for the undefined zero case, which was added as a `clzg` extension.
2025-09-18[libclc] Remove __attribute__((always_inline)) (#158791)Wenju He6-23/+19
always_inline doesn't guarantee performance improvement. Target-specific optimizations decide whether inlining is profitable. Changes to amdgcn--amdhsa.bc: * _Z9__clc_logDv16_f and _Z15__clc_remainderDv16_fS_ are not inlined. * sincos vector function code size has doubled due to apparent duplication. Also replace typo _CLC_DECL with _CLC_DEF for function definition.
2025-09-15[libclc][NFC] Update README.md to use runtime build (#158283)Wenju He1-2/+2
LLVM_ENABLE_PROJECTS=libclc is deprecated, see https://github.com/llvm/llvm-project/blob/a2a9601ea49a/llvm/CMakeLists.txt#L223-L228
2025-09-12[libclc] Create LIBCLC_OUTPUT_LIBRARY_DIR directory before build (#158171)Wenju He1-0/+1
This fixes `No such file or directory` error when "Unix Makefiles" generator is used, see https://github.com/intel/llvm/issues/20058. Ninja generator implicitly creates output directory when generating libclc libraries, but "Unix Makefiles" generator does not.
2025-09-12libclc: Remove HAVE_LLVM version macros (#158257)Matt Arsenault2-21/+0
This doesn't need to pretend to support multiple versions of llvm and these are old anyway.
2025-09-09[NFC][libclc] Replace _CLC_V_V_VP_VECTORIZE macro with use of ↵Wenju He2-62/+115
unary_def_with_ptr_scalarize.inc (#157002) Commit d50f2ef437ae removes _CLC_V_V_VP_VECTORIZE from header file, but the macro is still used in our downstream code: https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30 https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31 We can either revert d50f2ef437ae or replace macro with use of unary_def_with_ptr_scalarize.inc. This PR uses the latter approach.
2025-09-05[libclc] Implement erf/erfc vector function with loop since scalar function ↵Wenju He3-2/+30
is large (#157055) This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc size by 4%. Loop trip count is constant and backend can decide whether to unroll. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
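The trade-off can be sketched in plain C. Everything here is a stand-in: `float4_t` approximates an OpenCL `float4`, and `big_scalar_fn` is a hypothetical placeholder for a large scalar routine such as erf.

```c
enum { LANES = 4 };

/* Hypothetical stand-in for the OpenCL float4 vector type. */
typedef struct { float lane[LANES]; } float4_t;

/* Hypothetical placeholder for a large scalar routine. */
static float big_scalar_fn(float x) {
    return x * x + 1.0f;
}

/* Apply the scalar routine lane by lane. The trip count is a compile-time
 * constant, so the compiler can still unroll when that is profitable, but
 * the source contains a single call site instead of one per lane. */
static float4_t big_vector_fn(float4_t v) {
    float4_t r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = big_scalar_fn(v.lane[i]);
    return r;
}
```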
2025-09-05[libclc] Override generic symbol using llvm-link --override flag instead of ↵Wenju He4-12/+23
using weak linkage (#156778) Before this PR, weak linkage was applied to a few CLC generic functions to allow a target-specific implementation to override the generic one. However, weak linkage has the side effect of preventing inter-procedural optimization, such as PostOrderFunctionAttrsPass, because a weak function doesn't have an exact definition (as determined by hasExactDefinition in the pass). This PR resolves the issue by adding the --override flag for every non-generic bitcode file in the llvm-link run. This approach eliminates the need for weak linkage while still allowing a target-specific implementation to override the generic one. llvm-diff shows improved attribute deduction for some functions in amdgcn--amdhsa.bc, e.g. %23 = tail call half @llvm.sqrt.f16(half %22) => %23 = tail call noundef half @llvm.sqrt.f16(half %22)
2025-09-05[NFC][libclc] Set MACRO_ARCH to ${ARCH} unconditionally before customizing ↵Wenju He1-1/+1
(#156789) Our downstream libclc adds a few more targets that customize build_flags and opt_flags. In each customization block, MACRO_ARCH is defined to be ${ARCH}. Hoisting the MACRO_ARCH definition out of the if-else-end block avoids code duplication. This also avoids a potential error when the MACRO_ARCH definition is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.
2025-09-05[NFC][libclc] Remove unused -DCLC_INTERNAL build flag, remove unused ↵Wenju He2-7/+1
M_LOG210 (#156590)
2025-09-03[NFC][libclc] Move _CLC_V_V_VP_VECTORIZE macro into clc_lgamma_r.cl and ↵Wenju He48-116/+55
delete clcmacro.h (#156280) clcmacro.h only defines _CLC_V_V_VP_VECTORIZE which is only used in clc/lib/generic/math/clc_lgamma_r.cl.
2025-09-01[libclc] update __clc_mem_fence: add MemorySemantic arg and use ↵Wenju He12-44/+108
__builtin_amdgcn_fence for AMDGPU (#152275) It is necessary to add MemorySemantic argument for AMDGPU which means the memory or address space to which the memory ordering is applied. The MemorySemantic is also necessary for implementing the SPIR-V MemoryBarrier instruction. Additionally, the implementation of __clc_mem_fence on Intel GPUs requires the MemorySemantic argument. Using __builtin_amdgcn_fence for AMDGPU is follow-up of https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508 llvm-diff shows no change to nvptx64--nvidiacl.bc.
2025-08-28libclc: CMake: include GetClangResourceDir (#155836)Romaric Jodin1-0/+1
`get_clang_resource_dir` is not guaranteed to be available. Make sure it is by including `GetClangResourceDir`.
2025-08-22[libclc] Only create a target per each compile command for cmake MSVC ↵Wenju He1-21/+48
generator (#154479) libclc sequential build issue addressed in commit 0c21d6b4c8ad is specific to cmake MSVC generator. Therefore, this PR avoids creating a large number of targets when a non-MSVC generator is used, such as the Ninja generator, which is used in pre-merge CI on Windows in llvm-project repo. We plan to migrate from MSVC generator to Ninja generator in our downstream CI to fix flaky cmake bug `Cannot restore timestamp`, which might be related to the large number of targets.
2025-08-21[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535)Fraser Cormack4-64/+24
Using the elementwise builtin optimizes the vector case; instead of scalarizing we can compile directly to the vector intrinsics.
2025-08-19[libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for ↵Wenju He3-0/+37
amdgcn (#153785) This simplifies downstream refactoring of libspirv workitem function in https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic
2025-08-18[NFC][libclc] add missing __CLC_ prefix all internal macros (#153523)Wenju He665-2470/+2500
This unifies naming scheme of macros to address review comment https://github.com/intel/llvm/pull/19779#discussion_r2272194357 math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`
2025-08-18[libclc] Fix out-of-bound value for workitem functions according to OpenCL ↵Wenju He4-4/+4
spec (#153784)
2025-08-12[libclc] Add __attribute__((const)) to functions that don't access memory ↵Wenju He80-151/+160
(#152456) Before this PR, PostOrderFunctionAttrsPass in opt run can deduce memory(none) for these functions. This PR explicitly adds the attribute to align with Clang's OpenCL headers and ensures the attribute is present throughout the compilation flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc become slightly smaller.
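For readers unfamiliar with the attribute, a minimal C sketch follows. `__attribute__((const))` is a GCC/Clang extension, and the function name `clamp_int` is hypothetical.

```c
/* `const` promises that the result depends only on the argument values:
 * the function neither reads nor writes memory. The compiler may then
 * merge or hoist repeated calls with the same arguments. */
__attribute__((const)) static int clamp_int(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}
```

A function that dereferences a pointer argument would not qualify, because its result can change when the pointed-to memory changes.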
2025-08-11[libclc] Fix libclc install on Windows when MSVC generator is used (#152703)Wenju He1-8/+29
Fix a regression of df7473673214. The cmake MSVC generator is a multi-configuration generator: the build type is not known at configure time and CMAKE_CFG_INTDIR evaluates to $(Configuration) at configure time. libclc install fails since $(Configuration) in the bitcode file path is unresolved in libclc/cmake_install.cmake at install time. We need a solution that resolves the libclc bitcode file path at install time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME, which can be evaluated at install time. This is the same solution as in https://reviews.llvm.org/D76827
2025-08-11[libclc] Implement clc_log/sinpi/sqrt with __nv_* functions (#150174)Wenju He7-0/+180
This is to upstream implementations in https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math
2025-08-08[NFC][libclc] Delete unused ↵Wenju He1-15/+0
clc/shared/binary_decl_with_scalar_second_arg.inc (#152463)
2025-08-07[libclc] Add missing clc/lib/ptx-nvidiacl/SOURCES to CMAKE_CONFIGURE_DEPENDS ↵Wenju He1-0/+1
(#152431)
2025-08-07[libclc] Set TARGET_FILE property for prepare-${obj_suffix} target (#152245)Wenju He1-1/+4
The target's output bitcode `libclc_builtins_lib` is located in a sub-directory in clang resource directory since df7473673214. Setting TARGET_FILE property can allow targets in non-libclc project to obtain the path to `libclc_builtins_lib`.
2025-08-06[libclc] Move mem_fence and barrier to clc library (#151446)Wenju He14-30/+165
__clc_mem_fence and __clc_work_group_barrier functions have two parameters, memory_scope and memory_order. The design allows the clc functions to implement SPIR-V ControlBarrier and MemoryBarrier functions in the future. The default memory ordering in clc is set to __ATOMIC_SEQ_CST, which is also the default and strongest ordering in OpenCL and C++. The OpenCL cl_mem_fence_flags parameter is converted to a combination of __MEMORY_SCOPE_DEVICE and __MEMORY_SCOPE_WRKGRP, which is passed to clc. llvm-diff shows no change to nvptx64--nvidiacl.bc. llvm-diff shows a small change to amdgcn--amdhsa.bc and the number of LLVM IR instructions is reduced by 1: https://alive2.llvm.org/ce/z/_Uhqvt
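The flag-to-scope conversion might look roughly like the C sketch below. This is an assumption-heavy illustration, not the libclc code: all constants are hypothetical placeholders (the real ones come from the OpenCL headers and Clang's `__MEMORY_SCOPE_*` macros), and the choice of which fence flag maps to which scope is assumed here, not taken from the commit.

```c
/* Hypothetical placeholder constants for this sketch only. */
enum { LOCAL_MEM_FENCE = 1, GLOBAL_MEM_FENCE = 2 };
enum { SCOPE_DEVICE = 1, SCOPE_WRKGRP = 2 };

/* Assumed mapping: a global-memory fence requests device scope and a
 * local-memory fence requests work-group scope. When both flags are set,
 * both scope bits are combined. */
static int fence_flags_to_scope(unsigned flags) {
    int scope = 0;
    if (flags & GLOBAL_MEM_FENCE)
        scope |= SCOPE_DEVICE;
    if (flags & LOCAL_MEM_FENCE)
        scope |= SCOPE_WRKGRP;
    return scope;
}
```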
2025-08-05[libclc] Refine id in async_work_group_copy STRIDED_COPY (#151644)Wenju He1-2/+2
Move id first along 0th dimension to achieve coalesced memory access when stride is 1.
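One way to read "move id first along the 0th dimension" is as a linearization in which dimension 0 varies fastest. The C sketch below shows that indexing; the function name and array-based interface are hypothetical and stand in for the get_local_id/get_local_size calls.

```c
#include <stddef.h>

/* Linearize a 3-D local id so that dimension 0 varies fastest.
 * Work-items adjacent along dimension 0 then get consecutive ids, so with
 * stride 1 they touch consecutive elements (coalesced access). */
static size_t linear_local_id(const size_t id[3], const size_t size[3]) {
    return (id[2] * size[1] + id[1]) * size[0] + id[0];
}
```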
2025-08-04[clang] Add the ability to link libclc OpenCL libraries (#146503)Fraser Cormack1-3/+10
This commit adds driver support for linking libclc OpenCL libraries. It takes the form of a new optional flag: --libclc-lib=namespec. Nothing is linked unless this flag is specified. Not all libclc targets have corresponding clang targets. For this reason it is desirable for users to be able to specify a libclc library name. We support this by taking both a library name (without the .bc suffix) or a filename. Both of these are searched for in the clang resource directory. Filenames are also checked themselves so that absolute paths can be provided. The syntax for specifying filenames (as opposed to library names) uses a leading colon (:), inspired by the -l option. To accommodate this option, libclc libraries are now placed into clang's resource directory in an in-tree configuration. The libraries are all placed in <resource-dir>/lib/libclc and are not grouped under host-specific directories as some other runtime libraries are; it is not expected that OpenCL libraries will differ depending on the host toolchain. Currently only the AMDGPU toolchain supports this option as a proof of concept. Other targets such as NVPTX or SPIR/SPIR-V could support it too. We could optionally let target toolchains search for libclc libraries themselves, possibly when passed an empty --libclc-lib.
2025-08-01[libclc] Add an option to build SPIR-V targets with the LLVM backend (#151347)Fraser Cormack2-22/+44
This removes the dependency on an external tool to build the SPIR-V files. It may be of interest to projects such as Mesa. Note that the option is off by default as using the SPIR-V backend, at least on my machine, uses a *lot* of memory and the process is often killed in a parallelized build. It does complete, however. Fixes #135327.
2025-07-29[libclc] Optimize generic CLC fmin/fmax (#128506)Fraser Cormack9-214/+9
With this commit, the CLC fmin/fmax builtins use clang's __builtin_elementwise_(min|max)imumnum which helps us generate LLVM minimumnum/maximumnum intrinsics directly. These intrinsics uniformly select the non-NaN input over the (quiet or signalling) NaN input, which corresponds to what the OpenCL CTS tests. These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets. Note that there is ongoing discussion regarding how these builtins should handle signalling NaNs in the OpenCL specification and whether they should be able to return a quiet NaN as per the IEEE behaviour. If the specification and/or CTS is ever updated to allow or mandate returning a qNAN, these builtins could/should be updated to use __builtin_elementwise_(min|max)num instead which would lower to LLVM minnum/maxnum intrinsics. The SPIR-V targets maintain the old implementations, as the LLVM -> SPIR-V translator can't currently handle the LLVM intrinsics. The implementation has been simplified to consistently use clang builtins, as opposed to before where the half version was explicitly defined. [1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285
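The "select the non-NaN input" rule can be written out by hand in C as a reference model. This is only a scalar sketch with hypothetical names; it ignores signed-zero ordering and is not how libclc implements fmin, which uses the clang builtin directly.

```c
/* NaN is the only value that compares unequal to itself. */
static int is_nan_f(float x) {
    return x != x;
}

/* Prefer the numeric operand: the result is NaN only if both inputs are. */
static float fmin_prefer_number(float a, float b) {
    if (is_nan_f(a))
        return b;
    if (is_nan_f(b))
        return a;
    return a < b ? a : b;
}
```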
2025-07-29[libclc] Fix building top-level 'libclc' target (#150972)Fraser Cormack2-1/+14
With libclc being a 'runtime', the top-level build assumes that there is a corresponding 'libclc' target. We previously weren't providing this, leading to a build failure if the user tried to build it. This commit remedies this by adding support for building the 'libclc' target. It does so by adding dependencies from the OpenCL builtins to this target. It uses a configurable in-between target - libclc-opencl-builtins - to ease the possibility of adding non-OpenCL builtin libraries in the future.
2025-07-29[libclc] Add generic native half implementation of __clc_normalize (#150165)Wenju He1-9/+8
This is ported from https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/generic/geometric/normalize.cl and can pass a closed-source OpenCL CTS "test_geometrics geom_normalize --half CL_DEVICE_TYPE_GPU" on intel GPU. llvm-diff amdgcn--amdhsa.bc shows fpext/fptrunc insts are now removed from normalize function.
2025-07-29[libclc] Simplify unary_def_scalarize.inc's use in __clc_erf/erfc/tgamma ↵Wenju He4-66/+21
(#150181) Also delete unary_def_via_fp32.inc. There are small changes in amdgcn--amdhsa.bc because the vector conversion is now scalarized, e.g. %2 = fpext <4 x half> %0 to <4 x float> %3 = extractelement <4 x float> %2, i64 0 %4 = tail call float @llvm.fabs.f32(float %3) -> %2 = extractelement <4 x half> %0, i64 0 %3 = tail call half @llvm.fabs.f16(half %2) %4 = fpext half %3 to float