path: root/libclc
Age | Commit message | Author | Files | Lines
2026-01-05 | [libclc] Fix incorrect argument format | Joseph Huber | 1 | -1/+1
2026-01-05 | [libclc] Suppress AMDGCN code object version for library code (#174412) | Joseph Huber | 1 | -0/+3
Summary: The code object version defines the ABI that HSA executables follow. This change avoids hard-coding it in the OpenCL libraries, which would prevent users from overriding it when using them.
2025-12-19 | [libclc] Improve __clc_min/max/clamp implementation (#172599) | Wenju He | 6 | -5/+43
Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp. FP __clc_min/max/clamp now lowers to @llvm.minimumnum/@llvm.maximumnum, and integer clamp lowers to @llvm.umin/@llvm.umax. This reduces fcmp+select chains and improves codegen. Example change to amdgcn--amdhsa.bc:
```
in function _Z5clamphhh:
> %4 = icmp ugt i8 %0, %2
< %4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
> %6 = select i1 %4, i8 %2, i8 %5
> ret i8 %6
< %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
< ret i8 %5
in function _Z5clampddd:
in block %3 / %3:
> %4 = fcmp ogt double %0, %2
> %5 = fcmp olt double %0, %1
> %6 = select i1 %5, double %1, double %0
> %7 = select i1 %4, double %2, double %6
> ret double %7
< %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
< %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
< ret double %5
```
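The integer half of this change can be sketched in plain host C. This is only an illustration of the min(max(x, lo), hi) shape visible in the new IR, not the libclc source; the names `umax_u8`, `umin_u8` and `clamp_u8` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for @llvm.umax.i8 and @llvm.umin.i8. */
uint8_t umax_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }
uint8_t umin_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* clamp(x, lo, hi) written as min(max(x, lo), hi), the same
 * operation order as the umax/umin pair in the IR above. */
uint8_t clamp_u8(uint8_t x, uint8_t lo, uint8_t hi) {
    assert(lo <= hi); /* this sketch assumes a well-formed range */
    return umin_u8(umax_u8(x, lo), hi);
}
```

For example, `clamp_u8(5, 1, 3)` yields 3 and `clamp_u8(0, 1, 3)` yields 1, with no compare+select chain in the source.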
2025-12-19 | [libclc][NFC] Move convert builtins from Python generator to .cl sources (#172634) | Wenju He | 25 | -621/+1311
Remove the Python dependency for generating convert builtins, aligning with how other builtins are defined. In addition, our downstream target relies on this PR to override convert implementations. llvm-diff shows no changes to any of the bitcode files: amdgcn--amdhsa.bc, barts-r600--.bc, cayman-r600--.bc, cedar-r600--.bc, clspv64--.bc, clspv--.bc, cypress-r600--.bc, nvptx64--.bc, nvptx64--nvidiacl.bc, nvptx--.bc, nvptx--nvidiacl.bc, tahiti-amdgcn--.bc and tahiti-amdgcn-mesa-mesa3d.bc.
2025-12-11 | [libclc] use clc functions in clspv/shared/vstore_half.cl (#171770) | Romaric Jodin | 1 | -12/+18
2025-12-09 | [libclc] fix clspv/shared/vstore_half.cl (#171105) | Romaric Jodin | 1 | -11/+12
Update as_type functions
2025-12-09 | [NFC][libclc] Delete OpenCL builtin declarations (#170803) | Wenju He | 476 | -5389/+31
The libclc OpenCL library is already compiled with the flags `-fdeclare-opencl-builtins -finclude-default-header`. This is a follow-up to the comment at https://github.com/llvm/llvm-project/pull/168318#discussion_r2588117855
2025-12-05 | [libclc] Add OpenCL atomic_*_explicit builtins (#168318) | Wenju He | 64 | -337/+364
Implement atomic_*_explicit (e.g. atomic_store_explicit) with memory_order plus optional memory_scope. OpenCL memory_order maps 1:1 to Clang (e.g. OpenCL memory_order_relaxed == Clang __ATOMIC_RELAXED), so we pass it unchanged to the clc_atomic_* functions, which forward to Clang __scoped_atomic_* builtins.
Other changes:
* Add __opencl_get_clang_memory_scope helper in opencl/utils.h (OpenCL scope -> Clang scope).
* Correct atomic_compare_exchange return type to bool.
* Fix atomic_compare_exchange to return true when the value stored in the pointer equals the expected value.
* Remove volatile from CLC functions so that volatile isn't present in LLVM IR.
* Add '-fdeclare-opencl-builtins -finclude-default-header' flag to include declaration of memory_scope. Some constants in libclc are already provided by Clang’s OpenCL header; disable those in OpenCL library build and enable them only for CLC library build.
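The compare-exchange contract mentioned in the list above (return true exactly when the stored value equals the expected value) matches what host C11 atomics provide. The sketch below uses `<stdatomic.h>` as an analogue, under the assumption that the OpenCL builtin follows the same contract; `try_swap` is a hypothetical helper, not a libclc function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: store `desired` into *obj only if *obj currently
 * equals *expected. Returns true on success. On failure the observed
 * value is written back to *expected and false is returned. */
bool try_swap(atomic_int *obj, int *expected, int desired) {
    assert(obj && expected);
    return atomic_compare_exchange_strong_explicit(
        obj, expected, desired,
        memory_order_relaxed,  /* ordering on success */
        memory_order_relaxed); /* ordering on failure */
}
```

The memory order is passed straight through as an argument here, which mirrors the pass-it-unchanged approach the commit describes.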
2025-12-04 | [libclc] Fix memory fence scope mapping for OpenCL (#170542) | Victor Mustya | 1 | -6/+5
The function `__opencl_get_memory_scope` incorrectly assumed that the Clang built-in `__MEMORY_SCOPE_*` macros are defined as bitmasks, while they are actually defined as distinct integer values. This led to incorrect mapping of OpenCL memory fence flags to LLVM memory scopes, causing issues in generated code. The fix updates the `__opencl_get_memory_scope` function to return the correct `__MEMORY_SCOPE_*` values based on the provided `cl_mem_fence_flags`. Additionally, the `__opencl_get_memory_semantics` and `__opencl_get_memory_scope` functions are marked as `static` to avoid potential multiple definition issues during linking.
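To see why treating distinct scope values as bitmasks goes wrong, consider the following host C sketch. Everything in it is an assumption made for illustration: the `FENCE_*` and `SCOPE_*` constants are stand-ins with made-up values, and neither function reproduces the actual libclc code before or after the fix.

```c
#include <assert.h>

/* Stand-in fence flags: these ARE bitmasks and may be OR-ed together. */
enum { FENCE_LOCAL = 0x1, FENCE_GLOBAL = 0x2, FENCE_IMAGE = 0x4 };

/* Stand-in scopes: distinct enumerators, NOT bitmasks. */
enum { SCOPE_SYSTEM = 0, SCOPE_DEVICE = 1, SCOPE_WORKGROUP = 2, SCOPE_WAVEFRONT = 3 };

/* Buggy pattern: combines scope values with |, as if each were one bit. */
int scope_by_or(unsigned flags) {
    int scope = 0;
    if (flags & FENCE_LOCAL)
        scope |= SCOPE_WORKGROUP;
    if (flags & (FENCE_GLOBAL | FENCE_IMAGE))
        scope |= SCOPE_DEVICE;
    return scope; /* local|global gives 2|1 == 3, an unrelated scope */
}

/* Safer pattern: select exactly one scope value, widest first. */
int scope_by_select(unsigned flags) {
    assert(flags != 0); /* this sketch expects at least one fence flag */
    if (flags & (FENCE_GLOBAL | FENCE_IMAGE))
        return SCOPE_DEVICE;
    return SCOPE_WORKGROUP;
}
```

With these stand-in values, `scope_by_or(FENCE_LOCAL | FENCE_GLOBAL)` returns the wavefront scope, which neither flag asked for, while `scope_by_select` returns the device scope.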
2025-12-02 | libclc: Fix build in atomic_def.inc (#170306) | Matt Arsenault | 1 | -1/+2
2025-12-02 | [libclc] Fix bitfield_insert implementation (#170208) | Victor Mustya | 1 | -1/+1
The `bitfield_insert` function in the OpenCL C library had an incorrect `__CLC_BODY` definition that included the `.inc` file for the `__clc_bitfield_insert` declaration instead of the correct implementation. As a result, the function was not defined at all, leading to linker errors when trying to use it.
2025-11-26 | [libclc] Use __scoped_atomic_udec/uinc_wrap to implement _clc_atomic_dec/inc (#168327) | Wenju He | 3 | -3/+3
2025-11-25 | [libclc] Add atomic_init, atomic_flag_clear and atomic_flag_test_and_set (#168329) | Wenju He | 16 | -0/+541
2025-11-19 | [libclc] Use CLC atomic functions for legacy OpenCL atom/atomic builtins (#168325) | Wenju He | 26 | -265/+190
Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions (which use Clang __scoped_atomic_*), replacing the previous Clang __sync_* functions.
* Change memory order from seq_cst to relaxed; keep device scope (spec permits broader than workgroup). LLVM IR for _Z8atom_decPU3AS1Vi in amdgcn--amdhsa.bc:
  Before: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") seq_cst
  After: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") monotonic
* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer. They are added for backward compatibility.
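The ordering change above has a direct host C11 counterpart: LLVM's `monotonic` ordering corresponds to C11 `memory_order_relaxed`. The sketch below is an analogue only, and `dec_relaxed` is a hypothetical name rather than the libclc builtin.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical analogue of a legacy atom_dec: atomically subtract 1 and
 * return the previous value, using relaxed ordering instead of the
 * default sequentially consistent ordering. */
int dec_relaxed(atomic_int *p) {
    assert(p);
    return atomic_fetch_sub_explicit(p, 1, memory_order_relaxed);
}
```

The operation stays atomic. Only the ordering guarantees relative to other memory accesses are weakened.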
2025-11-17 | [libclc] Fix link to source in index.html (#167494) | Konrad Kleine | 1 | -1/+1
2025-11-13 | [libclc] Fix floating-point __clc_atomic_store/exchange cast mismatch (#167625) | Wenju He | 4 | -17/+24
When the pointer element type is cast to an integer type, the stored value should be cast to the integer type as well to avoid a type mismatch. LLVM IR change in function _Z18__clc_atomic_storePU3AS1Vffii:
> %5 = bitcast float %1 to i32 (New)
< %5 = fptosi float %1 to i32 (Old)
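The difference between the two IR instructions can be reproduced in host C. This sketch assumes a 32-bit IEEE-754 `float`; the function names are hypothetical and only mirror the `bitcast` and `fptosi` operations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch expects a 32-bit float");

/* Bit-preserving reinterpretation, analogous to `bitcast float to i32`. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse reinterpretation, analogous to `bitcast i32 to float`. */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Value conversion, analogous to `fptosi float to i32`: truncates. */
int32_t float_to_int(float f) { return (int32_t)f; }
```

Storing `float_bits(1.5f)` (0x3FC00000) through an integer pointer and reading it back as a float recovers 1.5f. Storing `float_to_int(1.5f)` (the integer 1) does not, which is the mismatch the commit removes.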
2025-11-05 | [NFC][libclc] Rename clc_cbrt.inc to clc_cbrt.h (#166330) | Wenju He | 3 | -1/+2
__clc_cbrt functions are declared in clc_cbrt.inc. Rename to .h for consistency with other headers.
2025-10-27 | [libclc] Implement integer __clc_abs using __builtin_elementwise_abs (#164957) | Wenju He | 2 | -2/+26
The previous implementation produced a cmp, select and @llvm.smax sequence in LLVM IR. __CLC_GEN_U/__CLC_GEN_S are upstreamed from the intel/llvm repo.
2025-10-22 | [NFC][libclc] Improve empty builtins error: include ARCH_SUFFIX in message (#164527) | Wenju He | 1 | -1/+1
To clarify which builtin set has no bytecode files.
2025-10-21 | [libclc] Fix double NAN_MASK in __clc_nan (#163522) | Wenju He | 1 | -1/+1
0x7ff0000000000000 is +inf. Change it to quiet nan 0x7ff8000000000000.
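The two bit patterns can be checked on the host. The sketch below assumes a 64-bit IEEE-754 `double`; the helper names are hypothetical and are not part of libclc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch expects a 64-bit double");

/* Reinterpret a 64-bit pattern as an IEEE-754 double. */
double bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Exponent all ones and mantissa zero (sign ignored): infinity. */
int is_inf_bits(uint64_t bits) {
    return (bits & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}

/* Exponent all ones and the top mantissa bit set: quiet NaN. */
int is_quiet_nan_bits(uint64_t bits) {
    return (bits & 0x7ff8000000000000ULL) == 0x7ff8000000000000ULL;
}
```

The old mask 0x7ff0000000000000 satisfies `is_inf_bits`, while the new mask 0x7ff8000000000000 satisfies `is_quiet_nan_bits` and converts to a value that compares unequal to itself.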
2025-10-20 | [NFC][libclc] Simplify degrees, radians and smoothstep macros (#164203) | Wenju He | 9 | -145/+97
Implementation doesn't change.
2025-10-20 | [libclc] Move functions definition from header clc_sincos_piby4.inc into clc_sincos_helpers.cl (#164028) | Wenju He | 12 | -194/+174
Inline functions defined in clc_sincos_piby4.inc lack the static specifier and are deleted by EliminateAvailableExternallyPass when not inlined. This PR fixes the problem by removing inline and moving the function definitions into clc/lib/generic/math/clc_sincos_helpers.cl. It makes sense to put all sin/cos helper definitions in one file, clc_sincos_helpers.cl.
2025-10-20 | [libclc] Change libclc install dir to ${clang_resource_dir}/lib/libclc in in-tree build (#163896) | Wenju He | 2 | -3/+6
Commit df7473673214b placed libclc libraries into the clang resource dir <resource-dir>/lib/libclc at build stage. This PR does the same at install stage. Note that in a standalone (not in-tree) build, libclc is still installed to the old ${CMAKE_INSTALL_DATADIR}/clc dir.
2025-10-16 | [NFC][libclc] Add missing evaluation for variable ${tool}_target (#163540) | Wenju He | 1 | -1/+1
The variable should be evaluated before checking whether it is empty.
2025-10-07 | [libclc] Remove -fno-builtin from compile options (#162075) | Wenju He | 1 | -1/+1
The flag was added in 8ef48d07efa3 to suppress a build warning and is no longer needed. It adds the "no-builtins" attribute, which prevents libclc functions from being inlined into callers that don't have the attribute. The flag is meant to prevent folding standard library calls into optimized implementations. For libclc device targets, however, such target‑driven folding is desirable. llvm-diff shows no change to amdgcn--amdhsa.bc and nvptx--nvidiacl.bc. Co-authored-by: Mészáros Gergely <gergely.meszaros@intel.com>
2025-10-06 | libclc: Add gfx1250 and gfx1251 to amdgpu target list (#162034) | Matt Arsenault | 1 | -1/+1
2025-09-29 | [libclc] Propose new libclc maintainer (#161141) | Fraser Cormack | 1 | -0/+3
Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.
2025-09-29 | [libclc] Move myself to the list of inactive maintainers | Fraser Cormack | 1 | -3/+9
Change my email address in the process. I will not be able to keep up maintainership duties on this project in the future. Adding the wording on the inactive maintainers section myself like this feels self-aggrandizing but was copied from other LLVM projects.
2025-09-22 | [libclc] Fix ctest failures after 7f3661128b1e: adjust external check and make shuffle helpers static (#160036) | Wenju He | 4 | -10/+10
* Replace call-site check with external declaration scan (grep declare) to avoid false positives for not-inlined __clc_* functions.
* _clc_get_el* helpers are defined as inline in clc_shuffle2.cl, so they have available_externally attribute. When they fail to inline they are deleted by EliminateAvailableExternallyPass and become unresolved in cedar-r600--.bc. Mark them static to resolve the issue.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-19 | [Clang] Rename elementwise builtins to `clzg` and `ctzg` (#157128) | Joseph Huber | 2 | -2/+2
Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with other builtins that these are the only bit counting intrinsics named differently. This includes the additional operation for the undefined zero case, which was added as a `clzg` extension.
2025-09-18 | [libclc] Remove __attribute__((always_inline)) (#158791) | Wenju He | 6 | -23/+19
always_inline doesn't guarantee performance improvement. Target-specific optimizations decide whether inlining is profitable. Changes to amdgcn--amdhsa.bc:
* _Z9__clc_logDv16_f and _Z15__clc_remainderDv16_fS_ are not inlined.
* sincos vector function code size has doubled due to apparent duplication.
Also replace typo _CLC_DECL with _CLC_DEF for function definition.
2025-09-15 | [libclc][NFC] Update README.md to use runtime build (#158283) | Wenju He | 1 | -2/+2
LLVM_ENABLE_PROJECTS=libclc is deprecated, see https://github.com/llvm/llvm-project/blob/a2a9601ea49a/llvm/CMakeLists.txt#L223-L228
2025-09-12 | [libclc] Create LIBCLC_OUTPUT_LIBRARY_DIR directory before build (#158171) | Wenju He | 1 | -0/+1
This fixes `No such file or directory` error when "Unix Makefiles" generator is used, see https://github.com/intel/llvm/issues/20058. Ninja generator implicitly creates output directory when generating libclc libraries, but "Unix Makefiles" generator does not.
2025-09-12 | libclc: Remove HAVE_LLVM version macros (#158257) | Matt Arsenault | 2 | -21/+0
This doesn't need to pretend to support multiple versions of llvm and these are old anyway.
2025-09-09 | [NFC][libclc] Replace _CLC_V_V_VP_VECTORIZE macro with use of unary_def_with_ptr_scalarize.inc (#157002) | Wenju He | 2 | -62/+115
Commit d50f2ef437ae removes _CLC_V_V_VP_VECTORIZE from header file, but the macro is still used in our downstream code:
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31
We can either revert d50f2ef437ae or replace macro with use of unary_def_with_ptr_scalarize.inc. This PR uses the latter approach.
2025-09-05 | [libclc] Implement erf/erfc vector function with loop since scalar function is large (#157055) | Wenju He | 3 | -2/+30
This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc size by 4%. Loop trip count is constant and backend can decide whether to unroll.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
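The general shape of a vector function written as a fixed-trip-count loop over a scalar routine might look like the host C sketch below. It is an illustration only: `float4_t` is a hypothetical stand-in for an OpenCL `float4`, and `scalar_kernel` is a placeholder for a large scalar routine, not an erf implementation.

```c
#include <assert.h>

#define LANES 4 /* constant trip count, so a compiler is free to unroll */
static_assert(LANES > 0, "need at least one lane");

/* Hypothetical stand-in for an OpenCL float4. */
typedef struct { float s[LANES]; } float4_t;

/* Placeholder for a large scalar routine such as erf (NOT a real erf). */
float scalar_kernel(float x) { return x * x + 1.0f; }

/* Vector version: one call site inside a loop, instead of LANES
 * separately expanded copies of the scalar body. */
float4_t vector_kernel(float4_t v) {
    float4_t r;
    for (int i = 0; i < LANES; ++i)
        r.s[i] = scalar_kernel(v.s[i]);
    return r;
}
```

Keeping a single call site is what would let code size shrink when the scalar body is large, while the constant bound leaves the unrolling decision to the backend.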
2025-09-05 | [libclc] Override generic symbol using llvm-link --override flag instead of using weak linkage (#156778) | Wenju He | 4 | -12/+23
Before this PR, weak linkage was applied to a few CLC generic functions to allow a target-specific implementation to override the generic one. However, adding weak linkage has a side effect of preventing inter-procedural optimization, such as PostOrderFunctionAttrsPass, because a weak function doesn't have an exact definition (as determined by hasExactDefinition in the pass). This PR resolves the issue by adding the --override flag for every non-generic bitcode file in the llvm-link run. This approach eliminates the need for weak linkage while still allowing a target-specific implementation to override the generic one. llvm-diff shows improved attribute deduction for some functions in amdgcn--amdhsa.bc, e.g. %23 = tail call half @llvm.sqrt.f16(half %22) => %23 = tail call noundef half @llvm.sqrt.f16(half %22)
2025-09-05 | [NFC][libclc] Set MACRO_ARCH to ${ARCH} unconditionally before customizing (#156789) | Wenju He | 1 | -1/+1
Our downstream libclc adds a few more targets that customize build_flags and opt_flags. In each customization block, MACRO_ARCH is defined to be ${ARCH}. Hoisting the MACRO_ARCH definition out of the if-else-end block avoids code duplication. This also avoids potential errors when the MACRO_ARCH definition is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.
2025-09-05 | [NFC][libclc] Remove unused -DCLC_INTERNAL build flag, remove unused M_LOG210 (#156590) | Wenju He | 2 | -7/+1
2025-09-03 | [NFC][libclc] Move _CLC_V_V_VP_VECTORIZE macro into clc_lgamma_r.cl and delete clcmacro.h (#156280) | Wenju He | 48 | -116/+55
clcmacro.h only defines _CLC_V_V_VP_VECTORIZE which is only used in clc/lib/generic/math/clc_lgamma_r.cl.
2025-09-01 | [libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (#152275) | Wenju He | 12 | -44/+108
AMDGPU needs the MemorySemantic argument, which identifies the memory or address space to which the memory ordering is applied. MemorySemantic is also necessary for implementing the SPIR-V MemoryBarrier instruction. Additionally, the implementation of __clc_mem_fence on Intel GPUs requires the MemorySemantic argument. Using __builtin_amdgcn_fence for AMDGPU is a follow-up to https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508
llvm-diff shows no change to nvptx64--nvidiacl.bc.
2025-08-28 | libclc: CMake: include GetClangResourceDir (#155836) | Romaric Jodin | 1 | -0/+1
`get_clang_resource_dir` is not guaranteed to be there. Make sure it is by including `GetClangResourceDir`.
2025-08-22 | [libclc] Only create a target per each compile command for cmake MSVC generator (#154479) | Wenju He | 1 | -21/+48
libclc sequential build issue addressed in commit 0c21d6b4c8ad is specific to cmake MSVC generator. Therefore, this PR avoids creating a large number of targets when a non-MSVC generator is used, such as the Ninja generator, which is used in pre-merge CI on Windows in llvm-project repo. We plan to migrate from MSVC generator to Ninja generator in our downstream CI to fix flaky cmake bug `Cannot restore timestamp`, which might be related to the large number of targets.
2025-08-21 | [libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535) | Fraser Cormack | 4 | -64/+24
Using the elementwise builtin optimizes the vector case; instead of scalarizing we can compile directly to the vector intrinsics.
2025-08-19 | [libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for amdgcn (#153785) | Wenju He | 3 | -0/+37
This simplifies downstream refactoring of libspirv workitem function in https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic
2025-08-18 | [NFC][libclc] add missing __CLC_ prefix all internal macros (#153523) | Wenju He | 665 | -2470/+2500
This unifies the naming scheme of macros to address the review comment https://github.com/intel/llvm/pull/19779#discussion_r2272194357
Math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`
2025-08-18 | [libclc] Fix out-of-bound value for workitem functions according to OpenCL spec (#153784) | Wenju He | 4 | -4/+4
2025-08-12 | [libclc] Add __attribute__((const)) to functions that don't access memory (#152456) | Wenju He | 80 | -151/+160
Before this PR, PostOrderFunctionAttrsPass in the opt run could already deduce memory(none) for these functions. This PR explicitly adds the attribute to align with Clang's OpenCL headers and ensures the attribute is present throughout the compilation flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc become slightly smaller.
2025-08-11 | [libclc] Fix libclc install on Windows when MSVC generator is used (#152703) | Wenju He | 1 | -8/+29
Fix a regression of df7473673214. The cmake MSVC generator is multi-configuration: the build type is not known at configure time, and CMAKE_CFG_INTDIR evaluates to $(Configuration) at configure time. libclc install fails since $(Configuration) in the bitcode file path is unresolved in libclc/cmake_install.cmake at install time. We need a solution that resolves the libclc bitcode file path at install time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME, which can be evaluated at install time. This is the same solution as in https://reviews.llvm.org/D76827
2025-08-11 | [libclc] Implement clc_log/sinpi/sqrt with __nv_* functions (#150174) | Wenju He | 7 | -0/+180
This is to upstream implementations in https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math