riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author	Files	Lines
2026-01-05	[libclc] Fix incorrect argument format	Joseph Huber	1	-1/+1

2026-01-05	[libclc] Suppress AMDGCN code object version for library code (#174412)	Joseph Huber	1	-0/+3
	Summary: The code object version defines the ABI the HSA executables follow. This change avoids hard-coding it in the OpenCL libraries, which would prevent users from overriding it when using them.
2025-12-19	[libclc] Improve __clc_min/max/clamp implementation (#172599)	Wenju He	6	-5/+43
	Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp. FP __clc_min/max/clamp now lower to @llvm.minimumnum/@llvm.maximumnum, and integer clamp lowers to @llvm.umin/@llvm.umax. This reduces fcmp+select chains and improves codegen.
	Example change to amdgcn--amdhsa.bc:
	```
	in function _Z5clamphhh:
	  > %4 = icmp ugt i8 %0, %2
	    %4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
	  > %6 = select i1 %4, i8 %2, i8 %5
	  > ret i8 %6
	  < %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
	  < ret i8 %5
	in function _Z5clampddd:
	  in block %3 / %3:
	    > %4 = fcmp ogt double %0, %2
	    > %5 = fcmp olt double %0, %1
	    > %6 = select i1 %5, double %1, double %0
	    > %7 = select i1 %4, double %2, double %6
	    > ret double %7
	    < %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
	    < %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
	    < ret double %5
	```
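As a rough illustration of the integer clamp shape described in this commit, the C sketch below writes clamp as a max followed by a min, which is the form that maps onto a umax/umin pair. The helper names `max_u8`, `min_u8` and `clamp_u8` are hypothetical and are not taken from libclc; the sketch also assumes `lo <= hi`.

```c
#include <stdint.h>

/* Hypothetical helpers, named for this sketch only. */
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* clamp(x, lo, hi) = min(max(x, lo), hi); assumes lo <= hi. */
static uint8_t clamp_u8(uint8_t x, uint8_t lo, uint8_t hi) {
    return min_u8(max_u8(x, lo), hi);
}
```

Writing the operation this way avoids a separate compare against each bound followed by selects, which is the pattern the old IR in the example shows.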
2025-12-19	[libclc][NFC] Move convert builtins from Python generator to .cl sources ↵	Wenju He	25	-621/+1311
	(#172634) Remove the Python dependency for generating convert builtins, aligning with how other builtins are defined. In addition, our downstream target relies on this PR to override convert implementations. llvm-diff shows no changes to any of the bitcode files: amdgcn--amdhsa.bc, barts-r600--.bc, cayman-r600--.bc, cedar-r600--.bc, clspv64--.bc, clspv--.bc, cypress-r600--.bc, nvptx64--.bc, nvptx64--nvidiacl.bc, nvptx--.bc, nvptx--nvidiacl.bc, tahiti-amdgcn--.bc and tahiti-amdgcn-mesa-mesa3d.bc.
2025-12-11	[libclc] use clc functions in clspv/shared/vstore_half.cl (#171770)	Romaric Jodin	1	-12/+18

2025-12-09	[libclc] fix clspv/shared/vstore_half.cl (#171105)	Romaric Jodin	1	-11/+12
	Update as_type functions
2025-12-09	[NFC][libclc] Delete OpenCL builtin declarations (#170803)	Wenju He	476	-5389/+31
	This is a follow-up to the comment https://github.com/llvm/llvm-project/pull/168318#discussion_r2588117855. The libclc OpenCL library is already compiled with the flags `-fdeclare-opencl-builtins -finclude-default-header`.
2025-12-05	[libclc] Add OpenCL atomic_*_explicit builtins (#168318)	Wenju He	64	-337/+364
	Implement atomic_*_explicit (e.g. atomic_store_explicit) with memory_order plus optional memory_scope. OpenCL memory_order maps 1:1 to Clang (e.g. OpenCL memory_order_relaxed == Clang __ATOMIC_RELAXED), so we pass it unchanged to the __clc_atomic_* function, which forwards to Clang __scoped_atomic_* builtins.
	Other changes:
	* Add __opencl_get_clang_memory_scope helper in opencl/utils.h (OpenCL scope -> Clang scope).
	* Correct atomic_compare_exchange return type to bool.
	* Fix atomic_compare_exchange to return true when the value stored in the pointer equals the expected value.
	* Remove volatile from CLC functions so that volatile isn't present in LLVM IR.
	* Add '-fdeclare-opencl-builtins -finclude-default-header' flag to include declaration of memory_scope. Some constants in libclc are already provided by Clang’s OpenCL header; disable those in OpenCL library build and enable them only for CLC library build.
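The compare-exchange return convention mentioned in this commit can be sketched with host C11 atomics, used here only as a stand-in for the OpenCL builtins. The wrapper name `try_swap` is hypothetical, and relaxed ordering is chosen purely for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true when *obj held `expected` and was replaced by `desired`;
   returns false (and leaves *obj unchanged) otherwise. */
static bool try_swap(atomic_int *obj, int expected, int desired) {
    return atomic_compare_exchange_strong_explicit(
        obj, &expected, desired,
        memory_order_relaxed, memory_order_relaxed);
}
```

A caller that sees `true` can rely on the store having happened; on `false` the local copy of `expected` has been overwritten with the value actually observed.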
2025-12-04	[libclc] Fix memory fence scope mapping for OpenCL (#170542)	Victor Mustya	1	-6/+5
	The function `__opencl_get_memory_scope` incorrectly assumed that the Clang built-in `__MEMORY_SCOPE_` macros are defined as bitmasks, while they are actually defined as distinct integer values. This led to incorrect mapping of OpenCL memory fence flags to LLVM memory scopes, causing issues in generated code. The fix updates the `__opencl_get_memory_scope` function to return the correct `__MEMORY_SCOPE_` values based on the provided `cl_mem_fence_flags`. Additionally, the `__opencl_get_memory_semantics` and `__opencl_get_memory_scope` functions are marked as `static` to avoid potential multiple definition issues during linking.
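The difference between bitmask flags and distinct scope values can be sketched in plain C. Every constant and function name below is hypothetical, and the flag-to-scope mapping is an assumption made for illustration; it is not the mapping used by libclc.

```c
/* Hypothetical fence flags: a bitmask, so several may be OR-ed together. */
enum { FENCE_LOCAL = 1 << 0, FENCE_GLOBAL = 1 << 1 };

/* Hypothetical scopes: distinct values, exactly one applies at a time. */
enum { SCOPE_DEVICE = 1, SCOPE_WORKGROUP = 2 };

/* Buggy pattern: treats scope values as bits and ORs them together. */
static int scope_by_or(unsigned flags) {
    int scope = 0;
    if (flags & FENCE_GLOBAL) scope |= SCOPE_DEVICE;
    if (flags & FENCE_LOCAL)  scope |= SCOPE_WORKGROUP;
    return scope;
}

/* Fixed pattern: selects a single scope value (assumed mapping). */
static int scope_by_select(unsigned flags) {
    if (flags & FENCE_GLOBAL) return SCOPE_DEVICE;
    return SCOPE_WORKGROUP;
}
```

With both flags set, the OR-based version yields 3, which is neither of the two defined scopes, while the select-based version always returns one of them.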
2025-12-02	libclc: Fix build in atomic_def.inc (#170306)	Matt Arsenault	1	-1/+2

2025-12-02	[libclc] Fix bitfield_insert implementation (#170208)	Victor Mustya	1	-1/+1
	The `bitfield_insert` function in the OpenCL C library had an incorrect `__CLC_BODY` definition, that included the `.inc` file for the `__clc_bitfield_insert` declaration instead of the correct implementation. So, the function was not defined at all, leading to linker errors when trying to use it.
2025-11-26	[libclc] Use __scoped_atomic_udec/uinc_wrap to implement _clc_atomic_dec/inc ↵	Wenju He	3	-3/+3
	(#168327)
2025-11-25	[libclc] Add atomic_init, atomic_flag_clear and atomic_flag_test_and_set ↵	Wenju He	16	-0/+541
	(#168329)
2025-11-19	[libclc] Use CLC atomic functions for legacy OpenCL atom/atomic builtins ↵	Wenju He	26	-265/+190
	(#168325) Main changes:
	* OpenCL legacy atom/atomic builtins now call CLC atomic functions (which use Clang __scoped_atomic_* builtins), replacing the previous Clang __sync_* functions.
	* Change memory order from seq_cst to relaxed; keep device scope (spec permits broader than workgroup).
	  LLVM IR for _Z8atom_decPU3AS1Vi in amdgcn--amdhsa.bc:
	  Before: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") seq_cst
	  After: %2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1 syncscope("agent") monotonic
	* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer. They are added for backward compatibility.
2025-11-17	[libclc] Fix link to source in index.html (#167494)	Konrad Kleine	1	-1/+1

2025-11-13	[libclc] Fix floating-point __clc_atomic_store/exchange cast mismatch (#167625)	Wenju He	4	-17/+24
	When the pointer element type is cast to an integer type, the stored value should also be cast to the integer type to avoid a type mismatch. LLVM IR change in function _Z18__clc_atomic_storePU3AS1Vffii: > %5 = bitcast float %1 to i32 (New) < %5 = fptosi float %1 to i32 (Old)
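The distinction between the two IR instructions in this commit can be sketched in C, assuming `float` is IEEE-754 binary32. The function names are hypothetical: `float_bits` reinterprets the bits (analogous to `bitcast`), while `float_to_int` converts the numeric value (analogous to `fptosi`).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of a float as an integer; no value conversion. */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Convert the numeric value to an integer, truncating toward zero. */
static int32_t float_to_int(float f) { return (int32_t)f; }
```

Storing `float_to_int(1.5f)` through an integer-typed pointer would write 1 and lose the original value, whereas `float_bits(1.5f)` keeps the exact bit pattern so the float can be read back unchanged.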
2025-11-05	[NFC][libclc] Rename clc_cbrt.inc to clc_cbrt.h (#166330)	Wenju He	3	-1/+2
	__clc_cbrt functions are declared in clc_cbrt.inc. Rename to .h for consistency with other headers.
2025-10-27	[libclc] Implement integer __clc_abs using __builtin_elementwise_abs (#164957)	Wenju He	2	-2/+26
	The previous implementation lowered to a cmp, select and @llvm.smax sequence in LLVM IR. __CLC_GEN_U/__CLC_GEN_S is upstreamed from the intel/llvm repo.
2025-10-22	[NFC][libclc] Improve empty builtins error: include ARCH_SUFFIX in message ↵	Wenju He	1	-1/+1
	(#164527) To clarify which builtin set has no bytecode files.
2025-10-21	[libclc] Fix double NAN_MASK in __clc_nan (#163522)	Wenju He	1	-1/+1
	0x7ff0000000000000 is +inf. Change it to quiet nan 0x7ff8000000000000.
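The two bit patterns named in this commit can be checked on the host with a small C sketch, assuming `double` is IEEE-754 binary64. The helper name `double_from_bits` is hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a double from its raw 64-bit encoding. */
static double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

An all-ones exponent with a zero mantissa encodes infinity; setting the top mantissa bit (the extra `8` in `0x7ff8...`) is what turns the pattern into a quiet NaN.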
2025-10-20	[NFC][libclc] Simplify degrees, radians and smoothstep macros (#164203)	Wenju He	9	-145/+97
	Implementation doesn't change.
2025-10-20	[libclc] Move functions definition from header clc_sincos_piby4.inc into ↵	Wenju He	12	-194/+174
	clc_sincos_helpers.cl (#164028) The inline functions defined in clc_sincos_piby4.inc lack the static specifier and are deleted by EliminateAvailableExternallyPass when not inlined. This PR fixes the problem by removing inline and moving the function definitions into clc/lib/generic/math/clc_sincos_helpers.cl. It makes sense to put all sin/cos helper definitions in one file, clc_sincos_helpers.cl.
2025-10-20	[libclc] Change libclc install dir to ${clang_resource_dir}/lib/libclc in ↵	Wenju He	2	-3/+6
	in-tree build (#163896) Commit df7473673214b placed libclc libraries into clang resource dir <resource-dir>/lib/libclc at build stage. This PR does it at install stage as well. Note that in standalone (not in-tree) build, libclc is still installed to old ${CMAKE_INSTALL_DATADIR}/clc dir.
2025-10-16	[NFC][libclc] Add missing evaluation for variable ${tool}_target (#163540)	Wenju He	1	-1/+1
	The variable should be evaluated before checking whether it is empty.
2025-10-07	[libclc] Remove -fno-builtin from compile options (#162075)	Wenju He	1	-1/+1
	The flag was added in 8ef48d07efa3 to suppress a build warning and is no longer needed. It adds the "no-builtins" attribute, which prevents libclc functions from being inlined into callers that don't have the attribute. The flag is meant to prevent folding standard library calls into optimized implementations. For libclc device targets, however, such target‑driven folding is desirable. llvm-diff shows no change to amdgcn--amdhsa.bc and nvptx--nvidiacl.bc. Co-authored-by: Mészáros Gergely <gergely.meszaros@intel.com>
2025-10-06	libclc: Add gfx1250 and gfx1251 to amdgpu target list (#162034)	Matt Arsenault	1	-1/+1

2025-09-29	[libclc] Propose new libclc maintainer (#161141)	Fraser Cormack	1	-0/+3
	Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.
2025-09-29	[libclc] Move myself to the list of inactive maintainers	Fraser Cormack	1	-3/+9
	Change my email address in the process. I will not be able to keep up maintainership duties on this project in the future. Adding the wording on the inactive maintainers section myself like this feels self-aggrandizing but was copied from other LLVM projects.
2025-09-22	[libclc] Fix ctest failures after 7f3661128b1e: adjust external check and ↵	Wenju He	4	-10/+10
	make shuffle helpers static (#160036)
	* Replace call-site check with external declaration scan (grep declare) to avoid false positives for not-inlined __clc_* functions.
	* _clc_get_el* helpers are defined as inline in clc_shuffle2.cl, so they have available_externally attribute. When they fail to inline they are deleted by EliminateAvailableExternallyPass and become unresolved in cedar-r600--.bc. Mark them static to resolve the issue.
	---------
	Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
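The linkage point in this commit has a close analogue in standard C, sketched below with hypothetical names. In C99, a plain `inline` definition does not emit an external definition, so a call that is not inlined can end up unresolved; adding `static` gives the helper internal linkage and keeps a definition in the translation unit.

```c
/* `static` gives the helper internal linkage, so this translation unit
   always carries its own definition even when the call is not inlined. */
static inline int get_element(const int *v, int idx) { return v[idx]; }

/* Caller with external linkage that uses the helper. */
int pick(const int *v, int idx) { return get_element(v, idx); }
```

The trade-off is that each translation unit using the helper may carry its own copy, which is usually acceptable for small accessors like this one.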
2025-09-19	[Clang] Rename elementwise builtins to `clzg` and `ctzg` (#157128)	Joseph Huber	2	-2/+2
	Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with other builtins that these are the only bit counting intrinsics named differently. This includes the additional operation for the undefined zero case, which was added as a `clzg` extension.
2025-09-18	[libclc] Remove __attribute__((always_inline)) (#158791)	Wenju He	6	-23/+19
	always_inline doesn't guarantee performance improvement. Target-specific optimizations decide whether inlining is profitable. Changes to amdgcn--amdhsa.bc: * _Z9__clc_logDv16_f and _Z15__clc_remainderDv16_fS_ are not inlined. * sincos vector function code size has doubled due to apparent duplication. Also replace typo _CLC_DECL with _CLC_DEF for function definition.
2025-09-15	[libclc][NFC] Update README.md to use runtime build (#158283)	Wenju He	1	-2/+2
	LLVM_ENABLE_PROJECTS=libclc is deprecated, see https://github.com/llvm/llvm-project/blob/a2a9601ea49a/llvm/CMakeLists.txt#L223-L228
2025-09-12	[libclc] Create LIBCLC_OUTPUT_LIBRARY_DIR directory before build (#158171)	Wenju He	1	-0/+1
	This fixes `No such file or directory` error when "Unix Makefiles" generator is used, see https://github.com/intel/llvm/issues/20058. Ninja generator implicitly creates output directory when generating libclc libraries, but "Unix Makefiles" generator does not.
2025-09-12	libclc: Remove HAVE_LLVM version macros (#158257)	Matt Arsenault	2	-21/+0
	This doesn't need to pretend to support multiple versions of llvm and these are old anyway.
2025-09-09	[NFC][libclc] Replace _CLC_V_V_VP_VECTORIZE macro with use of ↵	Wenju He	2	-62/+115
	unary_def_with_ptr_scalarize.inc (#157002) Commit d50f2ef437ae removes _CLC_V_V_VP_VECTORIZE from header file, but the macro is still used in our downstream code: https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30 https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31 We can either revert d50f2ef437ae or replace macro with use of unary_def_with_ptr_scalarize.inc. This PR uses the latter approach.
2025-09-05	[libclc] Implement erf/erfc vector function with loop since scalar function ↵	Wenju He	3	-2/+30
	is large (#157055) This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc size by 4%. Loop trip count is constant and backend can decide whether to unroll. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-05	[libclc] Override generic symbol using llvm-link --override flag instead of ↵	Wenju He	4	-12/+23
	using weak linkage (#156778) Before this PR, weak linkage was applied to a few CLC generic functions to allow a target-specific implementation to override the generic one. However, adding weak linkage has the side effect of preventing inter-procedural optimization, such as PostOrderFunctionAttrsPass, because a weak function doesn't have an exact definition (as determined by hasExactDefinition in the pass). This PR resolves the issue by adding the --override flag for every non-generic bitcode file in the llvm-link run. This approach eliminates the need for weak linkage while still allowing a target-specific implementation to override the generic one. llvm-diff shows improved attribute deduction for some functions in amdgcn--amdhsa.bc, e.g. %23 = tail call half @llvm.sqrt.f16(half %22) => %23 = tail call noundef half @llvm.sqrt.f16(half %22)
2025-09-05	[NFC][libclc] Set MACRO_ARCH to ${ARCH} unconditionally before customizing ↵	Wenju He	1	-1/+1
	(#156789) Our downstream libclc adds a few more targets that customize build_flags and opt_flags. In each customization block, MACRO_ARCH is then defined to be ${ARCH}. Hoisting the MACRO_ARCH definition out of the if-else-end block avoids code duplication. This also avoids a potential error when the MACRO_ARCH definition is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.
2025-09-05	[NFC][libclc] Remove unused -DCLC_INTERNAL build flag, remove unused ↵	Wenju He	2	-7/+1
	M_LOG210 (#156590)
2025-09-03	[NFC][libclc] Move _CLC_V_V_VP_VECTORIZE macro into clc_lgamma_r.cl and ↵	Wenju He	48	-116/+55
	delete clcmacro.h (#156280) clcmacro.h only defines _CLC_V_V_VP_VECTORIZE which is only used in clc/lib/generic/math/clc_lgamma_r.cl.
2025-09-01	[libclc] update __clc_mem_fence: add MemorySemantic arg and use ↵	Wenju He	12	-44/+108
	__builtin_amdgcn_fence for AMDGPU (#152275) AMDGPU needs a MemorySemantic argument, which identifies the memory or address space to which the memory ordering is applied. The MemorySemantic is also necessary for implementing the SPIR-V MemoryBarrier instruction. Additionally, the implementation of __clc_mem_fence on Intel GPUs requires the MemorySemantic argument. Using __builtin_amdgcn_fence for AMDGPU is a follow-up to https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508. llvm-diff shows no change to nvptx64--nvidiacl.bc.
2025-08-28	libclc: CMake: include GetClangResourceDir (#155836)	Romaric Jodin	1	-0/+1
	`get_clang_resource_dir` is not guaranteed to be there. Make sure it is by including `GetClangResourceDir`.
2025-08-22	[libclc] Only create a target per each compile command for cmake MSVC ↵	Wenju He	1	-21/+48
	generator (#154479) libclc sequential build issue addressed in commit 0c21d6b4c8ad is specific to cmake MSVC generator. Therefore, this PR avoids creating a large number of targets when a non-MSVC generator is used, such as the Ninja generator, which is used in pre-merge CI on Windows in llvm-project repo. We plan to migrate from MSVC generator to Ninja generator in our downstream CI to fix flaky cmake bug `Cannot restore timestamp`, which might be related to the large number of targets.
2025-08-21	[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535)	Fraser Cormack	4	-64/+24
	Using the elementwise builtin optimizes the vector case; instead of scalarizing we can compile directly to the vector intrinsics.
2025-08-19	[libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for ↵	Wenju He	3	-0/+37
	amdgcn (#153785) This simplifies downstream refactoring of libspirv workitem function in https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic
2025-08-18	[NFC][libclc] add missing __CLC_ prefix to all internal macros (#153523)	Wenju He	665	-2470/+2500
	This unifies naming scheme of macros to address review comment https://github.com/intel/llvm/pull/19779#discussion_r2272194357 math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`
2025-08-18	[libclc] Fix out-of-bound value for workitem functions according to OpenCL ↵	Wenju He	4	-4/+4
	spec (#153784)
2025-08-12	[libclc] Add __attribute__((const)) to functions that don't access memory ↵	Wenju He	80	-151/+160
	(#152456) Before this PR, PostOrderFunctionAttrsPass in opt run can deduce memory(none) for these functions. This PR explicitly adds the attribute to align with Clang's OpenCL headers and ensures the attribute is present throughout the compilation flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc become slightly smaller.
2025-08-11	[libclc] Fix libclc install on Windows when MSVC generator is used (#152703)	Wenju He	1	-8/+29
	Fix a regression of df7473673214. The cmake MSVC generator is a multi-configuration generator: the build type is not known at configure time, and CMAKE_CFG_INTDIR evaluates to $(Configuration) at configure time. libclc install fails because $(Configuration) in the bitcode file path is unresolved in libclc/cmake_install.cmake at install time. We need a solution that resolves the libclc bitcode file path at install time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME, which can be evaluated at install time. This is the same solution as in https://reviews.llvm.org/D76827
2025-08-11	[libclc] Implement clc_log/sinpi/sqrt with __nv_* functions (#150174)	Wenju He	7	-0/+180
	This is to upstream implementations in https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math