| Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Summary:
The code object version defines the ABI the HSA executables follow. This
change avoids hard-coding it in the OpenCL libraries, which prevented
users from overriding it when using them.
|
|
Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp. FP
__clc_min/max/clamp now lowers to @llvm.minimumnum/@llvm.maximumnum, and
integer clamp lowers to @llvm.umin/@llvm.umax. This reduces fcmp+select
chains and improves codegen. Example change to amdgcn--amdhsa.bc:
```
in function _Z5clamphhh:
> %4 = icmp ugt i8 %0, %2
< %4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
> %6 = select i1 %4, i8 %2, i8 %5
> ret i8 %6
< %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
< ret i8 %5
in function _Z5clampddd:
in block %3 / %3:
> %4 = fcmp ogt double %0, %2
> %5 = fcmp olt double %0, %1
> %6 = select i1 %5, double %1, double %0
> %7 = select i1 %4, double %2, double %6
> ret double %7
< %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
< %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
< ret double %5
```
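The integer half of this change can be sketched in plain C. This is only an illustration of the two code shapes, not the libclc source: all function names are hypothetical, and `lo <= hi` is assumed, as OpenCL requires for `clamp`.

```c
#include <stdint.h>

/* Old shape: a compare + select chain, mirroring the icmp/select IR. */
uint8_t sketch_clamp_select_u8(uint8_t x, uint8_t lo, uint8_t hi) {
    uint8_t t = (x < lo) ? lo : x;
    return (x > hi) ? hi : t;
}

/* Hypothetical helpers standing in for @llvm.umax.i8 / @llvm.umin.i8. */
uint8_t sketch_umax_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }
uint8_t sketch_umin_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* New shape: min(max(x, lo), hi), one intrinsic call per bound. */
uint8_t sketch_clamp_minmax_u8(uint8_t x, uint8_t lo, uint8_t hi) {
    return sketch_umin_u8(sketch_umax_u8(x, lo), hi);
}
```

Both functions return the same value whenever `lo <= hi`; the second form simply exposes the min/max structure directly.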
|
|
(#172634)
Remove the Python dependency for generating convert builtins, aligning
with how other builtins are defined.
In addition, our downstream target relies on this PR to override convert
implementations.
llvm-diff shows no changes to any of the bitcode files:
amdgcn--amdhsa.bc, barts-r600--.bc, cayman-r600--.bc, cedar-r600--.bc,
clspv64--.bc, clspv--.bc, cypress-r600--.bc, nvptx64--.bc,
nvptx64--nvidiacl.bc, nvptx--.bc, nvptx--nvidiacl.bc, tahiti-amdgcn--.bc
and tahiti-amdgcn-mesa-mesa3d.bc.
|
|
|
|
Update as_type functions
|
|
This is a follow-up to the review comment
https://github.com/llvm/llvm-project/pull/168318#discussion_r2588117855
The libclc OpenCL library is already compiled with the flags
`-fdeclare-opencl-builtins -finclude-default-header`.
|
|
Implement atomic_*_explicit (e.g. atomic_store_explicit) with
memory_order plus optional memory_scope.
OpenCL memory_order maps 1:1 to Clang (e.g. OpenCL memory_order_relaxed
== Clang __ATOMIC_RELAXED), so we pass it unchanged to the __clc_atomic_*
functions, which forward to the Clang __scoped_atomic_* builtins.
Other changes:
* Add __opencl_get_clang_memory_scope helper in opencl/utils.h (OpenCL
scope -> Clang scope).
* Correct atomic_compare_exchange return type to bool.
* Fix atomic_compare_exchange to return true when value stored in the
pointer equals expected value.
* Remove volatile from CLC functions so that volatile isn't present in
LLVM IR.
* Add the '-fdeclare-opencl-builtins -finclude-default-header' flags to
include the declaration of memory_scope. Some constants in libclc are
already provided by Clang’s OpenCL header; disable those in the OpenCL
library build and enable them only for the CLC library build.
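A minimal C sketch of the pass-through and of the corrected compare-exchange result follows. It rests on some assumptions: the names are hypothetical, and it uses the unscoped `__atomic_*` builtins (available in both GCC and Clang) rather than the `__scoped_atomic_*` builtins mentioned above, so no memory scope is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for OpenCL memory_order values. Because they equal
   the compiler's __ATOMIC_* constants, no translation step is required. */
enum sketch_memory_order {
    sketch_order_relaxed = __ATOMIC_RELAXED,
    sketch_order_seq_cst = __ATOMIC_SEQ_CST
};

/* The order is forwarded unchanged to the builtin. */
void sketch_atomic_store_explicit(int *p, int value,
                                  enum sketch_memory_order order) {
    __atomic_store_n(p, value, (int)order);
}

/* Returns true when *p held *expected (and was replaced by desired);
   otherwise writes the observed value to *expected and returns false. */
bool sketch_atomic_compare_exchange(int *p, int *expected, int desired) {
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```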
|
|
The function `__opencl_get_memory_scope` incorrectly assumed that the
Clang built-in `__MEMORY_SCOPE_*` macros are defined as bitmasks, while they
are actually defined as distinct integer values. This led to incorrect
mapping of OpenCL memory fence flags to LLVM memory scopes, causing
issues in generated code.
The fix involves updating the `__opencl_get_memory_scope` function to
return the correct `__MEMORY_SCOPE_*` values based on the provided
`cl_mem_fence_flags`. Additionally, the `__opencl_get_memory_semantics`
and the `__opencl_get_memory_scope` functions are marked as `static`
to avoid potential multiple definition issues during linking.
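The failure mode can be illustrated with a small C sketch. Everything here is an assumption made for illustration: the constants are stand-ins rather than the real `CLK_*` or `__MEMORY_SCOPE_*` values, and the "global fence selects device scope" policy is a guess at a plausible mapping, not the one in the patch.

```c
/* Hypothetical stand-ins: fence flags form a bitmask, scopes are an
   enumeration of distinct values. */
enum { SK_LOCAL_MEM_FENCE = 1 << 0, SK_GLOBAL_MEM_FENCE = 1 << 1 };
enum { SK_SCOPE_DEVICE = 1, SK_SCOPE_WORKGROUP = 2, SK_SCOPE_WAVEFRONT = 3 };

/* Failure mode: OR-ing enumerators as if they were independent bits.
   Two valid scopes combine into the value of an unrelated third scope. */
int sk_scope_buggy(unsigned flags) {
    int scope = 0;
    if (flags & SK_LOCAL_MEM_FENCE)  scope |= SK_SCOPE_WORKGROUP;
    if (flags & SK_GLOBAL_MEM_FENCE) scope |= SK_SCOPE_DEVICE;
    return scope;
}

/* Fix: select exactly one enumerator (assumed policy: widest scope wins). */
int sk_scope_fixed(unsigned flags) {
    if (flags & SK_GLOBAL_MEM_FENCE)
        return SK_SCOPE_DEVICE;
    return SK_SCOPE_WORKGROUP;
}
```

With both fence flags set, the buggy version yields `SK_SCOPE_WAVEFRONT`, a narrower scope than either input asked for.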
|
|
|
|
The `bitfield_insert` function in the OpenCL C library had an incorrect
`__CLC_BODY` definition that included the `.inc` file for the
`__clc_bitfield_insert` declaration instead of the correct
implementation. As a result, the function was not defined at all, leading
to linker errors when trying to use it.
|
|
(#168327)
|
|
(#168329)
|
|
(#168325)
Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions
(which use Clang __scoped_atomic_*), replacing previous Clang __sync_*
functions.
* Change memory order from seq_cst to relaxed; keep device scope (spec
permits broader than workgroup). LLVM IR for _Z8atom_decPU3AS1Vi in
amdgcn--amdhsa.bc:
Before:
%2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1
syncscope("agent") seq_cst
After:
%2 = atomicrmw volatile sub ptr addrspace(1) %0, i32 1
syncscope("agent") monotonic
* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer.
They are added for backward compatibility.
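As a rough C analogue of the `atom_dec` lowering shown above: a relaxed fetch-and-subtract, which is the ordering LLVM IR calls monotonic. The function name is hypothetical, and the unscoped `__atomic_fetch_sub` builtin is used here, so the device scope from the commit is not modelled.

```c
/* Hypothetical name. Atomically decrements *p with relaxed ordering and
   returns the value held before the decrement. */
int sketch_atom_dec(int *p) {
    return __atomic_fetch_sub(p, 1, __ATOMIC_RELAXED);
}
```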
|
|
|
|
When the pointer element type is cast to an integer type, the stored value
should be bitcast to the same integer type to avoid a type mismatch. LLVM IR change
in function _Z18__clc_atomic_storePU3AS1Vffii:
> %5 = bitcast float %1 to i32 (New)
< %5 = fptosi float %1 to i32 (Old)
|
|
__clc_cbrt functions are declared in clc_cbrt.inc. Rename to .h for
consistency with other headers.
|
|
The previous implementation produced a cmp, select and @llvm.smax sequence in
LLVM IR. __CLC_GEN_U/__CLC_GEN_S are upstreamed from the intel/llvm repo.
|
|
(#164527)
To clarify which builtin set has no bytecode files.
|
|
0x7ff0000000000000 is +inf. Change it to quiet nan 0x7ff8000000000000.
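The two bit patterns can be checked with a short C sketch (hypothetical helper names; IEEE-754 binary64 `double` is assumed):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double without converting the value. */
double sketch_double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* NaN is the only value that compares unequal to itself. */
int sketch_is_nan(double d) {
    return d != d;
}
```

`0x7ff0000000000000` has an all-ones exponent and a zero mantissa, which encodes +inf; setting the top mantissa bit (`0x7ff8000000000000`) gives a quiet NaN.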
|
|
Implementation doesn't change.
|
|
clc_sincos_helpers.cl (#164028)
The inline functions defined in clc_sincos_piby4.inc lack the static
specifier and are deleted by EliminateAvailableExternallyPass when not
inlined. This PR fixes the problem by removing inline and moving the
function definitions into clc/lib/generic/math/clc_sincos_helpers.cl. It
makes sense to put all sin/cos helper definitions in one file,
clc_sincos_helpers.cl.
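The linkage issue follows from C99 inline semantics, which OpenCL C inherits: a plain `inline` definition does not provide an external definition, so a call that is not inlined has nothing to resolve to. The sketch below, with hypothetical function names and bodies, shows the two usual remedies in C.

```c
/* Remedy used by this change: an ordinary (non-inline) definition, emitted
   once in a single source file and callable from anywhere. */
float sketch_helper_external(float x) {
    return x * x;
}

/* Alternative remedy: internal linkage. Each translation unit keeps its own
   definition, so a non-inlined call still resolves. */
static inline float sketch_helper_static(float x) {
    return x + x;
}

float sketch_caller(float x) {
    return sketch_helper_external(x) + sketch_helper_static(x);
}
```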
|
|
in-tree build (#163896)
Commit df7473673214b placed libclc libraries into the clang resource dir
<resource-dir>/lib/libclc at the build stage.
This PR does the same at the install stage.
Note that in a standalone (not in-tree) build, libclc is still installed
to the old ${CMAKE_INSTALL_DATADIR}/clc dir.
|
|
The variables should be evaluated before checking whether they are empty.
|
|
The flag was added in 8ef48d07efa3 to suppress a build warning and is no
longer needed.
It adds the "no-builtins" attribute, which prevents libclc functions from
being inlined into callers that don't have the attribute.
The flag is meant to prevent folding standard library calls into
optimized implementations. For libclc device targets, however, such
target‑driven folding is desirable.
llvm-diff shows no change to amdgcn--amdhsa.bc and nvptx--nvidiacl.bc.
Co-authored-by: Mészáros Gergely <gergely.meszaros@intel.com>
|
|
|
|
Wenju He has been active on the libclc project for a while now and has
been contributing to the overall health and steering the future of the
project.
|
|
Change my email address in the process. I will not be able to keep up
maintainership duties on this project in the future.
Adding the wording on the inactive maintainers section myself like this
feels self-aggrandizing but was copied from other LLVM projects.
|
|
make shuffle helpers static (#160036)
* Replace call-site check with external declaration scan (grep declare)
to avoid false positives for not-inlined __clc_* functions.
* _clc_get_el* helpers are defined as inline in clc_shuffle2.cl, so they
have available_externally attribute. When they fail to inline they are
deleted by EliminateAvailableExternallyPass and become unresolved in
cedar-r600--.bc. Mark them static to resolve the issue.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
|
Summary:
The added bit counting builtins for vectors used `cttz` and `ctlz`,
which is consistent with the LLVM naming convention. However, these are
clang builtins and implement exactly the `__builtin_ctzg` and
`__builtin_clzg` behavior. It is confusing to people familiar with
other builtins that these are the only bit counting intrinsics named
differently. This includes the additional operation for the undefined
zero case, which was added as a `clzg` extension.
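The zero-case behavior can be sketched in portable C. This emulates the two-argument form (return the fallback when the input is zero) on top of the older `__builtin_clz`/`__builtin_ctz`, whose result is undefined for zero. The function names are hypothetical, and a 32-bit `unsigned int` is assumed.

```c
#include <stdint.h>

/* Count leading zeros; return `fallback` for x == 0 instead of relying on
   the undefined result of __builtin_clz(0). */
int sketch_clz_or(uint32_t x, int fallback) {
    return x ? __builtin_clz(x) : fallback;
}

/* Count trailing zeros with the same zero-input convention. */
int sketch_ctz_or(uint32_t x, int fallback) {
    return x ? __builtin_ctz(x) : fallback;
}
```

A caller that wants "bit width for zero" passes 32 as the fallback; one that wants a sentinel can pass -1.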
|
|
always_inline doesn't guarantee performance improvement.
Target-specific optimizations decide whether inlining is profitable.
Changes to amdgcn--amdhsa.bc:
* _Z9__clc_logDv16_f and _Z15__clc_remainderDv16_fS_ are not inlined.
* sincos vector function code size has doubled due to apparent
duplication.
Also replace typo _CLC_DECL with _CLC_DEF for function definition.
|
|
LLVM_ENABLE_PROJECTS=libclc is deprecated, see
https://github.com/llvm/llvm-project/blob/a2a9601ea49a/llvm/CMakeLists.txt#L223-L228
|
|
This fixes `No such file or directory` error when "Unix Makefiles"
generator is used, see https://github.com/intel/llvm/issues/20058.
Ninja generator implicitly creates output directory when generating
libclc libraries, but "Unix Makefiles" generator does not.
|
|
This doesn't need to pretend to support multiple versions of llvm
and these are old anyway.
|
|
unary_def_with_ptr_scalarize.inc (#157002)
Commit d50f2ef437ae removes _CLC_V_V_VP_VECTORIZE from header file, but
the macro is still used in our downstream code:
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30
https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31
We can either revert d50f2ef437ae or replace macro with use of
unary_def_with_ptr_scalarize.inc. This PR uses the latter approach.
|
|
is large (#157055)
This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc
size by 4%.
Loop trip count is constant and backend can decide whether to unroll.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
|
using weak linkage (#156778)
Before this PR, weak linkage is applied to a few CLC generic functions
to allow target specific implementation to override generic one.
However, adding weak linkage has a side effect of preventing
inter-procedural optimization, such as PostOrderFunctionAttrsPass,
because weak function doesn't have exact definition (as determined by
hasExactDefinition in the pass).
This PR resolves the issue by adding --override flag for every
non-generic bitcode file in llvm-link run. This approach eliminates the
need for weak linkage while still allowing target-specific
implementation to override generic one.
llvm-diff shows improved attribute deduction for some functions in
amdgcn--amdhsa.bc, e.g.
%23 = tail call half @llvm.sqrt.f16(half %22)
=>
%23 = tail call noundef half @llvm.sqrt.f16(half %22)
|
|
(#156789)
Our downstream libclc adds a few more targets that customize build_flags
and opt_flags. Then in each customization block, MACRO_ARCH is defined
to be ${ARCH}.
Hoisting MACRO_ARCH definition out of if-else-end block avoids code
duplication. This also avoids potential error when MACRO_ARCH definition
is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.
|
|
M_LOG210 (#156590)
|
|
delete clcmacro.h (#156280)
clcmacro.h only defines _CLC_V_V_VP_VECTORIZE which is only used in
clc/lib/generic/math/clc_lgamma_r.cl.
|
|
__builtin_amdgcn_fence for AMDGPU (#152275)
A MemorySemantic argument is needed for AMDGPU; it identifies the memory
or address space to which the memory ordering is applied.
The MemorySemantic is also necessary for implementing the SPIR-V
MemoryBarrier instruction. Additionally, the implementation of
__clc_mem_fence on Intel GPUs requires the MemorySemantic argument.
Using __builtin_amdgcn_fence for AMDGPU is follow-up of
https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508
llvm-diff shows no change to nvptx64--nvidiacl.bc.
|
|
`get_clang_resource_dir` is not guaranteed to be available. Ensure it is
by including `GetClangResourceDir`.
|
|
generator (#154479)
libclc sequential build issue addressed in commit 0c21d6b4c8ad is
specific to cmake MSVC generator. Therefore, this PR avoids creating a
large number of targets when a non-MSVC generator is used, such as the
Ninja generator, which is used in pre-merge CI on Windows in
llvm-project repo. We plan to migrate from the MSVC generator to the Ninja
generator in our downstream CI to fix the flaky cmake bug `Cannot restore
timestamp`, which might be related to the large number of targets.
|
|
Using the elementwise builtin optimizes the vector case; instead of
scalarizing we can compile directly to the vector intrinsics.
|
|
amdgcn (#153785)
This simplifies downstream refactoring of libspirv workitem function in
https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic
|
|
This unifies naming scheme of macros to address review comment
https://github.com/intel/llvm/pull/19779#discussion_r2272194357
math constant value macros are not changed, e.g.
`#define AU0 -9.86494292470009928597e-03`
|
|
spec (#153784)
|
|
(#152456)
Before this PR, PostOrderFunctionAttrsPass in the opt run could already
deduce memory(none) for these functions.
This PR explicitly adds the attribute to align with Clang's OpenCL
headers and ensures the attribute is present throughout the compilation
flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc
become slightly smaller.
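At the source level, a function that neither reads nor writes memory can be marked explicitly. The sketch below assumes the GNU-style `__attribute__((const))` spelling, which Clang lowers to `memory(none)`; the function itself is a hypothetical example, not one of the builtins touched by this change.

```c
/* The result depends only on the argument values, so the attribute is a
   promise the optimizer can rely on without having to deduce it. */
__attribute__((const)) int sketch_square(int x) {
    return x * x;
}
```

Stating the attribute up front means every stage of the pipeline sees it, including stages that run before attribute deduction.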
|
|
Fix a regression of df7473673214.
The cmake MSVC generator is a multi-configuration generator. The build type
is not known at configure time, and CMAKE_CFG_INTDIR evaluates to
$(Configuration) at configure time. libclc install fails because
$(Configuration) in the bitcode file path is unresolved in
libclc/cmake_install.cmake at install time.
We need a solution that resolves libclc bitcode file path at install
time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME which can
be evaluated at install time. This is the same solution as in
https://reviews.llvm.org/D76827
|
|
This is to upstream implementations in
https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math
|