diff options
author | Jakub Jelinek <jakub@redhat.com> | 2023-11-14 10:38:56 +0100 |
---|---|---|
committer | Jakub Jelinek <jakub@redhat.com> | 2023-11-14 10:52:16 +0100 |
commit | 7383cb56e1170789929201b0dadc156888928fdd (patch) | |
tree | b191fcf3c926db3ed83620f9d2dcd56384be6e63 /gcc/builtins.cc | |
parent | fe23a2ff1f5072559552be0e41ab55bf72f5c79f (diff) | |
download | gcc-7383cb56e1170789929201b0dadc156888928fdd.zip gcc-7383cb56e1170789929201b0dadc156888928fdd.tar.gz gcc-7383cb56e1170789929201b0dadc156888928fdd.tar.bz2 |
Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]
The following patch adds 6 new type-generic builtins,
__builtin_clzg
__builtin_ctzg
__builtin_clrsbg
__builtin_ffsg
__builtin_parityg
__builtin_popcountg
The g at the end stands for generic because the unsuffixed variant
of the builtins already have unsigned int or int arguments.
The main reason to add these is to support arbitrary unsigned (for
clrsb/ffs signed) bit-precise integer types and also __int128 which
wasn't supported by the existing builtins, so that e.g. <stdbit.h>
type-generic functions could then support not just bit-precise unsigned
integer type whose width matches a standard or extended integer type,
but others too.
None of these new builtins promote their first argument, so the argument
can be e.g. unsigned char or unsigned short or unsigned __int20 etc.
The first 2 support either 1 or 2 arguments, if only 1 argument is supplied,
the behavior is undefined for argument 0 like for other __builtin_c[lt]z*
builtins, if 2 arguments are supplied, the second argument should be int
that will be returned if the argument is 0. All other builtins have
just one argument. For __builtin_clrsbg and __builtin_ffsg the argument
shall be any signed standard/extended or bit-precise integer, for the others
any unsigned standard/extended or bit-precise integer (bool not allowed).
One possibility would be to also allow signed integer types for
the clz/ctz/parity/popcount ones (and just cast the argument to
unsigned_type_for during folding) and similarly unsigned integer types
for the clrsb/ffs ones, dunno what is better; for stdbit.h the current
version is sufficient and diagnoses use of the inappropriate sign,
though on the other side I wonder if users won't be confused by
__builtin_clzg (1) being an error and having to write __builtin_clzg (1U).
The new builtins are lowered to corresponding builtins with other suffixes
or internal calls (plus casts and adjustments where needed) during FE
folding or during gimplification at latest, the non-suffixed builtins
handling precisions up to precision of int, l up to precision of long,
ll up to precision of long long, up to __int128 precision lowered to
double-word expansion early and the rest (which must be _BitInt) lowered
to internal fn calls - those are then lowered during bitint lowering pass.
The patch also changes representation of IFN_CLZ and IFN_CTZ calls,
previously they were in the IL only if they are directly supported optab
and depending on C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 they had or didn't
have defined behavior at 0, now they are in the IL either if directly
supported optab, or for the large/huge BITINT_TYPEs and they have either
1 or 2 arguments. If one, the behavior is undefined at zero, if 2, the
second argument is an int constant that should be returned for 0.
As there is no extra support during expansion, for directly supported optab
the second argument if present should still match the
C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 value, but for BITINT_TYPE arguments
it can be arbitrary int INTEGER_CST.
The indended uses in stdbit.h are e.g.
#ifdef __has_builtin
#if __has_builtin(__builtin_clzg) && __has_builtin(__builtin_ctzg) && __has_builtin(__builtin_popcountg)
#define stdc_leading_zeros(value) \
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_leading_ones(value) \
((unsigned int) __builtin_clzg ((__typeof (value)) ~(value), __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_first_trailing_one(value) \
((unsigned int) (__builtin_ctzg (value, -1) + 1))
#define stdc_trailing_zeros(value) \
((unsigned int) __builtin_ctzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#endif
#endif
where __builtin_popcountg ((__typeof (x)) -1) computes the bit precision
of x's type (kind of _Bitwidthof (x) alternative).
They also allow casting of arbitrary unsigned _BitInt other than
unsigned _BitInt(1) to corresponding signed _BitInt by using
signed _BitInt(__builtin_popcountg ((__typeof (a)) -1))
and of arbitrary signed _BitInt to corresponding unsigned _BitInt
using unsigned _BitInt(__builtin_clrsbg ((__typeof (a)) -1) + 1).
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR c/111309
gcc/
* builtins.def (BUILT_IN_CLZG, BUILT_IN_CTZG, BUILT_IN_CLRSBG,
BUILT_IN_FFSG, BUILT_IN_PARITYG, BUILT_IN_POPCOUNTG): New
builtins.
* builtins.cc (fold_builtin_bit_query): New function.
(fold_builtin_1): Use it for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_builtin_2): Use it for BUILT_IN_{CLZ,CTZ}G.
* fold-const-call.cc: Fix comment typo on tm.h inclusion.
(fold_const_call_ss): Handle
CFN_BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_const_call_sss): New function.
(fold_const_call_1): Call it for 2 argument functions returning
scalar when passed 2 INTEGER_CSTs.
* genmatch.cc (cmp_operand): For function calls also compare
number of arguments.
(fns_cmp): New function.
(dt_node::gen_kids): Sort fns and generic_fns.
(dt_node::gen_kids_1): Handle fns with the same id but different
number of arguments.
* match.pd (CLZ simplifications): Drop checks for defined behavior
at zero. Add variant of simplifications for IFN_CLZ with 2 arguments.
(CTZ simplifications): Drop checks for defined behavior at zero,
don't optimize precisions above MAX_FIXED_MODE_SIZE. Add variant of
simplifications for IFN_CTZ with 2 arguments.
(a != 0 ? CLZ(a) : CST -> .CLZ(a)): Use TREE_TYPE (@3) instead of
type, add BITINT_TYPE handling, create 2 argument IFN_CLZ rather than
one argument. Add variant for matching CLZ with 2 arguments.
(a != 0 ? CTZ(a) : CST -> .CTZ(a)): Similarly.
* gimple-lower-bitint.cc (bitint_large_huge::lower_bit_query): New
method.
(bitint_large_huge::lower_call): Use it for IFN_{CLZ,CTZ,CLRSB,FFS}
and IFN_{PARITY,POPCOUNT} calls.
* gimple-range-op.cc (cfn_clz::fold_range): Don't check
CLZ_DEFINED_VALUE_AT_ZERO for m_gimple_call_internal_p, instead
assume defined value at zero if the call has 2 arguments and use
second argument value for that case.
(cfn_ctz::fold_range): Similarly.
(gimple_range_op_handler::maybe_builtin_call): Use op_cfn_clz_internal
or op_cfn_ctz_internal only if internal fn call has 2 arguments and
set m_op2 in that case.
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern,
vect_recog_popcount_clz_ctz_ffs_pattern): For value defined at zero
use second argument of calls if present, otherwise assume UB at zero,
create 2 argument .CLZ/.CTZ calls if needed.
* tree-vect-stmts.cc (vectorizable_call): Handle 2 argument .CLZ/.CTZ
calls.
* tree-ssa-loop-niter.cc (build_cltz_expr): Create 2 argument
.CLZ/.CTZ calls if needed.
* tree-ssa-forwprop.cc (simplify_count_trailing_zeroes): Create 2
argument .CTZ calls if needed.
* tree-ssa-phiopt.cc (cond_removal_in_builtin_zero_pattern): Handle
2 argument .CLZ/.CTZ calls, handle BITINT_TYPE, create 2 argument
.CLZ/.CTZ calls.
* doc/extend.texi (__builtin_clzg, __builtin_ctzg, __builtin_clrsbg,
__builtin_ffsg, __builtin_parityg, __builtin_popcountg): Document.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* c-gimplify.cc (c_gimplify_expr): If __builtin_c[lt]zg second
argument hasn't been folded into constant yet, transform it to one
argument call inside of a COND_EXPR which for first argument 0
returns the second argument.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote first argument
of BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
gcc/cp/
* call.cc (magic_varargs_p): Return 4 for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(build_over_call): Don't promote first argument of
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* cp-gimplify.cc (cp_gimplify_expr): For BUILT_IN_C{L,T}ZG use
c_gimplify_expr.
gcc/testsuite/
* c-c++-common/pr111309-1.c: New test.
* c-c++-common/pr111309-2.c: New test.
* gcc.dg/torture/bitint-43.c: New test.
* gcc.dg/torture/bitint-44.c: New test.
Diffstat (limited to 'gcc/builtins.cc')
-rw-r--r-- | gcc/builtins.cc | 277 |
1 files changed, 277 insertions, 0 deletions
diff --git a/gcc/builtins.cc b/gcc/builtins.cc index cb90bd0..5ece0d2 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -9573,6 +9573,271 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode, return build2_loc (loc, COMPOUND_EXPR, boolean_type_node, store, ovfres); } +/* Fold __builtin_{clz,ctz,clrsb,ffs,parity,popcount}g into corresponding + internal function. */ + +static tree +fold_builtin_bit_query (location_t loc, enum built_in_function fcode, + tree arg0, tree arg1) +{ + enum internal_fn ifn; + enum built_in_function fcodei, fcodel, fcodell; + tree arg0_type = TREE_TYPE (arg0); + tree cast_type = NULL_TREE; + int addend = 0; + + switch (fcode) + { + case BUILT_IN_CLZG: + if (arg1 && TREE_CODE (arg1) != INTEGER_CST) + return NULL_TREE; + ifn = IFN_CLZ; + fcodei = BUILT_IN_CLZ; + fcodel = BUILT_IN_CLZL; + fcodell = BUILT_IN_CLZLL; + break; + case BUILT_IN_CTZG: + if (arg1 && TREE_CODE (arg1) != INTEGER_CST) + return NULL_TREE; + ifn = IFN_CTZ; + fcodei = BUILT_IN_CTZ; + fcodel = BUILT_IN_CTZL; + fcodell = BUILT_IN_CTZLL; + break; + case BUILT_IN_CLRSBG: + ifn = IFN_CLRSB; + fcodei = BUILT_IN_CLRSB; + fcodel = BUILT_IN_CLRSBL; + fcodell = BUILT_IN_CLRSBLL; + break; + case BUILT_IN_FFSG: + ifn = IFN_FFS; + fcodei = BUILT_IN_FFS; + fcodel = BUILT_IN_FFSL; + fcodell = BUILT_IN_FFSLL; + break; + case BUILT_IN_PARITYG: + ifn = IFN_PARITY; + fcodei = BUILT_IN_PARITY; + fcodel = BUILT_IN_PARITYL; + fcodell = BUILT_IN_PARITYLL; + break; + case BUILT_IN_POPCOUNTG: + ifn = IFN_POPCOUNT; + fcodei = BUILT_IN_POPCOUNT; + fcodel = BUILT_IN_POPCOUNTL; + fcodell = BUILT_IN_POPCOUNTLL; + break; + default: + gcc_unreachable (); + } + + if (TYPE_PRECISION (arg0_type) + <= TYPE_PRECISION (long_long_unsigned_type_node)) + { + if (TYPE_PRECISION (arg0_type) <= TYPE_PRECISION (unsigned_type_node)) + + cast_type = (TYPE_UNSIGNED (arg0_type) + ? unsigned_type_node : integer_type_node); + else if (TYPE_PRECISION (arg0_type) + <= TYPE_PRECISION (long_unsigned_type_node)) + { + cast_type = (TYPE_UNSIGNED (arg0_type) + ? long_unsigned_type_node : long_integer_type_node); + fcodei = fcodel; + } + else + { + cast_type = (TYPE_UNSIGNED (arg0_type) + ? long_long_unsigned_type_node + : long_long_integer_type_node); + fcodei = fcodell; + } + } + else if (TYPE_PRECISION (arg0_type) <= MAX_FIXED_MODE_SIZE) + { + cast_type + = build_nonstandard_integer_type (MAX_FIXED_MODE_SIZE, + TYPE_UNSIGNED (arg0_type)); + gcc_assert (TYPE_PRECISION (cast_type) + == 2 * TYPE_PRECISION (long_long_unsigned_type_node)); + fcodei = END_BUILTINS; + } + else + fcodei = END_BUILTINS; + if (cast_type) + { + switch (fcode) + { + case BUILT_IN_CLZG: + case BUILT_IN_CLRSBG: + addend = TYPE_PRECISION (arg0_type) - TYPE_PRECISION (cast_type); + break; + default: + break; + } + arg0 = fold_convert (cast_type, arg0); + arg0_type = cast_type; + } + + if (arg1) + arg1 = fold_convert (integer_type_node, arg1); + + tree arg2 = arg1; + if (fcode == BUILT_IN_CLZG && addend) + { + if (arg1) + arg0 = save_expr (arg0); + arg2 = NULL_TREE; + } + tree call = NULL_TREE, tem; + if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE + && (TYPE_PRECISION (arg0_type) + == 2 * TYPE_PRECISION (long_long_unsigned_type_node))) + { + /* __int128 expansions using up to 2 long long builtins. */ + arg0 = save_expr (arg0); + tree type = (TYPE_UNSIGNED (arg0_type) + ? long_long_unsigned_type_node + : long_long_integer_type_node); + tree hi = fold_build2 (RSHIFT_EXPR, arg0_type, arg0, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2)); + hi = fold_convert (type, hi); + tree lo = fold_convert (type, arg0); + switch (fcode) + { + case BUILT_IN_CLZG: + call = fold_builtin_bit_query (loc, fcode, lo, NULL_TREE); + call = fold_build2 (PLUS_EXPR, integer_type_node, call, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2)); + if (arg2) + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + lo, build_zero_cst (type)), + call, arg2); + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + hi, build_zero_cst (type)), + fold_builtin_bit_query (loc, fcode, hi, + NULL_TREE), + call); + break; + case BUILT_IN_CTZG: + call = fold_builtin_bit_query (loc, fcode, hi, NULL_TREE); + call = fold_build2 (PLUS_EXPR, integer_type_node, call, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2)); + if (arg2) + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + hi, build_zero_cst (type)), + call, arg2); + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + lo, build_zero_cst (type)), + fold_builtin_bit_query (loc, fcode, lo, + NULL_TREE), + call); + break; + case BUILT_IN_CLRSBG: + tem = fold_builtin_bit_query (loc, fcode, lo, NULL_TREE); + tem = fold_build2 (PLUS_EXPR, integer_type_node, tem, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2)); + tem = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (LT_EXPR, boolean_type_node, + fold_build2 (BIT_XOR_EXPR, type, + lo, hi), + build_zero_cst (type)), + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2 - 1), + tem); + call = fold_builtin_bit_query (loc, fcode, hi, NULL_TREE); + call = save_expr (call); + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + call, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE + / 2 - 1)), + call, tem); + break; + case BUILT_IN_FFSG: + call = fold_builtin_bit_query (loc, fcode, hi, NULL_TREE); + call = fold_build2 (PLUS_EXPR, integer_type_node, call, + build_int_cst (integer_type_node, + MAX_FIXED_MODE_SIZE / 2)); + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + hi, build_zero_cst (type)), + call, integer_zero_node); + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + lo, build_zero_cst (type)), + fold_builtin_bit_query (loc, fcode, lo, + NULL_TREE), + call); + break; + case BUILT_IN_PARITYG: + call = fold_builtin_bit_query (loc, fcode, + fold_build2 (BIT_XOR_EXPR, type, + lo, hi), NULL_TREE); + break; + case BUILT_IN_POPCOUNTG: + call = fold_build2 (PLUS_EXPR, integer_type_node, + fold_builtin_bit_query (loc, fcode, hi, + NULL_TREE), + fold_builtin_bit_query (loc, fcode, lo, + NULL_TREE)); + break; + default: + gcc_unreachable (); + } + } + else + { + /* Only keep second argument to IFN_CLZ/IFN_CTZ if it is the + value defined at zero during GIMPLE, or for large/huge _BitInt + (which are then lowered during bitint lowering). */ + if (arg2 && TREE_CODE (TREE_TYPE (arg0)) != BITINT_TYPE) + { + int val; + if (fcode == BUILT_IN_CLZG) + { + if (CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_TYPE_MODE (arg0_type), + val) != 2 + || wi::to_widest (arg2) != val) + arg2 = NULL_TREE; + } + else if (CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_TYPE_MODE (arg0_type), + val) != 2 + || wi::to_widest (arg2) != val) + arg2 = NULL_TREE; + if (!direct_internal_fn_supported_p (ifn, arg0_type, + OPTIMIZE_FOR_BOTH)) + arg2 = NULL_TREE; + } + if (fcodei == END_BUILTINS || arg2) + call = build_call_expr_internal_loc (loc, ifn, integer_type_node, + arg2 ? 2 : 1, arg0, arg2); + else + call = build_call_expr_loc (loc, builtin_decl_explicit (fcodei), 1, + arg0); + } + if (addend) + call = fold_build2 (PLUS_EXPR, integer_type_node, call, + build_int_cst (integer_type_node, addend)); + if (arg1 && arg2 == NULL_TREE) + call = fold_build3 (COND_EXPR, integer_type_node, + fold_build2 (NE_EXPR, boolean_type_node, + arg0, build_zero_cst (arg0_type)), + call, arg1); + + return call; +} + /* Fold __builtin_{add,sub}c{,l,ll} into pair of internal functions that return both result of arithmetics and overflowed boolean flag in a complex integer result. */ @@ -9824,6 +10089,14 @@ fold_builtin_1 (location_t loc, tree expr, tree fndecl, tree arg0) return build_empty_stmt (loc); break; + case BUILT_IN_CLZG: + case BUILT_IN_CTZG: + case BUILT_IN_CLRSBG: + case BUILT_IN_FFSG: + case BUILT_IN_PARITYG: + case BUILT_IN_POPCOUNTG: + return fold_builtin_bit_query (loc, fcode, arg0, NULL_TREE); + default: break; } @@ -9913,6 +10186,10 @@ fold_builtin_2 (location_t loc, tree expr, tree fndecl, tree arg0, tree arg1) case BUILT_IN_ATOMIC_IS_LOCK_FREE: return fold_builtin_atomic_is_lock_free (arg0, arg1); + case BUILT_IN_CLZG: + case BUILT_IN_CTZG: + return fold_builtin_bit_query (loc, fcode, arg0, arg1); + default: break; } |