Allow same precision conversion between {ibm_extended, ieee_quad}_format
For some historical reasons, rs6000 defines KFmode, TFmode
and IFmode with different mode precisions, but this causes
some issues and needs workarounds such as PR112993.
So we are going to make all rs6000 128-bit scalar FP modes
have 128-bit precision. To prepare for that, this patch
makes function convert_mode_scalar allow conversion between
same-precision FP modes whose underlying formats are
ibm_extended_format and ieee_quad_format respectively, just
like the existing special treatment for arm_bfloat_half_format
<-> ieee_half_format. It also factors all the relevant
checks out into a lambda function. Besides, similar to the ieee fp16
-> bfloat conversion, it adopts trunc_optab rather than
sext_optab for the ibm128 to ieee128 conversion.
PR target/112993
gcc/ChangeLog:
* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes whose underlying formats are
ibm_extended_format and ieee_quad_format, and refactor the assertion
with new lambda function acceptable_same_precision_modes. Use
trunc_optab rather than sext_optab for ibm128 to ieee128 conversion.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Use trunc_optab rather
than sext_optab for ibm128 to ieee128 conversion.
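As a sketch of the kind of source-level conversion this affects (assuming
a powerpc64le target where both 128-bit types are available, e.g. with
-mfloat128; not taken from the actual patch):
```
/* Both types are 128 bits wide; once all rs6000 128-bit scalar FP modes
   have 128-bit precision, convert_mode_scalar must special-case their
   differing underlying formats.  */
__float128 to_ieee128 (__ibm128 x) { return (__float128) x; } /* trunc_optab */
__ibm128   to_ibm128  (__float128 x) { return (__ibm128) x; }
```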
|
|
This patch makes target-independent code use force_lowpart_subreg
instead of simplify_gen_subreg and lowpart_subreg in some places.
The criteria were:
(1) The code is obviously specific to expand (where new pseudos
can be created), or at least would be invalid to call when
!can_create_pseudo_p () and temporaries are needed.
(2) The value is obviously an rvalue rather than an lvalue.
Doing this should reduce the likelihood of bugs like PR115464
occurring in other situations.
gcc/
* builtins.cc (expand_builtin_issignaling): Use force_lowpart_subreg
instead of simplify_gen_subreg and lowpart_subreg.
* expr.cc (convert_mode_scalar, expand_expr_real_2): Likewise.
* optabs.cc (expand_doubleword_mod): Likewise.
|
|
This patch makes target-independent code use force_subreg instead
of simplify_gen_subreg in some places. The criteria were:
(1) The code is obviously specific to expand (where new pseudos
can be created), or at least would be invalid to call when
!can_create_pseudo_p () and temporaries are needed.
(2) The value is obviously an rvalue rather than an lvalue.
(3) The offset wasn't a simple lowpart or highpart calculation;
a later patch will deal with those.
Doing this should reduce the likelihood of bugs like PR115464
occurring in other situations.
gcc/
* expmed.cc (store_bit_field_using_insv): Use force_subreg
instead of simplify_gen_subreg.
(store_bit_field_1): Likewise.
(extract_bit_field_as_subreg): Likewise.
(extract_integral_bit_field): Likewise.
(emit_store_flag_1): Likewise.
* expr.cc (convert_move): Likewise.
(convert_modes): Likewise.
(emit_group_load_1): Likewise.
(emit_group_store): Likewise.
(expand_assignment): Likewise.
|
|
Constify the first argument of expand_expr_real_2 and expand_widen_pattern_expr [PR113212]
While working on an expand patch back in January I noticed that
the first argument (of sepops type) of expand_expr_real_2 could be
constified, as it is not (and should not be) touched by the function.
There is code in internal-fn.cc that depends on expand_expr_real_2 not touching
the ops argument so constification makes this more obvious.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/113212
* expr.h (const_sepops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise.
(do_store_flag): Likewise. Remove incorrect store to ops->code.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
In cfgexpand, there is an optimization for branches which tests
targetm.gen_ccmp_first == NULL. However, for targets like x86-64, the
hook is implemented, but that alone does not indicate that ccmp is enabled.
Add a new target hook TARGET_HAVE_CCMP and replace the middle-end
check for the existence of gen_ccmp_first to avoid misoptimization.
gcc/ChangeLog:
PR target/115370
PR target/115463
* target.def (have_ccmp): New target hook.
* targhooks.cc (default_have_ccmp): New function.
* targhooks.h (default_have_ccmp): New prototype.
* doc/tm.texi.in: Add TARGET_HAVE_CCMP.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_gimple_cond): Call targetm.have_ccmp
instead of checking if targetm.gen_ccmp_first exists.
* expr.cc (expand_expr_real_gassign): Likewise.
* config/i386/i386.cc (ix86_have_ccmp): New target hook to
check if APX_CCMP is enabled.
(TARGET_HAVE_CCMP): Define.
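A minimal sketch of what the new hook pair amounts to on i386 (assuming
the usual TARGET_APX_CCMP feature macro; not the verbatim implementation):
```
/* Return true only when the APX ccmp facility is actually enabled,
   not merely because gen_ccmp_first is implemented.  */
static bool
ix86_have_ccmp (void)
{
  return TARGET_APX_CCMP;
}

#undef TARGET_HAVE_CCMP
#define TARGET_HAVE_CCMP ix86_have_ccmp
```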
|
|
Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().
gcc/ChangeLog:
* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.
|
|
The HF and BF modes have the same size/precision, and neither is
a subset nor a superset of the other.
So using either __extendhfbf2 or __trunchfbf2 is weird.
The expansion apparently emits __extendhfbf2, but on the libgcc side
we apparently have __trunchfbf2 implemented.
I think it is easier to switch to using what is available rather than
adding new entrypoints (even aliases) to libgcc, because this is then
backportable.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than
sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.
* gcc.dg/pr114907.c: New test.
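A minimal example of the conversion in question (assuming a target that
provides both types, e.g. x86-64 with SSE2):
```
/* HFmode and BFmode are both 16 bits wide; neither format can represent
   all values of the other, so this is neither a true extension nor a
   true truncation.  On soft-float paths it now calls __trunchfbf2,
   which libgcc actually provides.  */
__bf16
hf_to_bf (_Float16 x)
{
  return (__bf16) x;
}
```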
|
|
Another patch from eyeballing
git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 '
output, this time in gcc/ subdirectory.
2024-04-09 Jakub Jelinek <jakub@redhat.com>
gcc/
* expr.cc (convert_mode_scalar): Fix duplicated words in comment;
into into -> it into.
* function.h (function::cond_uids): Fix duplicated words in comment;
same same -> same.
* config/riscv/riscv-vector-costs.cc
(costs::adjust_vect_cost_per_loop): Fix duplicated words in comment;
model model -> model.
* config/riscv/riscv-vector-builtins-shapes.cc (build_base): Fix
duplicated words in comment; for for -> for.
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Fix
duplicated words in comment; more more -> more.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Fix
duplicated words in comment; be be -> be.
* tree-profile.cc (masking_vectors): Fix duplicated words in comment;
has has -> has, the the -> the.
* value-range.cc (irange::set_range_from_bitmask): Fix duplicated
words in comment; the the -> the.
* gcov.cc (add_condition_counts): Fix duplicated words in comment;
to to -> to.
* vr-values.cc (get_scev_info): Fix duplicated words in comment;
the the -> to the.
* tree-vrp.cc (fully_replaceable): Fix duplicated words in comment;
by by -> by.
* mode-switching.cc (single_succ_confluence_n): Fix duplicated words
in comment; the the -> the.
* tree-ssa-phiopt.cc (value_replacement): Fix duplicated words in
comment; can can -> we can.
* gimple-range-phi.cc (phi_analyzer::process_phi): Fix duplicated words
in comment; it it -> it is.
* tree-ssa-sccvn.cc (visit_phi): Fix duplicated words in comment;
to to -> to.
* rtl-ssa/accesses.h (use_info::next_debug_insn_use): Fix duplicated
words in comment; if if -> if.
* doc/options.texi (InverseMask): Fix duplicated words; and and -> and.
Change take to takes.
* doc/invoke.texi (fanalyzer-undo-inlining): Fix duplicated words;
be be -> be.
(-minline-memops-threshold): Likewise.
gcc/analyzer/
* analyzer.opt (Wanalyzer-undefined-behavior-strtok): Fix duplicated
words; in in -> in.
* program-state.cc (sm_state_map::replay_call_summary): Fix duplicated
words in comment; to to -> to.
(program_state::replay_call_summary): Likewise.
* region-model.cc (region_model::replay_call_summary): Likewise.
gcc/c/
* c-decl.cc (previous_tag): Fix duplicated words in comment; the the
-> the.
(diagnose_mismatched_decls): Fix duplicated words in comment;
about about -> about.
gcc/cp/
* constexpr.cc (build_new_constexpr_heap_type): Fix duplicated words
in comment; is is -> is.
* cp-tree.def (CO_RETURN_EXPR): Fix duplicated words in comment;
for for -> for.
* parser.cc (fixup_blocks_walker): Fix duplicated words in comment;
is is -> is.
* semantics.cc (fixup_template_type): Fix duplicated words in comment;
for for -> for.
(finish_omp_for): Fix duplicated words in comment; the the -> the.
* pt.cc (more_specialized_fn): Fix duplicated words in comment;
think think -> think.
(type_targs_deducible_from): Fix duplicated words in comment; the the
-> the.
gcc/jit/
* docs/topics/expressions.rst (Constructor expressions): Fix
duplicated words; have have -> have.
|
|
r13-990 added optimizations in multiple spots to optimize, during
expansion, the storing of constant initializers into targets.
In the load_register_parameters and expand_expr_real_1 cases,
it checks it has a tree as the source and so knows we are reading
that whole decl's value, so the code is fine as is, but in the
emit_push_insn case it checks for a MEM from which something
is pushed and checks for SYMBOL_REF as the MEM's address, but
still assumes the whole object is copied, which as the following
testcase shows might not always be the case. In the testcase,
k is 6 bytes, then 2 bytes of padding, then another 4 bytes,
while the emit_push_insn wants to store just the 6 bytes.
The following patch simply verifies that it is the whole initializer
that is being stored; I think that is the best thing to do this late
in the GCC 14 cycle, and for backporting as well.
For GCC 15, perhaps the code could stop requiring it must be at offset zero,
nor that the size is equal, but could use
get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually
extract just part of the initializer if we e.g. push just some subset
(of course, still verify that it is a subset). For sizes which are power
of two bytes and we have some integer modes, we could use as type for
fold_ctor_reference corresponding integral types, otherwise dunno, punt
or use some structure (e.g. try to find one in the initializer?), whatever.
But even in the other spots it could perhaps handle loading of
COMPONENT_REFs or MEM_REFs from the .rodata vars.
2024-04-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114552
* expr.cc (emit_push_insn): Only use store_constructor for
immediate_const_ctor_p if int_expr_size matches size.
* gcc.c-torture/execute/pr114552.c: New test.
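A hypothetical layout matching the description above (the actual
pr114552.c testcase may differ):
```
/* The symbol's initializer covers all 12 bytes of k, but a call may push
   only the leading 6-byte member; the old check wrongly assumed the
   whole object was being copied.  */
struct S
{
  struct { short a[3]; } m; /* 6 bytes, followed by 2 bytes of padding */
  int b;                    /* 4 more bytes */
} k;
```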
|
|
The x86-64 and aarch64 psABIs (and the unwritten ia64 psABI part) say that
the padding bits of _BitInt are undefined, while the expansion internally
typically assumes that non-mode precision integers are sign/zero extended
and extends after operations. We handle that mismatch with EXTEND_BITINT
done when reading from untrusted sources like function arguments, reading
_BitInt from memory etc. but otherwise keep relying on stuff being extended
internally (say in pseudos).
The return value of a function is an ABI boundary too, though, and we
need to extend it as well.
2024-03-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114332
* expr.cc (expand_expr_real_1): EXTEND_BITINT also CALL_EXPR results.
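A sketch of the boundary in question (assuming an x86-64 style psABI
where _BitInt padding bits are undefined):
```
_BitInt(20) get_value (void);

int
use_value (void)
{
  /* The 12 padding bits of the returned _BitInt(20) are undefined per
     the psABI, so the caller must EXTEND_BITINT the call result before
     relying on it internally.  */
  return (int) get_value ();
}
```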
|
|
The masks and bitvectors were broken when nunits==32 on hosts where int is
32-bit.
gcc/ChangeLog:
* dojump.cc (do_compare_and_jump): Use full-width integers for shifts.
* expr.cc (store_constructor): Likewise.
(do_store_flag): Likewise.
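The overflow pattern in miniature (a hypothetical reduction, not the
committed code):
```
unsigned long long
element_mask (int nunits)
{
  /* Wrong when int is 32-bit: for nunits == 32, 1 << 31 overflows into
     the sign bit and (1 << 32) - 1 is undefined.  */
  /* return (1 << nunits) - 1; */

  /* A full-width integer shift is well defined for nunits up to 63.  */
  return (1ULL << nunits) - 1;
}
```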
|
|
The following testcase ICEs, because the REDUCE_BIT_FIELD macro uses
the target variable implicitly:
#define REDUCE_BIT_FIELD(expr) (reduce_bit_field \
? reduce_to_bit_field_precision ((expr), \
target, \
type) \
: (expr))
and so when the code below reuses the target variable, documented to be
The value may be stored in TARGET if TARGET is nonzero.
TARGET is just a suggestion; callers must assume that
the rtx returned may not be the same as TARGET.
for something unrelated (the value that should be returned), this misbehaves
(in the testcase target is set to a CONST_INT, which has VOIDmode and
reduce_to_bit_field_precision assert checking doesn't like that).
The documentation would also need to say that
If TARGET is CONST0_RTX, it means that the value will be ignored.
but expand_expr_real_2 does at the start:
ignore = (target == const0_rtx
|| ((CONVERT_EXPR_CODE_P (code)
|| code == COND_EXPR || code == VIEW_CONVERT_EXPR)
&& TREE_CODE (type) == VOID_TYPE));
/* We should be called only if we need the result. */
gcc_assert (!ignore);
- so such a target is mainly meant for calls and the like in other routines.
It certainly doesn't expect target to change from not being ignored
initially to being ignored later on, nor CONST_INT results or anything else
which is not an object into which anything can be stored.
So, the following patch fixes that by using a more appropriate temporary
for the result, as other code does.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114054
* expr.cc (expand_expr_real_2) <case MULT_EXPR>: Use
temp variable instead of target parameter for result.
* gcc.dg/bitint-92.c: New test.
|
|
The following zeros the padding of vector bools when expanding compares
where the mode used for the compare is an integer mode. In that case
targets cannot distinguish between a 4-element and an 8-element vector
compare (both get to the QImode compare optab), so we have to do the
job in the middle-end.
PR middle-end/113576
* expr.cc (do_store_flag): For vector bool compares of vectors
with padding, zero that padding.
* dojump.cc (do_compare_and_jump): Likewise.
|
|
lower-bitint: Handle VIEW_CONVERT_EXPRs from large/huge _BitInt to > MAX_FIXED_MODE_SIZE INTEGER_TYPEs [PR113783]
On the following testcases memcpy lowering folds the calls to
reading and writing of MEM_REFs with huge INTEGER_TYPEs - uint256_t
with OImode or uint512_t with XImode. Further optimizations turn
the load from a MEM_REF of the large/huge _BitInt var into a VIEW_CONVERT_EXPR
from it to the uint256_t/uint512_t. The backend doesn't really
support those except for "movoi"/"movxi" insns, so it isn't possible
to handle it like casts to supportable INTEGER_TYPEs where we can
construct those from individual limbs - there are no OImode/XImode shifts
and the like we can use.
So, the following patch makes sure for such VCEs that the SSA_NAME operand
of the VCE lives in memory and then turns it into a VIEW_CONVERT_EXPR so
that we actually load the OImode/XImode integer from memory (i.e. a mov).
We need to make sure those aren't merged with other
operations in the gimple_lower_bitint hunks.
For SSA_NAMEs which have underlying VAR_DECLs that is all we need, those
VAR_DECL have ARRAY_TYPEs.
For SSA_NAMEs which have underlying PARM_DECLs or RESULT_DECLs those have
BITINT_TYPE and I had to tweak expand_expr_real_1 for that so that it
doesn't try convert_modes on those when one of the modes is BLKmode - we
want to fall through into the adjust_address on the MEM.
2024-02-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113783
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Look
through VIEW_CONVERT_EXPR for final cast checks. Handle
VIEW_CONVERT_EXPRs from large/huge _BitInt to > MAX_FIXED_MODE_SIZE
INTEGER_TYPEs.
(gimple_lower_bitint): Don't merge mergeable operations or other
casts with VIEW_CONVERT_EXPRs to > MAX_FIXED_MODE_SIZE INTEGER_TYPEs.
* expr.cc (expand_expr_real_1): Don't use convert_modes if either
mode is BLKmode.
* gcc.dg/bitint-88.c: New test.
|
|
The following implements storing to a non-MEM_P with a variable
offset. We usually avoid this by forcing expansion to memory but
this doesn't work for hard register variables. The solution is
to spill and operate on the stack.
PR middle-end/113622
* expr.cc (expand_assignment): Spill hard registers if
we index them with a variable offset.
* gcc.target/i386/pr113622-2.c: New testcase.
* gcc.target/i386/pr113622-3.c: Likewise.
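A hypothetical reduction (assuming an x86-64 target; the real testcases
are the pr113622 files listed above):
```
typedef int v4si __attribute__ ((vector_size (16)));

/* Lives permanently in a hard register, so it has no memory home to
   force expansion to.  */
register v4si acc asm ("xmm15");

int
get_lane (int i)
{
  return acc[i]; /* variable offset into a non-MEM_P object */
}
```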
|
|
Ccmp is not used if the result of the and/ior is used by both
a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
here by using ccmp in this case.
Two changes are required: first, we need to allow the outer statement's
result to be used more than once.
The second change is that during the expansion of the gimple, we need
to try using ccmp. This is needed because we don't expand the ssa
name of the lhs but rather expand directly from the gimple.
A small note on the ccmp_4.c testcase: we should be able to do slightly
better than with this patch, but it is only one extra instruction compared
to before.
PR target/100942
gcc/ChangeLog:
* ccmp.cc (ccmp_candidate_p): Add outer argument.
Allow if the outer is true and the lhs is used more
than once.
(expand_ccmp_expr): Update call to ccmp_candidate_p.
* expr.h (expand_expr_real_gassign): Declare.
* expr.cc (expand_expr_real_gassign): New function, split out from...
(expand_expr_real_1): ...here.
* cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ccmp_3.c: New test.
* gcc.target/aarch64/ccmp_4.c: New test.
* gcc.target/aarch64/ccmp_5.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
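A hypothetical reduction of the newly handled pattern, where the and of
two compares feeds both a GIMPLE_ASSIGN and a GIMPLE_COND:
```
int flag;

int
f (int a, int b)
{
  int t = (a > 0) & (b < 10); /* candidate for a ccmp sequence */
  flag = t;                   /* GIMPLE_ASSIGN use of the lhs */
  return t ? 3 : 7;           /* GIMPLE_COND use of the same lhs */
}
```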
|
|
>> emit_library_call_value_1 calls emit_push_insn with NULL_TREE
>> for TYPE. Sometimes emit_push_insn needs to assign a temp with
>> that TYPE, which causes a segfault.
>>
>> Fixed by computing the TYPE from MODE when needed.
>>
>> Original patch by Thorsten Otto.
>>
[ ... ]
> This really needs to happen in the two call paths which pass in
> NULL_TREE for the type. Note how the type is used to determine padding
> earlier in emit_push_insn. That would also make the code more
> consistent with the comment before emit_push_insn which implies that
> both MODE and TYPE are valid.
>
>
> Additionally you should bootstrap and regression test this patch on at
> least one target.
Updated as requested, and bootstrapped and tested on
{x86_64,aarch64,m68k}-linux-gnu without regressions.
gcc/
PR target/82420
PR target/111279
* calls.cc (emit_library_call_value_1): Pass valid TYPE
to emit_push_insn.
* expr.cc (emit_push_insn): Likewise.
gcc/testsuite/
PR target/82420
* gcc.target/m68k/pr82420.c: New test.
Co-authored-by: Thorsten Otto <admin@tho-otto.de>
|
|
On aarch64 the backend decides to use non-BLKmode for some arrays
like unsigned long[4] - OImode in that case, but the corresponding
BITINT_TYPEs have BLKmode (like structures containing that many limb
elements).
This later causes ICEs during expansion when expanding a VIEW_CONVERT_EXPR
from a non-BLKmode VAR_DECL to a BLKmode BITINT_TYPE.
The following fix contains two parts: the discover_nonconstant_array_refs_r
change makes sure we force such variables into memory, and the
expand_expr_real_1 change makes sure we don't try to extract a bitfield or
something similar which doesn't really work for BLKmode - as op0 is a MEM,
all we need is the op0 = adjust_address (op0, mode, 0); at the end to change
the MEM's mode to BLKmode.
2024-01-19 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>
* cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode
VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Do nothing
but adjust_address also for BLKmode mode and MEM op0.
|
|
The problem here is that after the recent vectorizer improvements, we end
up with a comparison against a vector bool 0 which then tries
expand_single_bit_test, which is not expecting vector comparisons at all.
The IR was:
vector(4) <signed-boolean:1> mask_patt_5.13;
_Bool _12;
mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 };
_12 = mask_patt_5.13_44 == { 0, 0, 0, 0 };
and we tried to call expand_single_bit_test for the last comparison.
Rejecting the vector comparison is needed.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/113322
gcc/ChangeLog:
* expr.cc (do_store_flag): Don't try single bit tests with
comparison on vector types.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr113322-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The optimization to expand uniform boolean vectors by sign-extension
works only for dense masks but it failed to check that.
PR middle-end/112740
* expr.cc (store_constructor): Check the integer vector
mask has a single bit per element before using sign-extension
to expand a uniform vector.
* gcc.dg/pr112740.c: New testcase.
|
|
This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
the fields are written into a pseudo register that needs to be kept sign
extended.
The motivating example from the bugzilla PR is:
extern void ext(int);
void foo(const unsigned char *buf) {
int val;
((unsigned char*)&val)[0] = *buf++;
((unsigned char*)&val)[1] = *buf++;
((unsigned char*)&val)[2] = *buf++;
((unsigned char*)&val)[3] = *buf++;
if(val > 0)
ext(1);
else
ext(0);
}
which at the end of the tree optimization passes looks like:
void foo (const unsigned char * buf)
{
int val;
unsigned char _1;
unsigned char _2;
unsigned char _3;
unsigned char _4;
int val.5_5;
<bb 2> [local count: 1073741824]:
_1 = *buf_7(D);
MEM[(unsigned char *)&val] = _1;
_2 = MEM[(const unsigned char *)buf_7(D) + 1B];
MEM[(unsigned char *)&val + 1B] = _2;
_3 = MEM[(const unsigned char *)buf_7(D) + 2B];
MEM[(unsigned char *)&val + 2B] = _3;
_4 = MEM[(const unsigned char *)buf_7(D) + 3B];
MEM[(unsigned char *)&val + 3B] = _4;
val.5_5 = val;
if (val.5_5 > 0)
goto <bb 3>; [59.00%]
else
goto <bb 4>; [41.00%]
<bb 3> [local count: 633507681]:
ext (1);
goto <bb 5>; [100.00%]
<bb 4> [local count: 440234144]:
ext (0);
<bb 5> [local count: 1073741824]:
val ={v} {CLOBBER(eol)};
return;
}
Here four bytes are being sequentially written into the SImode value
val. On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended. The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.
The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).
The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it. The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold. At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.
2024-01-04 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
PR rtl-optimization/104914
* expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
a sign or zero extension is only required if the modified field
overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED
targets, don't refer to the temporarily incorrectly extended value
using a SUBREG, but instead generate an explicit TRUNCATE rtx.
|
|
smallest_int_mode_for_size may abort when the requested mode is not
available. Call int_mode_for_size instead, that signals the
unsatisfiable request in a more graceful way.
for gcc/ChangeLog
PR middle-end/112784
* expr.cc (emit_block_move_via_loop): Call int_mode_for_size
for maybe-too-wide sizes.
(emit_block_cmp_via_loop): Likewise.
for gcc/testsuite/ChangeLog
PR middle-end/112784
* gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
|
|
After r14-1655-g52c92fb3f40050 (and the other commits
which touch zero_one_valued_p), we end up with `bool * a`
where the bool is an SSA name whose nonzero-bits info might
not be set (to 0x1) even though its nonzero bits would in
fact be 0x1.
In the case of coremark, it is only phiopt4 which adds the new
ssa name, and nothing afterwards updates the nonzero bits on it.
This fixes the regression by using gimple_zero_one_valued_p
rather than tree_nonzero_bits to match the cases where the
SSA_NAME didn't have the non-zero bits set.
gimple_zero_one_valued_p handles one level of cast and also
an `&`.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR middle-end/112935
* expr.cc (expand_expr_real_2): Use
gimple_zero_one_valued_p instead of tree_nonzero_bits
to find boolean defined expressions.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The following testcase ICEs during gimplification, because
count_type_elements doesn't handle BITINT_TYPE. It should handle it like
other integral types.
2023-12-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112881
* expr.cc (count_type_elements): Handle BITINT_TYPE like INTEGER_TYPE.
* gcc.dg/bitint-50.c: New test.
|
|
try_store_by_multiple_pieces was added not long ago, enabling
variable-sized memset to be expanded inline when the worst-case
in-range constant length would be, using conditional blocks with powers
of two to cover all possibilities of length and alignment.
This patch introduces -finline-stringops[=fn] to request expansions to
start with a loop, so as to still take advantage of known alignment
even with long lengths, but without necessarily adding store blocks
for every power of two.
This makes it possible for the supported stringops (memset, memcpy,
memmove, memcmp) to be expanded, even if storing a single byte per
iteration. Surely efficient implementations can run faster, with a
pre-loop to increase alignment, but that would likely be excessive for
inline expansions.
Still, in some cases, such as in freestanding environments, users
prefer to inline such stringops, especially those that the compiler
may introduce itself, even if the expansion is not as performant as a
highly optimized C library implementation could be, to avoid
depending on a C runtime library.
for gcc/ChangeLog
* expr.cc (emit_block_move_hints): Take ctz of len. Obey
-finline-stringops. Use oriented or sized loop.
(emit_block_move): Take ctz of len, and pass it on.
(emit_block_move_via_sized_loop): New.
(emit_block_move_via_oriented_loop): New.
(emit_block_move_via_loop): Take incr. Move an incr-sized
block per iteration.
(emit_block_cmp_via_cmpmem): Take ctz of len. Obey
-finline-stringops.
(emit_block_cmp_via_loop): New.
* expr.h (emit_block_move): Add ctz of len defaulting to zero.
(emit_block_move_hints): Likewise.
(emit_block_cmp_hints): Likewise.
* builtins.cc (expand_builtin_memory_copy_args): Pass ctz of
len to emit_block_move_hints.
(try_store_by_multiple_pieces): Support starting with a loop.
(expand_builtin_memcmp): Pass ctz of len to
emit_block_cmp_hints.
(expand_builtin): Allow inline expansion of memset, memcpy,
memmove and memcmp if requested.
* common.opt (finline-stringops): New.
(ilsop_fn): New enum.
* flag-types.h (enum ilsop_fn): New.
* doc/invoke.texi (-finline-stringops): Add.
for gcc/testsuite/ChangeLog
* gcc.dg/torture/inline-mem-cmp-1.c: New.
* gcc.dg/torture/inline-mem-cpy-1.c: New.
* gcc.dg/torture/inline-mem-cpy-cmp-1.c: New.
* gcc.dg/torture/inline-mem-move-1.c: New.
* gcc.dg/torture/inline-mem-set-1.c: New.
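A usage sketch (not one of the new tests): compiled with
-finline-stringops or -finline-stringops=memset, the variable-length call
below is expanded inline, starting with a loop rather than one store
block per power of two:
```
void
clear_buffer (char *p, unsigned long n)
{
  /* Normally a libc call when N is unknown; with -finline-stringops it
     is expanded inline, which freestanding environments may prefer.  */
  __builtin_memset (p, 0, n);
}
```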
|
|
The by pieces compare can be implemented by overlapped operations. So
it should be taken into consideration when doing the adjustment for
overlap operations. The mode returned from
widest_fixed_size_mode_for_size is already checked with mov_optab in
by_pieces_mode_supported_p called by widest_fixed_size_mode_for_size.
So there is no need to check mov_optab again in by_pieces_ninsns. The
patch fixes these issues.
gcc/
* expr.cc (by_pieces_ninsns): Include by pieces compare when
do the adjustment for overlap operations. Replace mov_optab
checks with gcc assertion.
|
|
As the following testcase shows, we ICE when trying to emit ADDR_EXPR of
a bitint variable which doesn't have mode width.
The problem is in the EXTEND_BITINT stuff which makes sure we treat the
padding bits on memory reads from user bitint vars as undefined.
When expanding ADDR_EXPR on such vars outside of initializers,
expand_expr_addr* uses EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT
does nothing, but in initializers it keeps using EXPAND_INITIALIZER
modifier. So, we need to treat EXPAND_INITIALIZER the same as
EXPAND_CONST_ADDRESS in this regard.
2023-11-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112336
* expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision
if modifier is EXPAND_INITIALIZER.
* gcc.dg/bitint-41.c: New test.
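A sketch of the situation (the actual bitint-41.c testcase may differ):
```
/* 37 bits is not a mode width, so v has padding bits in memory.  */
_BitInt(37) v;

/* EXPAND_INITIALIZER context: only the address is needed, so no
   reduce_to_bit_field_precision of the (undefined) padding may be
   attempted here.  */
_BitInt(37) *p = &v;
```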
|
|
Most targets have an "and" instruction for their vector mask size, but RISC-V
only has DImode "and". Fixed by allowing wider instruction modes.
gcc/ChangeLog:
PR target/112481
* expr.cc (store_constructor): Use OPTAB_WIDEN for mask adjustment.
|
|
AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.
This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.
gcc/ChangeLog:
* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.
|
|
The former patch (f08ca5903c7) examines the scalar modes via the target
hook scalar_mode_supported_p. That causes some i386 regressions, as
XImode and OImode are not enabled by the i386 hook. This patch instead
examines the scalar mode by checking if the corresponding optabs
are available for the mode.
gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extends it to do
the checking for both scalar and vector mode.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
|
|
Vector mode compare instructions are efficient for equality compares on
rs6000. This patch refactors the by-pieces code to enable
vector modes for compare.
gcc/
PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if
we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs
exists for the mode and certain by pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument
with the type of by pieces operations. Call can_use_qi_vectors
and qi_vector_mode_supported_p to do the check. Call
scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by pieces operation to
widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode. Add m_op to
record the type of by pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change last argument to the
type of by pieces operations, initialize m_op with it. Pass
m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function
widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
can_use_qi_vectors and qi_vector_mode_supported_p to do the
check.
(op_by_pieces_d::run): Pass m_op to function
widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by pieces operations to
widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with
CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
COMPARE_BY_PIECES.
|
|
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only show up when disabling optimization
passes, as the above form would otherwise have been optimized away.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/111863
gcc/ChangeLog:
* expr.cc (do_store_flag): Don't overwrite arg0
when stripping off `& POW2`.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr111863-1.c: New test.
|
|
expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]
RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(as do SI return values, since most ALU insns implicitly sign-extend too.)
Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.
RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.
Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.
The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")
This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.
| | # of unexpected case / # of unique unexpected case
| | gcc | g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/| 264 / 87 | 5 / 2 | 72 / 12 |
| lp64d/medlow
Granted, for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this 🙂
I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming 😉
Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.
```
foo2:
sext.w a6,a1 <-- this goes away
beq a1,zero,.L4
li a5,0
li a0,0
.L3:
addw a4,a2,a5
addw a5,a3,a5
addw a0,a4,a0
bltu a5,a6,.L3
ret
.L4:
li a0,0
ret
```
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>
PR target/111466
gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.
gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.
|
|
poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors. This led to an awkward split
between poly_int_pod and poly_int. poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body. But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).
All that goes away if we switch to using default constructors.
The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code. The two problems here are:
(1) When initialising a poly_int<N, wide_int> with fewer than N
coefficients, the other coefficients need to be a zero of
the same precision as the explicit coefficients. This was
previously done in a for loop using wi::ints_for<...>::zero,
but C++11 constexpr constructors can't have function bodies.
The patch instead uses a series of delegated initialisers to
fill in the implicit coefficients.
(2) The initialisation in:
void f(int x) {
unsigned int foo {x};
}
produces the warning:
warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]
whereas:
void f(int x) {
unsigned int foo = x;
}
does not. So switching to direct initialisation of the coeffs array
would mean that:
poly_uint64 x = 0;
would trigger a warning for using 0 rather than 0u. That seemed
overly pedantic, so the patch adds explicit casts to the constructor.
The complication is to do that without adding extra code to
wide-int versions. The patch uses a new init_cast type for that.
gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with
a general one that takes an arbitrary number of coefficients.
Delegate initialization to two new private constructors, one of
which uses the coefficients as-is and one of which adds an extra
zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather
than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects):
Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
(offsets_at): Likewise.
(init_eliminable_invariants): Likewise.
* rtl.h (union rtunion): Likewise.
(poly_int_rtx_p): Likewise.
(strip_offset): Likewise.
(strip_offset_and_add): Likewise.
* rtlanal.cc (strip_offset): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-dfa.h (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-ssa-loop-ivopts.cc (struct iv_use): Likewise.
(strip_offset): Likewise.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise.
* tree.cc (ptrdiff_tree_p): Likewise.
* tree.h (poly_int_tree_p): Likewise.
(ptrdiff_tree_p): Likewise.
(get_inner_reference): Likewise.
gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use
poly_int rather than poly_int_pod.
|
|
c_readstr only operated on integer modes. It worked by reading
the source string into an array of HOST_WIDE_INTs, converting
that array into a wide_int, and from there to an rtx.
It's simpler to do this by building a target memory image and
using native_decode_rtx to convert that memory image into an rtx.
It avoids all the endianness shenanigans because both the string and
native_decode_rtx follow target memory order. It also means that the
function can handle all fixed-size modes, which simplifies callers
and allows vector modes to be used more widely.
gcc/
* builtins.h (c_readstr): Take a fixed_size_mode rather than a
scalar_int_mode.
* builtins.cc (c_readstr): Likewise. Build a local array of
bytes and use native_decode_rtx to get the rtx image.
(builtin_memcpy_read_str): Simplify accordingly.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Likewise.
(builtin_memset_gen_str): Likewise.
* expr.cc (string_cst_read_str): Likewise.
|
|
On Tue, Sep 19, 2023 at 05:50:59PM +0100, Richard Sandiford wrote:
> How about using MAX_FIXED_MODE_SIZE for things like this?
Seems like a good idea.
The following patch does that.
2023-09-20 Jakub Jelinek <jakub@redhat.com>
* match.pd ((x << c) >> c): Use MAX_FIXED_MODE_SIZE instead of
GET_MODE_PRECISION of TImode or DImode depending on whether
TImode is supported scalar mode.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
* expr.cc (expand_expr_real_1): Likewise.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): Likewise.
* ubsan.cc (ubsan_encode_value, ubsan_type_descriptor): Likewise.
|
|
Don't call targetm.c.bitint_type_info inside gcc_assert [PR102989]
On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote:
> Minor comment/question: are we doing away with the property that
> 'assert'-like "calls" must not have side effects? Per 'gcc/system.h',
> this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or
> '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true,
> thus the "offending" '#else' is never active. However, it's different
> for standard 'assert' and 'gcc_checking_assert', so I'm not sure if
> that's a good property for 'gcc_assert' only? For example, see also
> <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or
> recent <https://gcc.gnu.org/PR111144>
> "RFE: could -fanalyzer warn about assertions that have side effects?".
You're right, the
#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
fallback definition is incompatible with the way I've used it, so for
--disable-checking built by non-GCC it would not work properly.
2023-09-07 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info
inside gcc_assert, as later code relies on it filling info variable.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p,
clear_padding_type): Likewise.
* varasm.cc (output_constant): Likewise.
* fold-const.cc (native_encode_int, native_interpret_int): Likewise.
* stor-layout.cc (finish_bitfield_representative, layout_type):
Likewise.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
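The hazard in miniature (a self-contained sketch, not GCC code): under
the fallback definition quoted above, the asserted expression is never
evaluated, so a side effect inside it is silently lost.
```
#define gcc_assert(EXPR) ((void)(0 && (EXPR)))

static int
fill (int *p)
{
  *p = 42;
  return 1;
}

int
main (void)
{
  int info = 0;
  gcc_assert (fill (&info)); /* fill is never called under the fallback */
  return info;               /* 0, not the expected 42 */
}
```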
|
|
The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise. For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.
* vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather
than widest_int.
(simplify_using_ranges::simplify_internal_call_using_ranges): Use
unsigned_type_for rather than build_nonstandard_integer_type.
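For instance (assuming a target that defines the _BitInt ABI, such as
x86-64), the new middle-end support has to handle code like:
```
/* Large/huge _BitInt values are stored as arrays of limbs; arithmetic
   on them is handled by the separate lowering pass, with multiplication
   going through the new MULBITINT internal function.  */
unsigned _BitInt(256)
mul (unsigned _BitInt(256) a, unsigned _BitInt(256) b)
{
  return a * b;
}
```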
|
|
Small optimization to avoid testing modifier multiple times.
2023-08-10 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* expr.cc (expand_expr_real_1) <case MEM_REF>: Add an early return for
EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple
times.
|
|
This patch is one of a series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause. As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA). These moves are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field. This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register. This patch eliminates that
redundancy.
The -fdump-tree-expand output for pr28071.c compiled with -O0 currently
contains 777K lines; with this patch it contains 717K lines, i.e. saving
about 60K lines (admittedly of debugging text output, but it makes the
point).
2023-07-28 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling
force_reg on ORIG_SRC, to avoid making a copy if the source
is already in a pseudo register.
|
|
gcc/ChangeLog:
* cselib.h (rtx_equal_for_cselib_1):
Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type
from int to bool and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and
"const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.
|
|
The following adds an alternate way of expanding a uniform
mask vector constructor like
_55 = _2 ? -1 : 0;
vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};
when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.
Instead of
cmpl $3, %edi
sete %cl
movl %ecx, %esi
leal (%rsi,%rsi), %eax
leal 0(,%rsi,4), %r9d
leal 0(,%rsi,8), %r8d
orl %esi, %eax
orl %r9d, %eax
movl %ecx, %r9d
orl %r8d, %eax
movl %ecx, %r8d
sall $4, %r9d
sall $5, %r8d
sall $6, %esi
orl %r9d, %eax
orl %r8d, %eax
movl %ecx, %r8d
orl %esi, %eax
sall $7, %r8d
orl %r8d, %eax
kmovb %eax, %k1
we then get
cmpl $3, %edi
sete %cl
negl %ecx
kmovb %ecx, %k1
Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.
PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean
vectors with integer mode specially.
|
|
This middle-end patch avoids some redundant RTL for vector initialization
during RTL expansion. For the simple test case:
typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
__int128 key;
v1ti foo() {
return (v1ti){key};
}
the middle-end currently expands:
(set (reg:V1TI 85) (const_vector:V1TI [ (const_int 0) ]))
(set (reg:V1TI 85) (mem/c:V1TI (symbol_ref:DI ("key"))))
where we create a dead instruction that initializes the vector to zero,
immediately followed by a set of the entire vector. This patch skips
this zeroing instruction when the vector has only a single element.
It also updates the code to indicate when we've cleared the vector,
so that we don't need to initialize zero elements.
2023-06-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* expr.cc (store_constructor) <case VECTOR_TYPE>: Don't bother
clearing vectors with only a single element. Set CLEARED if the
vector was initialized to zero.
|
|
After expanding directly to rtl instead of
creating a tree, we could end up with
a const_int which is not ready to be handled
by extract_bit_field.
So we need to do the constant folding here instead.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/110117
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Handle
const_int from expand_expr.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.
|
|
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or
turn off the bit tracking part of CCP, so we would lose out on
what TER was able to do beforehand. This moves the TER code
around so that it is used instead of just the nonzero bits.
It also makes it easier to remove the TER part of the code
later on too.
OK? Bootstrapped and tested on x86_64-linux-gnu.
Note it reintroduces PR 110117 (which was accidentally fixed after
r14-1534-g908e5ab5c11c). The next patch in series will fix that.
gcc/ChangeLog:
* expr.cc (do_store_flag): Rearrange the
TER code so that it overrides the nonzero bits
info if we had `a & POW2`.
|
|
This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.
|
|
This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
t ^= (1<<30);
return t != 0;
}
```
We should handle the case where the nonzero bits is the same as the
comparison operand.
Changes from v1:
* v2: Updated for the bit extraction changes.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.
|
|
While working on something else, I noticed we could improve
code generation for the following function:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
return t != 0;
}
```
Right now we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available).
This patch extends it to handle the case where we know t has only
a single bit that can be nonzero.
Changes from v1:
* v2: Updated for the bit extraction improvements.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* expr.cc (do_store_flag): Extend the one bit checking case
to handle the case where we don't have an and but rather still
one bit is known to be non-zero.
|
|
I had thought extract_bit_field's bitpos argument was the shifted position
and not the bit position like BIT_FIELD_REF's, so I had removed the code
which would use the correct bit position for BYTES_BIG_ENDIAN.
Committed as obvious; I checked big-endian MIPS to make sure we are now
producing the correct code.
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Correct bitpos for big-endian.
|