Age | Commit message (Collapse) | Author | Files | Lines |
|
After expanding directly to rtl instead of
creating a tree, we could end up with
a const_int which is not ready to be handled
by extract_bit_field.
So need to the constant folding here instead.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/110117
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Handle
const_int from expand_expr.
gcc/testsuite/ChangeLog:
* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.
|
|
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or
turn off the bit tracking part of CCP so we would lose out
what TER was able to do before hand. This moves around the
TER code so that it is used instead of just the nonzerobits.
It also makes it easier to remove the TER part of the code
later on too.
OK? Bootstrapped and tested on x86_64-linux-gnu.
Note it reintroduces PR 110117 (which was accidently fixed after
r14-1534-g908e5ab5c11c). The next patch in series will fix that.
gcc/ChangeLog:
* expr.cc (do_store_flag): Rearrange the
TER code so that it overrides the nonzero bits
info if we had `a & POW2`.
|
|
I noticed while looking at some code generation issue, that forwprop
was not handling `-a == 0` for unsigned types and I was confused why
it was not.
r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be already handled by the match.pd patterns
but it was missed that the match.pd patterns checked
TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ.
This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110134
* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer
types.
(-A CMP CST -> B CMP (-CST)): Likewise.
gcc/testsuite/ChangeLog:
PR tree-optimization/110134
* gcc.dg/tree-ssa/negneq-1.c: New test.
* gcc.dg/tree-ssa/negneq-2.c: New test.
* gcc.dg/tree-ssa/negneq-3.c: New test.
* gcc.dg/tree-ssa/negneq-4.c: New test.
|
|
are constant
This adds a match pattern that are for boolean values
that optimizes `a ? onezero : 0` to `a & onezero` and
`a ? 1 : onezero` to `a | onezero`.
This was reported a few times and I thought I would finally
add the match pattern for this.
This hits a few times in GCC itself too.
Notes on the testcases:
* phi-opt-2.c: This now is optimized to `a & b` in phiopt rather than ifcombine
* phi-opt-25b.c: The test part that was failing was parity which now gets `x & y` treatment.
* ssa-thread-21.c: there is no longer a threading opportunity, so need to disable phiopt.
Note PR 109957 is filed for the now missing optimization in that testcase too.
gcc/ChangeLog:
PR tree-optimization/89263
PR tree-optimization/99069
PR tree-optimization/20083
PR tree-optimization/94898
* match.pd: Add patterns to optimize `a ? onezero : onezero` with
one of the operands are constant.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
* gcc.dg/tree-ssa/phi-opt-27.c: New test.
* gcc.dg/tree-ssa/phi-opt-28.c: New test.
* gcc.dg/tree-ssa/phi-opt-29.c: New test.
* gcc.dg/tree-ssa/phi-opt-30.c: New test.
* gcc.dg/tree-ssa/phi-opt-31.c: New test.
* gcc.dg/tree-ssa/phi-opt-32.c: New test.
|
|
While working on `bool0 ? bool1 : bool2` I noticed that
zero_one_valued_p does not match on the constant zero
as in that case tree_nonzero_bits will return 0 and
that is different from 1.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Match 0 integer constant
too.
|
|
This patch would like to fix the incorrect requirement of the vector
builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement
will result in the ops mismatch with iterators, and then ICE will be
triggered if ZVFH/ZVFHMIN is not given.
Sorry for inconviensient.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): Take RVV_REQUIRE_ELEN_FP_16 as requirement.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
|
|
While looking at PRs about cases where we don't perform the named return
value optimization, it occurred to me that it might be useful to have a
warning for that.
This does not fix PR58487, but might be interesting to people watching it.
PR c++/58487
gcc/c-family/ChangeLog:
* c.opt: Add -Wnrvo.
gcc/ChangeLog:
* doc/invoke.texi: Document it.
gcc/cp/ChangeLog:
* typeck.cc (want_nrvo_p): New.
(check_return_expr): Handle -Wnrvo.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv25.C: New test.
|
|
Our implementation of the named return value optimization has been limited
to variables declared in the outermost block of the function, to avoid
needing to handle the case where the variable needs to be destroyed due to
going out of scope. PR92407 pointed out a case we were missing, where the
variable goes out of scope due to a goto and we were failing to destroy it.
It occurred to me that this problem is the flip side of PR33799, where we
need to be sure to destroy the return value if a cleanup throws on return;
here we want to avoid destroying the return value when exiting the
variable's scope on return. We can use the same flag to indicate to both
cleanups that we're returning.
This implements the guaranteed copy elision specified by P2025 (which is not
yet part of the draft standard).
PR c++/51571
PR c++/92407
gcc/cp/ChangeLog:
* decl.cc (finish_function): Simplify NRV handling.
* except.cc (maybe_set_retval_sentinel): Also set if NRV.
(maybe_splice_retval_cleanup): Don't add the cleanup region
if we don't need it.
* semantics.cc (nrv_data): Add simple field.
(finalize_nrv): Set it.
(finalize_nrv_r): Check it and retval sentinel.
* cp-tree.h (finalize_nrv): Adjust declaration.
* typeck.cc (check_return_expr): Remove named_labels check.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv23.C: New test.
|
|
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. The simplest fix is to
disable the optimization in the presence of user labels.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't currently track that.
PR c++/92407
gcc/cp/ChangeLog:
* typeck.cc (check_return_expr): Prevent NRV in the presence of
named labels.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv22.C: New test.
|
|
While looking at PR92407 I noticed that the expectations of
maybe_splice_retval_cleanup weren't being met; an sk_cleanup level was
confusing its attempt to recognize the outer block of the function. And
even if I fixed the detection, it failed to actually wrap the body of the
function because the STATEMENT_LIST it got only had the label, not anything
after it. So I moved the call after poplevel does pop_stmt_list on all the
sk_cleanup levels.
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Change
recognition of function body and try scopes.
* semantics.cc (do_poplevel): Call it after poplevel.
(at_try_scope): New.
* cp-tree.h (maybe_splice_retval_cleanup): Adjust.
gcc/testsuite/ChangeLog:
* g++.dg/eh/return1.C: Add label cases.
|
|
The NRV implementation was blindly replacing the operand of RETURN_EXPR,
clobbering anything that check_return_expr might have added on to the actual
initialization, such as checking the postcondition.
gcc/cp/ChangeLog:
* semantics.cc (finalize_nrv_r): [RETURN_EXPR]: Only replace the
INIT_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/contracts-post7.C: New test.
|
|
This was fixed in GCC 10.
PR c++/58050
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv24.C: New test.
|
|
Fix off by one in m2.flex when the line number is set via cpp.
gcc/m2/ChangeLog:
PR modula2/110019
* gm2-compiler/SymbolKey.mod (SearchAndDo): Reformatted.
(ForeachNodeDo): Reformatted.
* gm2-compiler/SymbolTable.mod (AddListify): Join list
with "," or "and" if more than one word is in the list.
* m2.flex: Remove -1 from atoi(yytext) line number.
gcc/testsuite/ChangeLog:
PR modula2/110019
* gm2/cpp/fail/cpp-fail.exp: New test.
* gm2/cpp/fail/foocpp.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
|
|
An analysis of backend UNSPECs reveals that two of the most common UNSPECs
across target backends are for copysign and bit reversal. This patch
adds RTX codes for these expressions to allow their representation to
be standardized, and them to optimized by the middle-end RTL optimizers.
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* doc/rtl.texi (bitreverse, copysign): Document new RTX codes.
* rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes.
* simplify-rtx.cc (simplify_unary_operation_1): Optimize
NOT (BITREVERSE x) as BITREVERSE (NOT x).
Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x.
Optimize PARITY (BITREVERSE x) as PARITY x.
Optimize BITREVERSE (BITREVERSE x) as x.
(simplify_const_unary_operation) <case BITREVERSE>: Evaluate
BITREVERSE of a constant integer at compile-time.
(simplify_binary_operation_1) <case COPYSIGN>: Optimize
COPY_SIGN (x, x) as x. Optimize COPYSIGN (x, C) as ABS x
or NEG (ABS x) for constant C. Optimize COPYSIGN (ABS x, y)
and COPYSIGN (NEG x, y) as COPYSIGN (x, y).
Optimize COPYSIGN (x, ABS y) as ABS x.
Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z).
Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z).
(simplify_const_binary_operation): Evaluate COPYSIGN of constant
arguments at compile-time.
|
|
gcc/ChangeLog:
* rtl.h (function_invariant_p): Change return type from int to bool.
* reload1.cc (function_invariant_p): Change return type from
int to bool and adjust function body accordingly.
|
|
Fix according to comments from Robin of V1 patch.
This patch add combine optimization for following case:
__attribute__ ((noipa)) void
vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
dst[i] += (int16_t) a[i] * (int16_t) b[i];
}
Before this patch:
...
vsext.vf2
vzext.vf2
vmadd.vv
..
After this patch:
...
vwmaccsu.vv
...
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_fma<mode>): New pattern.
(*single_<optab>mult_plus<mode>): Ditto.
(*double_<optab>mult_plus<mode>): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.
|
|
This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target data enter' and
'target data exit' constructs, and in the 'to' and 'from' clauses of the
'target update' construct. It is also supported in defaultmap.
The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device. It can also be combined
with 'always' in map clauses.
2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_omp_clause_defaultmap,
c_parser_omp_clause_map): Parse 'present'.
(c_parser_omp_clause_to, c_parser_omp_clause_from): Remove.
(c_parser_omp_clause_from_to): New; parse to/from clauses with
optional present modifer.
(c_parser_omp_all_clauses): Update call.
(c_parser_omp_target_data, c_parser_omp_target_enter_data,
c_parser_omp_target_exit_data): Handle new map enum values
for 'present' mapping.
gcc/cp/
* parser.cc (cp_parser_omp_clause_defaultmap,
cp_parser_omp_clause_map): Parse 'present'.
(cp_parser_omp_clause_from_to): New; parse to/from
clauses with optional 'present' modifier.
(cp_parser_omp_all_clauses): Update call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data,
cp_parser_omp_target_exit_data): Handle new enum value for
'present' mapping.
* semantics.cc (finish_omp_target): Likewise.
gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(struct gfc_omp_namelist): Add 'present_modifer'.
* openmp.cc (gfc_match_motion_var_list): New, handles optional
'present' modifier for to/from clauses.
(gfc_match_omp_clauses): Call it for to/from clauses; parse 'present'
in defaultmap and map clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter' and 'target exit' directives.
* trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers
to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for
defaultmap.
gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map clauses with 'present' modifier.
(omp_group_base): Likewise.
(gimplify_scan_omp_clauses): Reorder present maps to come first.
Set GOVD flags for present defaultmaps.
(gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps.
* omp-low.cc (scan_sharing_clauses): Handle 'always, present' map
clauses.
(lower_omp_target): Handle map clauses with 'present' modifier.
Handle 'to' and 'from' clauses with 'present'.
* tree-core.h (enum omp_clause_defaultmap_kind): Add
OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind.
* tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and
'from' clauses with 'present' modifier. Handle present defaultmap.
* tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New.
(GOMP_MAP_FLAG_FORCE): Redefine.
(GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New.
(enum gomp_map_kind): Add map kinds with 'present' modifiers.
(GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for
map variants with 'present'
(GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true
for map variants with 'always, present' modifiers.
(GOMP_MAP_ALWAYS): Redefine.
(GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New.
libgomp/
* libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for
defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses.
* target.c (gomp_to_device_kind_p): Add map kinds with 'present'
modifier.
(gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro.
(gomp_map_vars_internal, gomp_update, gomp_target_rev):
Emit runtime error if memory region not present.
* testsuite/libgomp.c-c++-common/target-present-1.c: New test.
* testsuite/libgomp.c-c++-common/target-present-2.c: New test.
* testsuite/libgomp.c-c++-common/target-present-3.c: New test.
* testsuite/libgomp.fortran/target-present-1.f90: New test.
* testsuite/libgomp.fortran/target-present-2.f90: New test.
* testsuite/libgomp.fortran/target-present-3.f90: New test.
gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update dg-error, extend to test for
duplicated 'present' and extend scan-dump tests for 'present'.
* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
* gfortran.dg/gomp/map-7.f90: Extend parse and dump test for
'present'.
* gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present'
modifier checking.
* c-c++-common/gomp/defaultmap-4.c: New test.
* c-c++-common/gomp/map-9.c: New test.
* c-c++-common/gomp/target-update-1.c: New test.
* gfortran.dg/gomp/defaultmap-8.f90: New test.
* gfortran.dg/gomp/map-11.f90: New test.
* gfortran.dg/gomp/map-12.f90: New test.
* gfortran.dg/gomp/target-update-1.f90: New test.
|
|
2023-06-06 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/genfusion.pl: Delete some dead code.
|
|
This makes the code more readable, more digestible, more maintainable,
more extensible. That kind of thing. It does that by pulling things
apart a bit, but also making what stays together more cohesive lumps.
The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other. This patch moves the stuff done for one
iteration to a new _one function.
The second big thing is the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text. Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.
Local variables are declared at first use (or close to first use).
There also shouldn't be many at all, often you can write easier to
read and manage code by omitting to name something that is hard to name
in the first place.
Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.
2023-06-06 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and
split out from...
(gen_ld_cmpi_p10): ... this.
|
|
PR106907 has few warnings spotted from cppcheck. In that addressing duplicate
expression issue here. Here the same expression is used twice in logical
AND(&&) operation which result in same result so removing that.
2023-06-06 Jeevitha Palanisamy <jeevitha@linux.ibm.com>
gcc/
PR target/106907
* config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove
duplicate expression.
|
|
The aarch64_addpdi pattern is redundant as the reduc_plus_scal_<mode> pattern can already generate
the required form of the ADDP instruction, and is mostly folded to GIMPLE early on so can benefit from more optimisations.
Though it turns out that we were missing the folding for the unsigned variants.
This patch adds that and wires up the vpaddd_u64 and vpaddd_s64 intrinsics through the above pattern instead
so that we can remove a redundant pattern and get more optimisation earlier.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_general_gimple_fold_builtin):
Handle unsigned reduc_plus_scal_ builtins.
* config/aarch64/aarch64-simd-builtins.def (addp): Delete DImode instances.
* config/aarch64/aarch64-simd.md (aarch64_addpdi): Delete.
* config/aarch64/arm_neon.h (vpaddd_s64): Reimplement with
__builtin_aarch64_reduc_plus_scal_v2di.
(vpaddd_u64): Reimplement with __builtin_aarch64_reduc_plus_scal_v2di_uu.
|
|
Having converted the patterns for the URSRA,SRSRA instructions to standard RTL codes we can also
easily convert the non-accumulating forms URSHR,SRSHR.
This patch does that, reusing the various helpers and predicates from that patch in a straightforward way.
This allows GCC to perform the optimisations in the testcase, matching what Clang does.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>shr_n<mode>): Delete.
(aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<sra_op>rshr_n<mode>): New define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/vrshr_1.c: New test.
|
|
Now that we've got the <vczle><vczbe> annotations we can get rid of explicit
!BYTES_BIG_ENDIAN and BYTES_BIG_ENDIAN patterns for the narrowing shift instructions.
This allows us to clean up the expanders as well.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Delete.
(aarch64_shrn<mode>_insn_be): Delete.
(*aarch64_<srn_op>shrn<mode>_vect): Rename to...
(*aarch64_<srn_op>shrn<mode><vczle><vczbe>): ... This.
(aarch64_shrn<mode>): Remove reference to the above deleted patterns.
(aarch64_rshrn<mode>_insn_le): Delete.
(aarch64_rshrn<mode>_insn_be): Delete.
(aarch64_rshrn<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_rshrn<mode>): Remove references to the above deleted patterns.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/pr99195_5.c: Add testing for shrn_n, rshrn_n
intrinsics.
|
|
We've received requests to optimise the attached intrinsics testcase.
We currently generate:
foo_1:
uaddlp v0.4s, v0.8h
uaddlv d31, v0.4s
fmov x0, d31
ret
foo_2:
uaddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
foo_3:
saddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
The widening pair-wise addition addlp instructions can be omitted if we're just doing an ADDV afterwards.
Making this optimisation would be quite simple if we had a standard RTL PLUS vector reduction code.
As we don't, we can use UNSPEC_ADDV as a stand in.
This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over a widened input, thus removing
the need for separate UNSPEC_SADDLV and UNSPEC_UADDLV codes.
To optimise the testcases involved we add two splitters that match a vector addition where all participating elements
are taken and widened from the same vector and then fed into an UNSPEC_ADDV. In that case we can just remove the
vector PLUS and just emit the simple RTL for SADDLV/UADDLV.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_parallel_select_half_p):
Define prototype.
(aarch64_pars_overlap_p): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>):
Express in terms of UNSPEC_ADDV.
(*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): Likewise.
(*aarch64_<su>addlv<mode>_reduction): Define.
(*aarch64_uaddlv<mode>_reduction_2): Likewise.
* config/aarch64/aarch64.cc (aarch64_parallel_select_half_p): Define.
(aarch64_pars_overlap_p): Likewise.
* config/aarch64/iterators.md (UNSPEC_SADDLV, UNSPEC_UADDLV): Delete.
(VQUADW): New mode attribute.
(VWIDE2X_S): Likewise.
(USADDLV): Delete.
(su): Delete handling of UNSPEC_SADDLV, UNSPEC_UADDLV.
* config/aarch64/predicates.md (vect_par_cnst_select_half): Define.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/addlv_1.c: New test.
|
|
The gimplifier can elide initialized constant automatic variables
to static storage in which case TARGET_EXPR gimplification needs
to avoid emitting a CLOBBER for them since their lifetime is no
longer limited. Failing to do so causes spurious dangling-pointer
diagnostics on the added testcase for some targets.
PR middle-end/110055
* gimplify.cc (gimplify_target_expr): Do not emit
CLOBBERs for variables which have static storage duration
after gimplifying their initializers.
* g++.dg/warn/Wdangling-pointer-pr110055.C: New testcase.
|
|
The following improves solution_set_expand to require one less
iteration over the bitmap and avoid changing the bitmap we iterate
over. Plus we handle adjacent subvars in the ID space (the common case)
and use bitmap_set_range. This cuts a bit less than 10% off the PTA
time from the testcase in the PR.
PR tree-optimization/109143
* tree-ssa-structalias.cc (solution_set_expand): Avoid
one bitmap iteration and optimize bit range setting.
|
|
PR bootstrap/110120
* postreload.cc (reload_cse_move2add, move2add_use_add2_insn): Use
XVECEXP, not XEXP, to access first item of a PARALLEL.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/riscv/save-restore-cfi.c: New test to check save-restore
cfi directives.
|
|
This patch support the intrinsic API of FP16 ZVFH Reduction floating-point.
Aka SEW=16 for below instructions:
vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum
Then users can leverage the instrinsic APIs to perform the FP=16 related
reduction operations. Please note not all the instrinsic APIs are coverred
in the test files, only pick some typical ones due to too many. We will
perform the FP16 related instrinsic API test entirely soon.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.
|
|
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): Use Pmode
for cfi reg/mem machmode
(riscv_adjust_libcall_cfi_epilogue): Use Pmode for cfi reg machmode
gcc/testsuite/ChangeLog:
* gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode
for cfi reg/mem.
|
|
gcc/ChangeLog:
* config/riscv/vector-iterators.md:
Fix 'REQUIREMENT' for machine_mode 'MODE'.
* config/riscv/vector.md (@pred_indexed_<order>store<VNX16_QHS:mode>
<VNX16_QHSI:mode>): change VNX16_QHSI to VNX16_QHSDI.
(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>): Ditto.
|
|
This patch would like to fix some typo in vector-iterators.md, aka:
[-"vnx1DI")-]{+"vnx1di")+}
[-"vnx2SI")-]{+"vnx2si")+}
[-"vnx1SI")-]{+"vnx1si")+}
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Fix typo in mode attr.
|
|
|
|
This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab,
vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab,
vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab,
vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise.
|
|
DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively. With the exception that they provide convenience wrappers
for a single vector to vector conversion, a hi/lo split or an even/odd
split. Each definition for <NAME> will require either signed optabs
named <UOPTAB> and <SOPTAB> (for widening) or a single <OPTAB> (for
narrowing) for each of the five functions it creates.
For example, for widening addition the
DEF_INTERNAL_WIDENING_OPTAB_FN will create five internal functions:
IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD. Each requiring two
optabs, one for signed and one for unsigned.
Aarch64 implements the hi/lo split optabs:
IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
This gives the same functionality as the previous
WIDEN_PLUS/WIDEN_MINUS tree codes which are expanded into
VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
Tamar Christina <tamar.christina@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
this ...
(vec_widen_<su>add_lo_<mode>): ... to this.
(vec_widen_<su>addl_hi_<mode>): Rename this ...
(vec_widen_<su>add_hi_<mode>): ... to this.
(vec_widen_<su>subl_lo_<mode>): Rename this ...
(vec_widen_<su>sub_lo_<mode>): ... to this.
(vec_widen_<su>subl_hi_<mode>): Rename this ...
(vec_widen_<su>sub_hi_<mode>): ...to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD,
IFN_VEC_WIDEN_MINUS_EVEN): Define widening plus,minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(Narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.
|
|
Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns
2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com>
Joel Hutton <joel.hutton@arm.com>
gcc/ChangeLog:
* tree-vect-patterns.cc: Add include for gimple-iterator.
(vect_recog_widen_op_pattern): Refactor to use code_helper.
(vect_gimple_build): New function.
* tree-vect-stmts.cc (simple_integer_narrowing): Refactor to use
code_helper.
(vectorizable_call): Likewise.
(vect_gen_widened_results_half): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vect_create_vectorized_promotion_stmts): Likewise.
(vect_create_half_widening_stmts): Likewise.
(vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vectorizer.h (supportable_widening_operation): Change
prototype to use code_helper.
(supportable_narrowing_operation): Likewise.
(vect_gimple_build): New function prototype.
* tree.h (code_helper::safe_as_tree_code): New function.
(code_helper::safe_as_fn_code): New function.
|
|
All special enums have declarations in the D runtime library, but the
compiler will recognize and treat them specially if declared in any
module. When the underlying base type of a special enum is a different
size to its matched intrinsic, then this can cause undefined behavior at
runtime. Detect and warn about when such a mismatch occurs.
gcc/d/ChangeLog:
* gdc.texi (Warnings): Document -Wextra and -Wmismatched-special-enum.
* implement-d.texi (Special Enums): Add reference to warning option
-Wmismatched-special-enum.
* lang.opt: Add -Wextra and -Wmismatched-special-enum.
* types.cc (TypeVisitor::visit (TypeEnum *)): Warn when declared
special enum size mismatches its intrinsic type.
gcc/testsuite/ChangeLog:
* gdc.dg/Wmismatched_enum.d: New test.
|
|
This patch provides a wide-int implementation of bitreverse, that
implements both of Richard Sandiford's suggestions from the review at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html of an
improved API (as a stand-alone function matching the bswap refactoring),
and an implementation that works with any bit-width precision.
2023-06-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): New function implementing
bit reversal of an integer.
* wide-int.h (wi::bitreverse): New (template) function prototype.
(bitreverse_large): Prototype helper function/implementation.
(wi::bitreverse): New template wrapper around bitreverse_large.
|
|
I find fail of the xtheadcondmov-indirect-rv64.c test case and provide a way to solve it.
In this patch, I take Kito's advice that I modify the form of the function bodies.It likes
*[a-x0-9].
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Generalize to be
less sensitive to register allocation choices.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Similarly.
|
|
Also change one internal variable to bool.
gcc/ChangeLog:
* rtl.h (print_rtl_single): Change return type from int to void.
(print_rtl_single_with_indent): Ditto.
* print-rtl.h (class rtx_writer): Ditto. Change m_sawclose to bool.
* print-rtl.cc (rtx_writer::rtx_writer): Update for m_sawclose change.
(rtx_writer::print_rtx_operand_code_0): Ditto.
(rtx_writer::print_rtx_operand_codes_E_and_V): Ditto.
(rtx_writer::print_rtx_operand_code_i): Ditto.
(rtx_writer::print_rtx_operand_code_u): Ditto.
(rtx_writer::print_rtx_operand): Ditto.
(rtx_writer::print_rtx): Ditto.
(rtx_writer::finish_directive): Ditto.
(print_rtl_single): Change return type from int to void
and adjust function body accordingly.
(rtx_writer::print_rtl_single_with_indent): Ditto.
|
|
gcc/ChangeLog:
* rtl.h (reg_classes_intersect_p): Change return type from int to bool.
(reg_class_subset_p): Ditto.
* reginfo.cc (reg_classes_intersect_p): Ditto.
(reg_class_subset_p): Ditto.
|
|
This patch support the intrinsic API of FP16 ZVFH floating-point. Aka
SEW=16 for below instructions:
vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt
Then users can leverage the instrinsic APIs to perform the FP=16 related
operations. Please note not all the instrinsic APIs are coverred in the
test files, only pick some typical ones due to too many. We will perform
the FP16 related instrinsic API test entirely soon.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP=16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
On sh target, there is a MULTILIB_DIRNAMES (or is it MULTILIB_OPTIONS) named m2,
this conflicts with the langauge m2. So when you do a `make clean`, it will remove
the m2 directory and then a build will fail. Now since r0-78222-gfa9585134f6f58,
the multilib directories are no longer created in the gcc directory as libgcc
was moved to the toplevel. So we can remove the part of clean that removes those
directories.
Tested on x86_64-linux-gnu and a cross to sh-elf that `make clean` followed by
`make` works again.
OK?
gcc/ChangeLog:
PR bootstrap/110085
* Makefile.in (clean): Remove the removing of
MULTILIB_DIR/MULTILIB_OPTIONS directories.
|
|
speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.
gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier): New
prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier): New function.
* config/mips/mips.md (speculation_barrier): Call
mips_emit_speculation_barrier.
libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
included in lib1funcs.S now.
|
|
This patch is just reorganizing the functions for the following patch.
I put rvv_builder and emit_* functions located before expand_const_vector
function since I will use them in expand_const_vector in the following patch.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (class rvv_builder): Reorganize functions.
(rvv_builder::can_duplicate_repeating_sequence_p): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::get_merge_scalar_mask): Ditto.
(emit_scalar_move_insn): Ditto.
(emit_vlmax_integer_move_insn): Ditto.
(emit_nonvlmax_integer_move_insn): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(get_repeating_sequence_dup_machine_mode): Ditto.
|
|
Since the following patch will calls expand_vec_perm with
splitted arguments, change the expand_vec_perm interface in
this patch.
gcc/ChangeLog:
* config/riscv/autovec.md: Split arguments.
* config/riscv/riscv-protos.h (expand_vec_perm): Ditto.
* config/riscv/riscv-v.cc (expand_vec_perm): Ditto.
|
|
|
|
This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
t ^= (1<<30);
return t != 0;
}
```
We should handle the case where the nonzero bits is the same as the
comparison operand.
Changes from v1:
* v2: Updated for the bit extraction changes.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.
|
|
While working something else, I noticed we could improve
the following function code generation:
```
unsigned f(unsigned t)
{
if (t & ~(1<<30)) __builtin_unreachable();
return t != 0;
}
```
Right know we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available).
This patch extends it to handle the case where we know t has a nonzero
of just one bit set.
Changes from v1:
* v2: Updated for the bit extraction improvements.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* expr.cc (do_store_flag): Extend the one bit checking case
to handle the case where we don't have an and but rather still
one bit is known to be non-zero.
|