|
The mode-switching pass assumed that all of an entity's modes
were mutually exclusive. However, the upcoming SME changes
have an entity with some overlapping modes, so that there is
sometimes a "superunion" mode that contains two given modes.
We can use this relationship to pass something more helpful than
"don't know" to the emit hook.
This patch adds a new hook that targets can use to specify
a mode confluence operator.
With mutually exclusive modes, it's possible to compute a block's
incoming and outgoing modes by looking at its availability sets.
With the confluence operator, we instead need to solve a full
dataflow problem.
However, when emitting a mode transition, the upcoming SME use of
mode-switching benefits from having as much information as possible
about the starting mode. Calculating this information is definitely
worth the compile time.
The dataflow problem is written to work before and after the LCM
problem has been solved. A later patch makes use of this.
While there (since git blame would ping me for the reindented code),
I used a lambda to avoid the cut-&-pasted loops.
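To make the difference concrete, here is a toy forward dataflow problem in C. It assumes a single entity with two overlapping modes, MODE_A and MODE_B, and a superunion MODE_AB. The names (enum mode, confluence, solve_mode_in) and the diamond-shaped CFG are hypothetical. This is not the TARGET_MODE_CONFLUENCE hook signature or the code in mode-switching.cc, only a sketch of the idea: where the two arms of the diamond set different modes, the join block gets MODE_AB instead of "don't know".

```c
#include <assert.h>

/* Hypothetical modes for one entity.  MODE_A and MODE_B overlap, and
   MODE_AB is the "superunion" that contains both.  MODE_ANY means
   "don't know"; MODE_NONE means "no information yet".  */
enum mode { MODE_NONE, MODE_A, MODE_B, MODE_AB, MODE_ANY };

/* A diamond CFG: bb0 -> bb1, bb0 -> bb2, bb1 -> bb3, bb2 -> bb3.  */
#define NUM_BLOCKS 4
static const int num_preds[NUM_BLOCKS] = { 0, 1, 1, 2 };
static const int preds[NUM_BLOCKS][2]
  = { { -1, -1 }, { 0, -1 }, { 0, -1 }, { 1, 2 } };

/* Mode that each block leaves the entity in, or MODE_NONE if the
   block is transparent for the entity.  */
static const enum mode block_sets[NUM_BLOCKS]
  = { MODE_NONE, MODE_A, MODE_B, MODE_NONE };

/* Confluence operator: combine the modes arriving along two edges.  */
static enum mode
confluence (enum mode m1, enum mode m2)
{
  assert (m1 <= MODE_ANY && m2 <= MODE_ANY);
  if (m1 == MODE_NONE)
    return m2;
  if (m2 == MODE_NONE || m1 == m2)
    return m1;
  /* Two different known modes are both contained in the superunion.  */
  if (m1 != MODE_ANY && m2 != MODE_ANY)
    return MODE_AB;
  return MODE_ANY;
}

/* Iterate the forward problem to a fixed point.  The mode on entry
   to the function (bb0) is unknown.  */
static void
solve_mode_in (enum mode mode_in[NUM_BLOCKS])
{
  enum mode mode_out[NUM_BLOCKS];
  for (int bb = 0; bb < NUM_BLOCKS; bb++)
    mode_in[bb] = mode_out[bb] = MODE_NONE;

  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int bb = 0; bb < NUM_BLOCKS; bb++)
        {
          enum mode in = bb == 0 ? MODE_ANY : MODE_NONE;
          for (int i = 0; i < num_preds[bb]; i++)
            in = confluence (in, mode_out[preds[bb][i]]);
          /* Transfer function: either set a new mode or pass IN through.  */
          enum mode out = block_sets[bb] != MODE_NONE ? block_sets[bb] : in;
          if (in != mode_in[bb] || out != mode_out[bb])
            {
              mode_in[bb] = in;
              mode_out[bb] = out;
              changed = 1;
            }
        }
    }
}
```

Calling solve_mode_in gives MODE_AB for the join block bb3 and MODE_ANY for the blocks reached only from the unknown entry state.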
gcc/
* target.def (mode_switching.confluence): New hook.
* doc/tm.texi.in (TARGET_MODE_CONFLUENCE): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (confluence_info): New variable.
(mode_confluence, forward_confluence_n, forward_transfer): New
functions.
(optimize_mode_switching): Use them to calculate mode_in when
TARGET_MODE_CONFLUENCE is defined.
|
|
The pass used the edge aux field to record which mode change
should happen on the edge, with -1 meaning "none". It's more
convenient for later patches to leave aux zero for "none",
and use numbers based at 1 to record a change.
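A minimal sketch of the 1-based encoding, using a hypothetical struct edge_def with a void *aux field: storing mode + 1 means that a zero-initialized aux already reads as "no change", so no separate initialization to -1 is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an edge with a generic "aux" field.  */
struct edge_def
{
  void *aux;
};

/* Record that mode MODE (0-based) should be set on edge E.
   Zero in aux means "no mode change", so store MODE + 1.  */
static void
set_edge_mode (struct edge_def *e, int mode)
{
  assert (mode >= 0);
  e->aux = (void *) (intptr_t) (mode + 1);
}

/* Return the mode to set on E, or -1 if no change is needed.  */
static int
edge_mode (const struct edge_def *e)
{
  return (int) (intptr_t) e->aux - 1;
}
```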
gcc/
* mode-switching.cc (commit_mode_sets): Use 1-based edge aux values.
|
|
This patch passes the set of live hard registers to the after hook,
like the previous one did for the needed hook.
gcc/
* target.def (mode_switching.after): Add a regs_live parameter.
* doc/tm.texi: Regenerate.
* config/epiphany/epiphany-protos.h (epiphany_mode_after): Update
accordingly.
* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
(epiphany_mode_after): Likewise.
* config/i386/i386.cc (ix86_mode_after): Likewise.
* config/riscv/riscv.cc (riscv_mode_after): Likewise.
* config/sh/sh.cc (sh_mode_after): Likewise.
* mode-switching.cc (optimize_mode_switching): Likewise.
|
|
The emit hook already takes the set of live hard registers as input.
This patch passes it to the needed hook too. SME uses this to
optimise the mode choice based on whether state is live or dead.
The main caller already had access to the required info, but the
special handling of return values did not.
gcc/
* target.def (mode_switching.needed): Add a regs_live parameter.
* doc/tm.texi: Regenerate.
* config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update
accordingly.
* config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise.
* config/epiphany/mode-switch-use.cc (insert_uses): Likewise.
* config/i386/i386.cc (ix86_mode_needed): Likewise.
* config/riscv/riscv.cc (riscv_mode_needed): Likewise.
* config/sh/sh.cc (sh_mode_needed): Likewise.
* mode-switching.cc (optimize_mode_switching): Likewise.
(create_pre_exit): Likewise, using the DF simulate functions
to calculate the required information.
|
|
The mode-switching pass already had hooks to say what mode
an entity is in on entry to a function and what mode it must
be in on return. For SME, we also want to say what mode an
entity is guaranteed to be in on entry to an exception handler.
gcc/
* target.def (mode_switching.eh_handler): New hook.
* doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (optimize_mode_switching): Use eh_handler
to get the mode on entry to an exception handler.
|
|
An entity isn't transparent in a block that requires a specific mode.
optimize_mode_switching took that into account for normal insns,
but didn't for the exit block. Later patches misbehaved because
of this.
In contrast, an entity was correctly marked as non-transparent
in the entry block, but the reasoning seemed a bit convoluted.
It also referred to a function that no longer exists.
Since KILL = ~TRANSP, the entity is by definition not transparent
in a block that defines the entity, so I think we can make it so
without comment.
Finally, the exit handling was nested in the entry handling,
but that doesn't seem necessary. A target could say that an
entity is undefined on entry but must be defined on return,
on a "be liberal in what you accept, be conservative in what
you do" principle.
gcc/
* mode-switching.cc (optimize_mode_switching): Mark the exit
block as nontransparent if it requires a specific mode.
Handle the entry and exit mode as sibling rather than nested
concepts. Remove outdated comment.
|
|
For a given block, an entity is either transparent for
all modes or for none. Each update to the transparency set
therefore used a loop like:
for (i = 0; i < no_mode; i++)
  clear_mode_bit (transp[bb->index], j, i);
This patch instead starts out with a bit-per-block bitmap
and updates the main bitmap at the end.
This isn't much of a simplification on its own. The main
purpose is to simplify later patches.
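The following sketch shows the two-step approach with plain bool arrays instead of GCC's sbitmaps. N_BLOCKS, N_MODES, entity_used and compute_transp are hypothetical. The per-block bit is computed first, and the per-mode representation is filled in by a single loop at the end.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 8
#define N_MODES 3

/* Per-(block, mode) transparency: bit BB * N_MODES + MODE.  */
static bool transp[N_BLOCKS * N_MODES];

static void
compute_transp (const bool entity_used[N_BLOCKS])
{
  /* Step 1: one bit per block.  An entity is transparent for all modes
     of a block or for none, so a single bit is enough.  */
  bool block_transp[N_BLOCKS];
  for (int bb = 0; bb < N_BLOCKS; bb++)
    block_transp[bb] = !entity_used[bb];

  /* Step 2: expand into the per-mode representation once, at the end.  */
  for (int bb = 0; bb < N_BLOCKS; bb++)
    for (int m = 0; m < N_MODES; m++)
      transp[bb * N_MODES + m] = block_transp[bb];
}
```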
gcc/
* mode-switching.cc (optimize_mode_switching): Initially
compute transparency in a bit-per-block bitmap.
|
|
optimize_mode_switching passes an entity's current mode (if known)
to the emit hook. However, the mode that it passed ignored the
effect of the after hook. Instead, the mode for the first emit
call in a block was taken from the incoming mode, whereas the
mode for each subsequent emit call was taken from the result
of the previous call.
The previous pass through the insns already calculated the
correct mode, so this patch records it in the seginfo structure.
(There was a 32-bit hole on 64-bit hosts, so this doesn't increase
the size of the structure for them.)
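The size claim can be checked with a pair of hypothetical layouts modeled on the description (the field names and the regs_live stand-in for HARD_REG_SET are assumptions, not GCC's exact definition). On a typical 64-bit host the int is followed by 4 bytes of padding before the first pointer, so a second int fits there for free.

```c
#include <assert.h>

/* Hypothetical layout before the change.  */
struct seginfo_old
{
  int mode;                   /* On 64-bit hosts, followed by a 4-byte hole.  */
  void *insn_ptr;
  struct seginfo_old *next;
  unsigned long regs_live[2];
};

/* Hypothetical layout after adding prev_mode.  */
struct seginfo_new
{
  int mode;
  int prev_mode;              /* Occupies the former hole.  */
  void *insn_ptr;
  struct seginfo_new *next;
  unsigned long regs_live[2];
};
```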
gcc/
* mode-switching.cc (seginfo): Add a prev_mode field.
(new_seginfo): Take and initialize the prev_mode.
(optimize_mode_switching): Update calls accordingly.
Use the recorded modes during the emit phase, rather than
computing one on the fly.
|
|
add_seginfo chained insn information to the end of a list
by starting at the head of the list. This patch avoids the
quadraticness by keeping track of the tail pointer.
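The tail-pointer technique can be sketched as follows, with a hypothetical struct seg standing in for seginfo. Keeping a pointer to the last next field turns each append from O(n) into O(1), so building a list of n entries costs O(n) rather than O(n^2).

```c
#include <assert.h>
#include <stdlib.h>

struct seg
{
  int mode;
  struct seg *next;
};

/* Append NEW_SEG in O(1).  *TAIL_PTR always points at the "next" field
   of the last element, or at the head pointer when the list is empty.  */
static void
append_seg (struct seg ***tail_ptr, struct seg *new_seg)
{
  new_seg->next = NULL;
  **tail_ptr = new_seg;
  *tail_ptr = &new_seg->next;
}
```

A caller starts with struct seg *head = NULL; struct seg **tail = &head; and passes &tail to every append.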
gcc/
* mode-switching.cc (add_seginfo): Replace head pointer with
a pointer to the tail pointer.
(optimize_mode_switching): Update calls accordingly.
|
|
optimize_mode_switching uses REG_DEAD notes to track register
liveness, but it failed to tell DF to calculate up-to-date notes.
Noticed by inspection. I don't have a testcase that fails
because of this.
gcc/
* mode-switching.cc (optimize_mode_switching): Call
df_note_add_problem.
|
|
I found the documentation for the mode-switching macros/hooks
a bit hard to follow at first. This patch tries to add the
information that I think would have made it easier to understand.
Of course, documentation preferences are personal, and so I could
be changing something that others understood to something that
seems impenetrable.
Some notes on specific changes:
- "in an optimizing compilation" didn't seem accurate; the pass
is run even at -O0, and often needs to be for correctness.
- "at run time" meant when the compiler was run, rather than when
the compiled code was run.
- Removing the list of optional macros isn't a clarification,
but it means that upcoming patches don't create an absurdly
long list.
- I don't really understand the purpose of TARGET_MODE_PRIORITY,
so I mostly left that alone.
gcc/
* target.def: Tweak documentation of mode-switching hooks.
* doc/tm.texi.in (OPTIMIZE_MODE_SWITCHING): Tweak documentation.
(NUM_MODES_FOR_MODE_SWITCHING): Likewise.
* doc/tm.texi: Regenerate.
|
|
Parameters declared with `static` are nonnull. We synthesize
an artificial nonnull attribute for such parameters to get the
same warnings and optimizations.
Bootstrapped and regression tested on x86.
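For reference, this is the C99 construct in question. A parameter declared as p[static 3] must point to at least three elements, so a null argument is invalid. With this change GCC is expected to treat it like a parameter with attribute nonnull, so that -Wnonnull can diagnose a call such as sum3 (NULL). The function sum3 is only an illustration, and the exact diagnostic wording may vary between GCC versions.

```c
#include <assert.h>

/* "static 3" promises that every call passes a non-null pointer to at
   least 3 ints, so P behaves as if it were declared nonnull.  */
static int
sum3 (const int p[static 3])
{
  return p[0] + p[1] + p[2];
}
```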
PR c/110815
PR c/112428
gcc/c-family:
* c-attribs.cc (build_attr_access_from_parms): Synthesize
nonnull attribute for parameters declared with `static`.
gcc:
* gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
Remove warning for parameters declared with `static`.
gcc/testsuite:
* gcc.dg/Wnonnull-8.c: Adapt test.
* gcc.dg/Wnonnull-9.c: New test.
|
|
gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Disregard LTO sections.
(scan-assembler-dem, scan-assembler-dem-not): Likewise.
(dg-scan): Likewise, if name starts with scan-assembler.
(scan-raw-assembler): New proc.
* gcc.dg/pr61868.c: Use scan-raw-assembler.
* gcc.dg/scantest-lto.c: New test.
gcc/
* doc/sourcebuild.texi (Scan the assembly output): Document change.
|
|
PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112469
has been fixed by Richard's patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635994.html
This adds tests to avoid regressions. Committed.
PR target/112469
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112469.c: New test.
|
|
This fixes
FAIL: g++.dg/cpp0x/lambda/lambda-decltype3.C -std=c++11 (test for excess errors)
due to
lambda-decltype3.C:25:6: error: lambda capture initializers only available with '-std=c++14' or '-std=gnu++14' [-Wc++14-extensions]
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-decltype3.C: Check __cpp_init_captures.
|
|
gcc/
PR middle-end/110983
* doc/invoke.texi (Option Summary): Add -fpatchable-function-entry.
|
|
The "length" attribute calculation expressions for branches and jumps
are incorrectly and misleadingly indented, and they overrun the 80
column limit as well, all of this causing troubles in following them.
Correct all these issues.
gcc/
* config/riscv/riscv.md (length): Fix indentation for branch and
jump length calculation expressions.
|
|
Adapt the old and unused code for type checking for C23.
gcc/c/:
* c-typeck.cc (struct comptypes_data): Add anon_field flag.
(comptypes, comptypes_check_enum_int,
comptypes_check_different_types): Remove old cache.
(tagged_tu_types_compatible_p): Rewrite.
|
|
targets
Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The testcase in this patch overwrites the default with
dg-do run.
Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr102788.cc: Remove dg-do run directive.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This teaches operand_compare to compare constant CONSTRUCTORs, which is
quite helpful for so-called fat pointers in Ada, i.e. objects that are
semantically pointers but are represented by structures made up of two
pointers. This is modeled on the implementation present in the ICF pass.
gcc/
* fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>:
Deal with nonempty constant CONSTRUCTORs.
(operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET
and DECL_FIELD_BIT_OFFSET for FIELD_DECLs.
gcc/testsuite/
* gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.
|
|
My previous RA patches to take register equivalence into account do
temporary register equivalence substitution to find out whether the
equivalence can be consumed by insns. The validity of the insn with the
substitution is checked using target-dependent code. This code expects
autoinc operations to work on a register, but that register can be
substituted by equivalent memory. This patch fixes the problem. It also
checks that the substitution can be consumed in a memory address too.
gcc/ChangeLog:
PR target/112337
* ira-costs.cc (validate_autoinc_and_mem_addr_p): New function.
(equiv_can_be_consumed_p): Use it.
gcc/testsuite/ChangeLog:
PR target/112337
* gcc.target/arm/pr112337.c: New.
|
|
gcc/ada/
* expect.c (__gnat_waitpid): Fix syntax errors.
|
|
The capture_decltype handling in finish_decltype_type wasn't looking
through implicit INDIRECT_REF (added by convert_from_reference), which
caused us to incorrectly resolve decltype((r)) to float& below. This
patch fixes this, and adds an assert to outer_automatic_var_p to help
prevent such bugs.
We still don't fully accept the example, because the decltype inside the
lambda's trailing return type is handled in lambda type scope, not yet in
the lambda function scope that the capture_decltype handling looks for
(which is an orthogonal bug).
PR c++/79620
gcc/cp/ChangeLog:
* cp-tree.h (STRIP_REFERENCE_REF): Define.
* semantics.cc (outer_var_p): Assert REFERENCE_REF_P is false.
(finish_decltype_type): Look through implicit INDIRECT_REF when
deciding whether to call capture_decltype.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
We typically don't see capture proxies in finish_decltype_type because
process_outer_var_ref is a no-op within an unevaluated context and so a
use of a captured variable within decltype resolves to the captured
variable, not the capture. But we can see them during decltype(auto)
deduction and for decltype of an init-capture, which suggests we need to
handle capture proxies specially within finish_decltype_type after all.
This patch adds such handling.
PR c++/79378
PR c++/96917
gcc/cp/ChangeLog:
* semantics.cc (finish_decltype_type): Handle an id-expression
naming a capture proxy specially.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/decltype-auto7.C: New test.
* g++.dg/cpp1y/lambda-init20.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
This patch allows an .md iterator to include the contents of
previous iterators, possibly with an extra condition attached.
Too much indirection might become hard to follow, so for the
AArch64 changes I tried to stick to things that seemed likely
to be uncontroversial:
(a) structure iterators that combine modes for different sizes
and vector counts
(b) iterators that explicitly duplicate another iterator
(for iterating over the cross product)
gcc/
* read-rtl.cc (md_reader::read_mapping): Allow iterators to
include other iterators.
* doc/md.texi: Document the change.
* config/aarch64/iterators.md (DREG2, VQ2, TX2, DX2, SX2): Include
the iterator that is being duplicated, rather than reproducing it.
(VSTRUCT_D): Redefine using VSTRUCT_[234]D.
(VSTRUCT_Q): Likewise VSTRUCT_[234]Q.
(VSTRUCT_2QD, VSTRUCT_3QD, VSTRUCT_4QD, VSTRUCT_QD): Redefine using
the individual D and Q iterators.
|
|
Use unrelated register initializations that use zero/sign-extend
instructions to clear the stack protector scratch register.
Handle only SImode -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole a non-negligible number of times.
Also use an explicit check for word_mode instead of a mode iterator in the
peephole2 patterns to avoid pattern explosion.
gcc/ChangeLog:
* config/i386/i386.md (stack_protect_set_1 peephole2):
Explicitly check operand 2 for word_mode.
(stack_protect_set_1 peephole2 #2): Ditto.
(stack_protect_set_2 peephole2): Ditto.
(stack_protect_set_3 peephole2): Ditto.
(*stack_protect_set_4z_<mode>_di): New insn pattern.
(*stack_protect_set_4s_<mode>_di): Ditto.
(stack_protect_set_4 peephole2): New peephole2 pattern to
substitute stack protector scratch register clear with unrelated
register initialization involving zero/sign-extend instruction.
|
|
gcc/ChangeLog:
* config/i386/i386.md (shift): Use SAL instead of SLL
for ashift insn mnemonic.
|
|
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
1. The SELECT_VL result is not necessarily VF in a non-final iteration,
so the current GIMPLE IR is wrong:
...
_35 = .SELECT_VL (ivtmp_33, VF);
_21 = vect_vec_iv_.8_22 + { VF, ... };
For example, consider a total iteration count N = 6 and VF = 4.
SELECT_VL is not defined to return VF in non-final iterations; its result
depends on the hardware implementation.
Suppose we have an RVV CPU core whose vsetvl distributes the workload evenly.
It may process 3 elements in the 1st iteration and 3 elements in the last iteration.
Then the induction variable update _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
is wrong: it adds VF, which is 4, although only 3 elements were processed.
It should add 3, which is the result of SELECT_VL.
So the correct IR is:
_36 = .SELECT_VL (ivtmp_34, VF);
_22 = (int) _36;
vect_cst__21 = [vec_duplicate_expr] _22;
2. This issue only happens for non-SLP vectorization with a single rgroup, since:
if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
  {
    tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
    if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                        OPTIMIZE_FOR_SPEED)
        && LOOP_VINFO_LENS (loop_vinfo).length () == 1
        && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
        && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
            || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
      LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
  }
3. This issue doesn't appear in nested loops, regardless of whether
LOOP_VINFO_USING_SELECT_VL_P is true or false, since:
# vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
# vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
_19 = vect_vec_iv_.6_5 + { 1, ... };
vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
ivtmp_1 = ivtmp_4 + 4294967295;
....
<bb 5> [local count: 6549826]:
# vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
# ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
_28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
goto <bb 3>; [100.00%]
Note that the induction variable update in this IR is independent of VF
(it doesn't care how many elements are processed in the iteration).
The update is loop invariant, so it isn't a problem even if LOOP_VINFO_USING_SELECT_VL_P is true.
Testing passed, Ok for trunk ?
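The N = 6, VF = 4 scenario from point 1 can be simulated in scalar C. select_vl below is a hypothetical model of an implementation that splits the remaining work evenly (3 elements, then 3), and simulate records the induction variable value seen by each processed element, once advancing by VF (the old behavior) and once by the SELECT_VL result (the fixed behavior).

```c
#include <assert.h>

#define VF 4

/* Hypothetical SELECT_VL model: when fewer than 2 * VF elements remain,
   split them evenly.  Real hardware may choose other valid values.  */
static int
select_vl (int remaining)
{
  assert (remaining > 0);
  if (remaining > VF && remaining < 2 * VF)
    return (remaining + 1) / 2;
  return remaining < VF ? remaining : VF;
}

/* Run the vector loop over N scalar iterations, storing in IV_VALUES the
   IV lane value used for each processed element.  Advance the IV by the
   SELECT_VL result if STEP_BY_VL is nonzero, else by VF.  Returns the
   number of elements processed.  */
static int
simulate (int n, int step_by_vl, int iv_values[])
{
  int iv_base = 0, done = 0;
  for (int remaining = n; remaining > 0;)
    {
      int vl = select_vl (remaining);
      for (int lane = 0; lane < vl; lane++)
        iv_values[done + lane] = iv_base + lane;
      done += vl;
      remaining -= vl;
      iv_base += step_by_vl ? vl : VF;
    }
  return done;
}
```

With the VF step, elements 3 to 5 see the IV values 4 to 6; stepping by the SELECT_VL result yields 0 to 5 as expected.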
PR tree-optimization/112438
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_induction): Bugfix when
LOOP_VINFO_USING_SELECT_VL_P.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112438.c: New test.
|
|
This patch is a small optimization for vector initialization,
discovered while I was evaluating benchmarks.
Consider the following case:
#include <stdint.h>
typedef int8_t v16qi __attribute__ ((vector_size (16)));
void foo3 (int8_t *out, int8_t x, int8_t y)
{
  v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x};
  *(v16qi*)out = v;
}
Before this patch:
vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v1,a2
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vslide1down.vx v1,v1,a1
vse8.v v1,0(a0)
ret
After this patch:
vsetivli zero,16,e8,m1,ta,ma
vmv.v.x v1,a1
vmv.v.x v2,a2
vslideup.vi v1,v2,8
vse8.v v1,0(a0)
ret
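The new sequence relies on vslideup.vi: destination elements below the offset are left untouched, and the rest are filled from the source starting at element 0. The scalar model below assumes vl equal to 16, uses hypothetical helper names mirroring the mnemonics, and ignores the RVV register-overlap rules. It shows how two broadcasts plus one slide build a vector made of two runs of repeated elements.

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 16

/* Scalar model of "vmv.v.x vd, rs": broadcast X to every element.  */
static void
vmv_v_x (int8_t vd[NELTS], int8_t x)
{
  for (int i = 0; i < NELTS; i++)
    vd[i] = x;
}

/* Scalar model of "vslideup.vi vd, vs2, OFFSET" with vl == NELTS:
   vd[i + OFFSET] = vs2[i]; elements of vd below OFFSET are unchanged.  */
static void
vslideup_vi (int8_t vd[NELTS], const int8_t vs2[NELTS], int offset)
{
  assert (offset >= 0 && offset <= NELTS);
  for (int i = offset; i < NELTS; i++)
    vd[i] = vs2[i - offset];
}
```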
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): New enum.
* config/riscv/riscv-v.cc
(rvv_builder::combine_sequence_use_slideup_profitable_p): New function.
(expand_vector_init_slideup_combine_sequence): Ditto.
(expand_vec_init): Add slideup combine optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add combine test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/combine-7.c: New test.
|
|
This patch fixes several more FAILs that would only show in 32-bit runs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: Adjust.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c:
Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c:
Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
Ditto.
|
|
It appears as if we "look through" a statement pattern in
vect_finish_replace_stmt, but not earlier, when we replace the newly
created vector statement's lhs. Then the lhs is the statement pattern's
lhs, while in vect_finish_replace_stmt we assert that it comes from the
statement the pattern replaced.
This patch uses vect_orig_stmt on the scalar destination's definition so
the replaced statement is used everywhere.
gcc/ChangeLog:
PR tree-optimization/112464
* tree-vect-loop.cc (vectorize_fold_left_reduction): Use
vect_orig_stmt on scalar_dest_def_info.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112464.c: New test.
|
|
The t0 register is used as a temporary register for interrupts, so it needs
special treatment. The interrupt code must avoid using "th.ldd", which would
stop the subsequent operation on the t0 register from working, so the two
need to exchange positions in the function "riscv_for_each_saved_reg".
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt
operation before the XTheadMemPair.
|
|
The following fixes an issue where SLP stmts that are internal defs,
but appear invariant because they only use invariant defs, get
scheduled outside of the loop. This nice optimization
breaks down when loop masks or lens are applied since those are not
explicitly tracked as dependences. The following makes sure to never
schedule internal defs outside of the vectorized loop when the
loop uses masks/lens.
PR tree-optimization/110221
* tree-vect-slp.cc (vect_schedule_slp_node): When loop
masking / len is applied make sure to not schedule
internal defs outside of the loop.
* gfortran.dg/pr110221.f: New testcase.
|
|
AVX ignores any excess bits in the mask (at least for vector sizes >=8), but
AMD GCN magically uses a larger vector than was intended (the smaller sizes are
"fake"), leading to wrong-code.
This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c,
gfortran.dg/c-interop/contiguous-1.f90,
gfortran.dg/c-interop/ff-descriptor-7.f90, and others.
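A sketch of the idea, assuming a mask with one bit per lane held in a 64-bit integer (GCN's execution mask is 64 bits wide). uniform_mask is a hypothetical helper, not the code in store_constructor. The AND with (1 << nunits) - 1 keeps a "true" constant from setting bits for lanes beyond the intended vector length.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a uniform boolean vector of NUNITS lanes, one bit per lane.
   Without the final AND, "true" would set all 64 bits, including bits
   for lanes that do not exist in the smaller vector.  */
static uint64_t
uniform_mask (int value, unsigned nunits)
{
  assert (nunits >= 1 && nunits <= 64);
  uint64_t all = value ? ~(uint64_t) 0 : 0;
  uint64_t lanes
    = nunits == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << nunits) - 1;
  return all & lanes;
}
```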
gcc/ChangeLog:
* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.
|
|
The instruction doesn't allow "B" constants for the vop3b encoding (used when
the cc register isn't VCC), so fix the pattern and all the insns that might get
split to it post-reload.
Also switch to the new constraint format for ease of adding new alternatives.
gcc/ChangeLog:
PR target/112308
* config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Fix B constraint
and switch to the new format.
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(add<mode>3_vcc_zext_dup): Likewise.
(add<mode>3_vcc_zext_dup_exec): Likewise.
(add<mode>3_vcc_zext_dup2): Likewise.
(add<mode>3_vcc_zext_dup2_exec): Likewise.
|
|
The following avoids type inconsistencies in .COND_op generated by
simplifications of VEC_COND_EXPRs.
PR middle-end/112469
* match.pd (cond ? op a : b -> .COND_op (cond, a, b)): Add
missing view_converts.
* gcc.dg/torture/pr112469.c: New testcase.
|
|
The DImode min/max instructions need a clobber that SImode does not, so
add the special case to the reduction expand code.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Add clobber to DImode
min/max instructions.
|
|
gcc/ChangeLog:
* config/loongarch/lsx.md: Fix instruction name typo in
lsx_vreplgr2vr_<lsxfmt_f> template.
|
|
Although current GCC doesn't ICE when I create an FP16 vec_init case
with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong:
the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init
needs vfslide1down/vfslide1up (which require ZVFH).
It makes more sense to make the vec_init patterns robust by splitting them
into 2 patterns (one for integer, the other for float), like other autovectorization patterns.
gcc/ChangeLog:
* config/riscv/autovec.md (vec_init<mode><vel>): Split patterns.
|
|
This reverts commit e7f4040d9d6ec40c48ada940168885d7dde03af9 as it
introduces some legacy vmv insns.
|
|
This patch supports vec_init for vectors whose trailing elements are
all the same. For example:
typedef double vnx16df __attribute__ ((vector_size (128)));
__attribute__ ((noipa)) void
f_vnx16df (double a, double b, double *out)
{
  vnx16df v = {a, a, a, b, b, b, b, b, b, b, b, b, b, b, b, b};
  *(vnx16df *) out = v;
}
Before this patch:
f_vnx16df:
vsetivli zero,16,e64,m8,ta,ma
vfmv.v.f v8,fa0
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vfslide1down.vf v8,v8,fa1
vs8r.v v8,0(a0)
ret
After this patch:
f_vnx16df:
vsetivli zero,16,e64,m8,ta,ma
vfmv.v.f v16,fa1
vfslide1up.vf v8,v16,fa0
vmv8r.v v16,v8
vfslide1up.vf v8,v16,fa0
vmv8r.v v16,v8
vfslide1up.vf v8,v16,fa0
vs8r.v v8,0(a0)
ret
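The new sequence can be modeled in scalar C: broadcast the trailing value, then slide the leading value in from the bottom once per leading element, copying the intermediate result (the vmv8r.v in the listing) between slides. vfslide1up and init_trailing_same are hypothetical helpers, and vl is assumed to equal the 16-element vector length.

```c
#include <assert.h>

#define NELTS 16

/* Scalar model of "vfslide1up.vf vd, vs2, f" with vl == NELTS:
   vd[0] = f and vd[i] = vs2[i - 1] for i > 0.  */
static void
vfslide1up (double vd[NELTS], const double vs2[NELTS], double f)
{
  vd[0] = f;
  for (int i = 1; i < NELTS; i++)
    vd[i] = vs2[i - 1];
}

/* Build {A x LEADING, B x (NELTS - LEADING)}: broadcast B, then slide
   A in from the bottom LEADING times.  */
static void
init_trailing_same (double out[NELTS], double a, double b, int leading)
{
  double tmp[NELTS];
  assert (leading >= 0 && leading <= NELTS);
  for (int i = 0; i < NELTS; i++)       /* vfmv.v.f */
    out[i] = b;
  for (int k = 0; k < leading; k++)
    {
      for (int i = 0; i < NELTS; i++)   /* models the vmv8r.v copy */
        tmp[i] = out[i];
      vfslide1up (out, tmp, a);
    }
}
```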
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vector_init_trailing_same_elem):
New function to expand vec_init when the trailing elements are the same.
(expand_vec_init): Try trailing same elements when vec_init.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr97428.c: Add additional compile option for riscv.
|
|
Since cond_copysign is now supported in match.pd (middle-end),
we don't need to support conditional copysign through the RTL combine pass.
Instead, we can support it directly through an explicit cond_copysign optab.
Conditional copysign tests are already available in the testsuite,
so there is no need to add tests.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_copysign<mode>): Remove.
* config/riscv/autovec.md (cond_copysign<mode>): New pattern.
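For reference, a conditional copysign combines a per-element mask with an "else" value, in the same way as GCC's other .COND_* internal functions. The element-wise model below (cond_copysign is a hypothetical helper, not the optab expander) uses GCC's __builtin_copysign, the builtin form of C99 copysign.

```c
#include <assert.h>
#include <stdbool.h>

/* Where MASK[i] is set, take the magnitude of A[i] and the sign of B[i];
   elsewhere take ELSE_VAL[i].  */
static void
cond_copysign (int n, const bool mask[], const double a[], const double b[],
               const double else_val[], double out[])
{
  for (int i = 0; i < n; i++)
    out[i] = mask[i] ? __builtin_copysign (a[i], b[i]) : else_val[i];
}
```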
|
|
The functions defined with DEF_EXT_LIB_FLOATN_NX_BUILTINS should also
use DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN for
FLOATN support. The table below shows whether FLOATN variants are
supported, according to the glibc API and the GCC builtins.
+---------+-------+-------------------------------------+
|         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
+---------+-------+-------------------------------------+
| iceil   | N     | N                                   |
| ifloor  | N     | N                                   |
| irint   | N     | N                                   |
| iround  | N     | N                                   |
| lceil   | N     | N                                   |
| lfloor  | N     | N                                   |
| lrint   | Y     | Y                                   |
| lround  | Y     | Y                                   |
| llceil  | N     | N                                   |
| llfloor | N     | N                                   |
| llrint  | Y     | Y                                   |
| llround | Y     | Y                                   |
+---------+-------+-------------------------------------+
This patch would like to support FLOATN for:
1. lrint
2. lround
3. llrint
4. llround
The following tests passed with this patch:
1. x86 bootstrap and regression test.
2. aarch64 regression test.
3. riscv regression tests.
PR target/112432
gcc/ChangeLog:
* internal-fn.def (LRINT): Add FLOATN support.
(LROUND): Ditto.
(LLRINT): Ditto.
(LLROUND): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
When zero extracting a single bit bitfield from bits 16..31 on the H8 we
currently generate some pretty bad code.
The fundamental issue is we can't shift efficiently and there's no trivial way
to extract a single bit out of the high half word of an SImode value.
What usually happens is we use a synthesized right shift to get the single bit
into the desired position, then a bit-and to mask off everything we don't care
about.
The shifts are expensive, even using tricks like half and quarter word moves to
implement shift-by-16 and shift-by-8. Additionally, a logical right shift must
clear out the upper bits, which is redundant since we're going to mask things
with &1 later.
This patch provides a consistently better sequence for such extractions. The
general form moves the high half into the low half, extracts the bit into C,
clears the destination, then moves C into the destination, with a few special
cases.
This also avoids all the shenanigans for H8/SX which has a much more capable
shifter. It's not single cycle, but it is reasonably efficient.
This has been regression tested on the H8 without issues. Pushing to the trunk
momentarily.
jeff
ps. Yes, supporting zero extraction of multi-bit fields might be improvable as
well. But I've already spent more time on this than I can reasonably justify.
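The equivalence the new patterns rely on can be checked in C. extract_bit_shift is the shift-and-mask form; extract_bit_high_half first takes the high half word (on the H8 a register move rather than a real 16-bit shift) and then tests a single bit of it, which corresponds roughly to extracting the bit into C and moving C into a cleared destination. Both helpers are hypothetical illustrations, not the patterns in combiner.md.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-mask form: a variable 32-bit shift, then &1.  */
static unsigned
extract_bit_shift (uint32_t x, unsigned n)
{
  return (x >> n) & 1;
}

/* Move the high half word into a 16-bit value, then test one bit of it.
   Only valid for bits 16..31.  */
static unsigned
extract_bit_high_half (uint32_t x, unsigned n)
{
  assert (n >= 16 && n <= 31);
  uint16_t high = (uint16_t) (x >> 16);
  return (high & (1u << (n - 16))) != 0;
}
```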
gcc/
* config/h8300/combiner.md (single bit sign_extract): Avoid recently
added patterns for H8/SX.
(single bit zero_extract): New patterns.
|
|
gcc/ChangeLog:
PR target/112443
* config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
from LT to GT since there's a NOT in the pattern.
(*avx2_pcmp<mode>3_5): Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr112443.C: New test.
|
|
This patch fixes the pseudo-c BPF assembly syntax used for
*mulsidi3_zeroextend, which was being emitted as:
rN *= wM
instead of the proper way to denote a mul32 in pseudo-C syntax:
wN *= wM
Includes test.
Tested in bpf-unknown-none-gcc target in x86_64-linux-gnu host.
gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_print_register): Accept modifier code 'W'
to force emitting register names using the wN form.
* config/bpf/bpf.md (*mulsidi3_zeroextend): Force operands to
always use wN written form in pseudo-C assembly syntax.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c: New test.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/bpf/ldxdw.c: Fix regexp with expected result.
|
|
Reduce implicit usage of line_table global, and move source printing to
within diagnostic_context.
gcc/ChangeLog:
* diagnostic-show-locus.cc (layout::m_line_table): New field.
(compatible_locations_p): Convert to...
(layout::compatible_locations_p): ...this, replacing uses of
line_table global with m_line_table.
(layout::layout): Convert "richloc" param from a pointer to a
const reference. Initialize m_line_table member.
(layout::maybe_add_location_range): Replace uses of line_table
global with m_line_table. Pass the latter to
linemap_client_expand_location_to_spelling_point.
(layout::print_leading_fixits): Pass m_line_table to
affects_line_p.
(layout::print_trailing_fixits): Likewise.
(gcc_rich_location::add_location_if_nearby): Update for change
to layout ctor params.
(diagnostic_show_locus): Convert to...
(diagnostic_context::maybe_show_locus): ...this, converting
richloc param from a pointer to a const reference. Make "loc"
const. Split out printing part of function to...
(diagnostic_context::show_locus): ...this.
(selftest::test_offset_impl): Update for change to layout ctor
params.
(selftest::test_layout_x_offset_display_utf8): Likewise.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_tab_expansion): Likewise.
* diagnostic.h (diagnostic_context::maybe_show_locus): New decl.
(diagnostic_context::show_locus): New decl.
(diagnostic_show_locus): Convert from a decl to an inline function.
* gdbinit.in (break-on-diagnostic): Update from a breakpoint
on diagnostic_show_locus to one on
diagnostic_context::maybe_show_locus.
* genmatch.cc (linemap_client_expand_location_to_spelling_point):
Add "set" param and use it in place of line_table global.
* input.cc (expand_location_1): Likewise.
(expand_location): Update for new param of expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "set"
param and use it in place of line_table global.
* tree-diagnostic-path.cc (event_range::print): Pass line_table
for new param of linemap_client_expand_location_to_spelling_point.
libcpp/ChangeLog:
* include/line-map.h (rich_location::get_expanded_location): Make
const.
(rich_location::get_line_table): New accessor.
(rich_location::m_line_table): Make the pointer be const.
(rich_location::m_have_expanded_location): Make mutable.
(rich_location::m_expanded_location): Likewise.
(fixit_hint::affects_line_p): Add const line_maps * param.
(linemap_client_expand_location_to_spelling_point): Likewise.
* line-map.cc (rich_location::get_expanded_location): Make const.
Pass m_line_table to
linemap_client_expand_location_to_spelling_point.
(rich_location::maybe_add_fixit): Likewise.
(fixit_hint::affects_line_p): Add set param and pass to
linemap_client_expand_location_to_spelling_point.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/jit/ChangeLog:
* libgccjit++.h:
|