|
generic_vector_cost is not currently used by any SVE target
by default; it has to be specifically selected by -mtune=generic.
Its SVE costing has historically been somewhat idealised, since
it predated any actual SVE cores. This seems like a useful
tradition to continue, at least for testing purposes.
The ideal case is that gathers and scatters do not induce a specific
one-off overhead. This patch therefore sets the gather/scatter init
costs to zero.
This patch is necessary to switch -mtune=generic over to the
"new" vector costs.
gcc/
* config/aarch64/tuning_models/generic.h (generic_sve_vector_cost):
Set gather_load_x32_init_cost and gather_load_x64_init_cost to 0.
|
|
The SVE gather and scatter costs are classified based on whether
they do 4 loads per 128 bits (x32) or 2 loads per 128 bits (x64).
The number after the "x" refers to the number of bits in each
"container".
However, the test for which to use was based on the element size
rather than the container size. This meant that we'd use the
overly conservative x32 costs for VNx2SI gathers. VNx2SI gathers
are really .D gathers in which the upper half of each extension
result is ignored.
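As a purely illustrative sketch (not from the patch; whether this exact loop
ends up as a VNx2SI gather depends on the target, flags and costing),
32-bit data gathered via 64-bit offsets naturally lives in .D containers:
void
f (int *restrict dst, int *restrict src, long *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[idx[i]];   /* 32-bit loads, 64-bit gather offsets.  */
}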
This patch is necessary to switch -mtune=generic over to the
"new" vector costs.
gcc/
* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype)
(aarch64_vector_costs::add_stmt_cost): Use the x64 cost rather
than x32 cost for all VNx2 modes.
|
|
g:8d6c6fbc5271dde433998c09407b30e2cf195420 improved the code
generated for functions like:
void test_s8 (int8x8x2_t *ptr) { *ptr = (int8x8x2_t) {}; }
Previously we would load zero from the constant pool, whereas
now we just use "stp xzr, xzr". This patch adds a test for
this improvement.
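A sketch of what such a test might look like (the directives and options here
are assumptions, not necessarily the committed test):
/* { dg-do compile } */
/* { dg-options "-O2" } */
#include <arm_neon.h>
void test_s8 (int8x8x2_t *ptr) { *ptr = (int8x8x2_t) {}; }
/* { dg-final { scan-assembler {stp\txzr, xzr} } } */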
gcc/testsuite/
* gcc.target/aarch64/struct_zero.c: New test.
|
|
The documentation of ASM_INPUT_P implied that the flag has no
effect on ASM_EXPRs that have operands (and which therefore must be
extended asms). In fact we require ASM_INPUT_P to be false for all
extended asms.
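For reference, a minimal C illustration (an added example, not from the patch):
ASM_INPUT_P is set only for basic asms; anything using the extended syntax,
even with empty operand lists, is an extended asm and must have the flag clear.
void
f (int x)
{
  asm ("nop");                    /* basic asm: ASM_INPUT_P is set.  */
  asm ("nop" : /* no outputs */); /* extended asm, no operands: flag clear.  */
  asm ("" : "+r" (x));            /* extended asm with an operand: flag clear.  */
}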
gcc/
* tree.h (ASM_INPUT_P): Fix documentation.
|
|
transform [PR116355]
The gen_pow2p function generates (a & -a) == a as a fallback for
POPCOUNT (a) == 1. Not only is the bitmagic not equivalent to
POPCOUNT (a) == 1 but it also introduces UB (consider signed
a = INT_MIN).
This patch rewrites gen_pow2p to always use __builtin_popcount instead.
This means that the final GIMPLE code gets decided by already existing
machinery in a later pass. That is a cleaner solution, I think. This
existing machinery also uses a ^ (a - 1) > a - 1, which is the correct
bitmagic.
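A standalone sketch (an added illustration, not part of the patch) contrasting
the two idioms; the old fallback misclassifies zero, and for signed
a == INT_MIN the negation itself would be undefined behaviour:
#include <stdio.h>

/* Old fallback idiom: wrongly says 0 is a power of two.  */
static int old_pow2p (unsigned a) { return (a & -a) == a; }
/* Idiom used by the existing machinery: also correct for 0.  */
static int new_pow2p (unsigned a) { return (a ^ (a - 1)) > a - 1; }

int
main (void)
{
  printf ("%d %d\n", old_pow2p (0), new_pow2p (0));  /* prints 1 0 */
  printf ("%d %d\n", old_pow2p (6), new_pow2p (6));  /* prints 0 0 */
  printf ("%d %d\n", old_pow2p (8), new_pow2p (8));  /* prints 1 1 */
  return 0;
}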
While rewriting gen_pow2p I had to add logic for converting the
operand's type to a type that __builtin_popcount accepts. I naturally
also added this logic to gen_log2. Thanks to this, exponential index
transform gains the capability to handle all operand types with
precision at most that of long long int.
gcc/ChangeLog:
PR tree-optimization/116355
* tree-switch-conversion.cc (can_log2): Add capability to
suggest converting the operand to a different type.
(gen_log2): Add capability to generate a conversion in case the
operand is of a type incompatible with the logarithm operation.
(can_pow2p): New function.
(gen_pow2p): Rewrite to use __builtin_popcount instead of
manually inserting an internal fn call or bitmagic. Also add
capability to generate a conversion.
(switch_conversion::is_exp_index_transform_viable): Call
can_pow2p. Store types suggested by can_log2 and gen_log2.
(switch_conversion::exp_index_transform): Params of gen_pow2p
and gen_log2 changed so update their calls.
* tree-switch-conversion.h: Add m_exp_index_transform_log2_type
and m_exp_index_transform_pow2p_type to switch_conversion class
to track type conversions needed to generate the "is power of 2"
and logarithm operations.
gcc/testsuite/ChangeLog:
PR tree-optimization/116355
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of POPCOUNT internal fn after switch conversion. Test
for it after __builtin_popcount has had a chance to get
expanded.
* gcc.target/i386/switch-exp-transform-3.c: Also test char and
short.
Signed-off-by: Filip Kastl <fkastl@suse.cz>
|
|
This extends the scan-ltrans-tree* helpers to create RTL variants. This
is needed to check the behaviour of an RTL pass under LTO.
gcc/ChangeLog:
PR libstdc++/116140
* doc/sourcebuild.texi: Document ltrans-rtl value of kind for
scan-<kind>-dump*.
gcc/testsuite/ChangeLog:
PR libstdc++/116140
* lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
(scan-ltrans-rtl-dump-not): New.
(scan-ltrans-rtl-dump-dem): New.
(scan-ltrans-rtl-dump-dem-not): New.
(scan-ltrans-rtl-dump-times): New.
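As an illustration, a test could then gate on an RTL dump from the ltrans
stage roughly as below; the options, pass name and pattern are placeholders,
and the arguments are assumed to mirror the existing scan-rtl-dump helpers:
/* { dg-options "-O2 -flto -fdump-rtl-final" } */
int f (int x) { return x + 1; }
/* { dg-final { scan-ltrans-rtl-dump "insn" "final" } } */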
|
|
I found it helpful to be able to print a whole SLP instance from gdb.
* tree-vect-slp.cc (debug): Add overload for slp_instance.
|
|
The following fixes a leak of the discovered single-lane store
SLP nodes from which we only use their children. This uncovers
a latent reference counting issue in the interleaving build where
we fail to increment their reference count.
* tree-vect-slp.cc (vect_build_slp_store_interleaving):
Fix reference counting.
(vect_build_slp_instance): Release rhs_nodes.
|
|
This splits out SLP store interleaving into a separate function.
* tree-vect-slp.cc (vect_build_slp_store_interleaving): Split
out from ...
(vect_build_slp_instance): Here.
|
|
The pedwarns for each of these features should be silenced by
the appropriate -Wno-c++??-extensions.
The handle_pragma_diagnostic_impl change is necessary so that we handle
-Wc++23-extensions early so it's available to interpret_float while lexing.
gcc/c-family/ChangeLog:
* c-pragma.cc (handle_pragma_diagnostic_impl): Also handle
-Wc++23-extensions early.
* c-lex.cc (interpret_float): Use -Wc++23-extensions for extended
floating point literal pedwarn.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_simple_type_specifier): Use
-Wc++20-extensions for auto parameter pedwarn.
* pt.cc (do_decl_instantiation, do_type_instantiation): Use
-Wc++11-extensions for 'extern template'.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/extern_template-7.C: New test.
* g++.dg/cpp23/ext-floating19.C: New test.
* g++.dg/cpp2a/abbrev-fn1.C: New test.
|
|
Move the run test of pr116278 to dg/torture and leave the risc-v asm check
under the risc-v part.
PR target/116278
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr116278-run-1.c: Take compile instead of run.
* gcc.target/riscv/pr116278-run-2.c: Ditto.
* gcc.dg/torture/pr116278-run-1.c: New test.
* gcc.dg/torture/pr116278-run-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The .SAT_ADD has 2 operands, and one of the operands may be an INTEGER_CST.
For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.
Form 3:
#define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \
T __attribute__((noinline)) \
vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
T ret; \
for (i = 0; i < limit; i++) \
{ \
out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
} \
}
DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
It will fail to vectorize because vectorizable_call checks that the
operands are type-compatible, but the imm will be (const_int 9) in
SImode, which is different from _2 (DImode). Aka:
uint64_t _1;
uint64_t _2;
_1 = .SAT_ADD (_2, 9);
This patch would like to reconcile the imm operand to the type of _2 by
fold_convert, to make vectorizable_call happy.
The following test suites passed with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
gcc/ChangeLog:
* tree-vect-patterns.cc (vect_recog_sat_add_pattern): Add fold
convert for const_int to the type of operand 0.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
We added patterns for vector rotate, but it seems we forgot to add mode_idx,
which is used in AVL propagation (riscv-avlprop.cc).
gcc/ChangeLog:
* config/riscv/vector.md (mode_idx): Add vrol and vror.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/rotr.c: New.
|
|
This patch would like to support form 1 of the scalar signed
integer .SAT_ADD. Aka the example below:
Form 1:
#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline)) \
sat_s_add_##T##_fmt_1 (T x, T y) \
{ \
T sum = (UT)x + (UT)y; \
return (x ^ y) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
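Purely as a usage sketch of the helper generated above (an added example; it
assumes the usual <stdint.h> types and is compiled together with the macro and
its instantiation), the wrapped sums saturate instead of overflowing:
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* INT64_MAX + 1 saturates to INT64_MAX ...  */
  printf ("%lld\n", (long long) sat_s_add_int64_t_fmt_1 (INT64_MAX, 1));
  /* ... and INT64_MIN + -1 saturates to INT64_MIN.  */
  printf ("%lld\n", (long long) sat_s_add_int64_t_fmt_1 (INT64_MIN, -1));
  return 0;
}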
We can tell the difference before and after this patch if the backend
implements the ssadd<m>3 pattern, as shown below.
Before this patch:
4 │ __attribute__((noinline))
5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
6 │ {
7 │ int64_t sum;
8 │ long unsigned int x.0_1;
9 │ long unsigned int y.1_2;
10 │ long unsigned int _3;
11 │ long int _4;
12 │ long int _5;
13 │ int64_t _6;
14 │ _Bool _11;
15 │ long int _12;
16 │ long int _13;
17 │ long int _14;
18 │ long int _16;
19 │ long int _17;
20 │
21 │ ;; basic block 2, loop depth 0
22 │ ;; pred: ENTRY
23 │ x.0_1 = (long unsigned int) x_7(D);
24 │ y.1_2 = (long unsigned int) y_8(D);
25 │ _3 = x.0_1 + y.1_2;
26 │ sum_9 = (int64_t) _3;
27 │ _4 = x_7(D) ^ y_8(D);
28 │ _5 = x_7(D) ^ sum_9;
29 │ _17 = ~_4;
30 │ _16 = _5 & _17;
31 │ if (_16 < 0)
32 │ goto <bb 3>; [41.00%]
33 │ else
34 │ goto <bb 4>; [59.00%]
35 │ ;; succ: 3
36 │ ;; 4
37 │
38 │ ;; basic block 3, loop depth 0
39 │ ;; pred: 2
40 │ _11 = x_7(D) < 0;
41 │ _12 = (long int) _11;
42 │ _13 = -_12;
43 │ _14 = _13 ^ 9223372036854775807;
44 │ ;; succ: 4
45 │
46 │ ;; basic block 4, loop depth 0
47 │ ;; pred: 2
48 │ ;; 3
49 │ # _6 = PHI <sum_9(2), _14(3)>
50 │ return _6;
51 │ ;; succ: EXIT
52 │
53 │ }
After this patch:
4 │ __attribute__((noinline))
5 │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
6 │ {
7 │ int64_t _4;
8 │
9 │ ;; basic block 2, loop depth 0
10 │ ;; pred: ENTRY
11 │ _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
12 │ return _4;
13 │ ;; succ: EXIT
14 │
15 │ }
The following test suites passed with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Add the matching for signed .SAT_ADD.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
matching func decl.
(match_unsigned_saturation_add): Try signed .SAT_ADD and rename
to ...
(match_saturation_add): ... here.
(math_opts_dom_walker::after_dom_children): Update the above renamed
func from caller.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
gcc/testsuite:
PR testsuite/116271
* gcc.dg/vect/tsvc/vect-tsvc-s176.c [TRUNCATE_TEST]: Make sure
that m stays the same as the loop bound of the middle loop.
* gcc.dg/vect/tsvc/tsvc.h (get_expected_result) <s176> [TRUNCATE_TEST]:
Adjust expected value.
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_SUB IMM form 4. Aka:
Form 4:
#define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_4 (T x) \
{ \
return x > (T)IMM ? x - (T)IMM : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_4(uint64_t, 23)
The below test passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-16.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-16.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned scalar
.SAT_SUB IMM form 3. Aka:
Form 3:
#define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_3 (T y) \
{ \
return (T)IMM > y ? (T)IMM - y : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 23)
The below test passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
SPARC does not support vectorizing conditions, which this test relies
on. Use vect_condition as effective target.
Committed as obvious.
PR testsuite/116500
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-switch-ifcvt-1.c: Use vect_condition to
check whether vectorizing conditions is supported for the target.
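For context, a sketch of how such a guard is usually spelled in a vect test
(the scanned pattern below is only a placeholder; the committed test may gate
either the whole test or just the scan):
/* { dg-require-effective-target vect_condition } */
/* ... or, on the final scan only:  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_condition } } } */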
|
|
* zh_CN.po: Update.
|
|
Previously, we were building and inserting case_labels manually, which
led to them not being added into the currently running switch via
c_add_case_label. This led to false diagnostics that the user could not
act on.
PR c++/109867
gcc/cp/ChangeLog:
* coroutines.cc (expand_one_await_expression): Replace uses of
build_case_label with finish_case_label.
(build_actor_fn): Ditto.
(create_anon_label_with_ctx): Remove now-unused function.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/pr109867.C: New test.
Reviewed-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
When LRA pulls an address operand out of a MEM it canonicalizes a
contained MULT into an ASHIFT (for example, an index scaled by 4 becomes
a shift by 2). Adjust the address decomposer to recognize this form.
PR target/116413
* config/m68k/m68k.cc (m68k_decompose_index): Accept ASHIFT like
MULT.
(m68k_rtx_costs) [PLUS]: Likewise.
(m68k_legitimize_address): Likewise.
|
|
We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:
=== cut here ===
struct X {};
void g () {
X::X x;
}
=== cut here ===
The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does.
It also skips until the end of the statement and returns error_mark_node
for this and the preceding if block, to avoid emitting extra (useless)
errors.
PR c++/105483
gcc/cp/ChangeLog:
* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>. Skip to end of statement and return
error_mark_node in case of error.
gcc/testsuite/ChangeLog:
* g++.dg/parse/error36.C: Adjust test expectation.
* g++.dg/tc1/dr147.C: Likewise.
* g++.old-deja/g++.other/typename1.C: Likewise.
* g++.dg/diagnostic/pr105483.C: New test.
|
|
These subroutines will be used in expand_const_vector in a future patch.
Relocate so expand_const_vector can use them.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate.
(expand_vector_init_trailing_same_elem): Ditto.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Currently we assert when encountering a non-duplicate boolean vector.
This patch allows non-duplicate vectors to fall through to the
gcc_unreachable and assert there.
This will be useful when adding a catch-all pattern to emit costs and
handle arbitrary vectors.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate
to fall through other patterns before asserting.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.
gcc/ChangeLog:
* config/riscv/riscv-v.h (valid_vec_immediate_p): Add new helper.
* config/riscv/riscv-v.cc (valid_vec_immediate_p): Ditto.
(expand_const_vector): Use new helper.
* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases, so extract it out into a new riscv-v.h header file.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
register
This manifests in RTL that is optimized away, which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if required.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
order
The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Relocate.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2, 4, 4, ...}
This patch sets the step size to the requested value.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
With MVE, vmov.f64 is always supported (no need for +fp.dp extension).
This patch updates two patterns:
- in movdi_vfp, we incorrectly checked
TARGET_VFP_SINGLE || TARGET_HAVE_MVE instead of
TARGET_VFP_SINGLE && !TARGET_HAVE_MVE, and didn't take into account
these two possibilities when computing the length attribute.
- in thumb2_movdf_vfp, we checked only TARGET_VFP_SINGLE.
No need to update movdf_vfp, since it is enabled only for TARGET_ARM
(which is not the case when MVE is enabled).
The patch also updates gcc.target/arm/armv8_1m-fp64-move-1.c, to
accept only vmov.f64 instead of vmov.f32.
Tested on arm-none-eabi with:
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp+fp.dp
2024-08-21 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/vfp.md (movdi_vfp, thumb2_movdf_vfp): Handle MVE
case.
gcc/testsuite/
* gcc.target/arm/armv8_1m-fp64-move-1.c: Update expected code.
|
|
* gcc.target/i386/pr116174.c: Add the missing */.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
As PR target/116174 shows, we may need to verify labels and the directive
order. Extend check-function-bodies to support matched output lines,
allowing labels and directives.
gcc/
* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.
gcc/testsuite/
* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched. Set
up_config(matched) to $matched. Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This is part of making m68k work with LRA. See PR116429.
In short: setup_sp_offset is internally inconsistent. It wants to
setup the sp_offset for newly generated instructions. sp_offset for
an instruction is always the state of the sp-offset right before that
instruction. For that it starts at the (assumed correct) sp_offset
of the instruction right after the given (new) sequence, and then
iterates that sequence forward simulating its effects on sp_offset.
That can't ever be right: either it needs to start at the front
and simulate forward, or start at the end and simulate backward.
The former seems to be the more natural way. Funnily the local
variable holding that instruction is also called 'before'.
This changes it to the first variant: start before the sequence,
do one simulation step to get the sp-offset state in front of the
sequence and then continue simulating.
More details: in the problematic testcase we start with this
situation (sp_off before 550 is 0):
550: [--sp] = 0 sp_off = 0 {pushexthisi_const}
551: [--sp] = 37 sp_off = -4 {pushexthisi_const}
552: [--sp] = r37 sp_off = -8 {movsi_m68k2}
554: [--sp] = r116 - r37 sp_off = -12 {subsi3}
556: call sp_off = -16
insn 554 doesn't match its constraints and needs some reloads:
Creating newreg=262, assigning class DATA_REGS to r262
554: r262:SI=r262:SI-r37:SI
REG_ARGS_SIZE 0x10
Inserting insn reload before:
996: r262:SI=r116:SI
Inserting insn reload after:
997: [--%sp:SI]=r262:SI
Considering alt=0 of insn 997: (0) =g (1) damSKT
1 Non pseudo reload: reject++
overall=1,losers=0,rld_nregs=0
Choosing alt 0 in insn 997: (0) =g (1) damSKT {*movsi_m68k2} (sp_off=-16)
Note how insn 997 (the after-reload) now has sp_off=-16 already. It all
goes downhill from there. We end up with these insns:
552: [--sp] = r37 sp_off = -8 {movsi_m68k2}
996: r262 = r116 sp_off = -12
554: r262 = r262 - r37 sp_off = -12
997: [--sp] = r262 sp_off = -16 (!!! should be -12)
556: call sp_off = -16
The call insn sp_off remains at the correct -16, but internally it's already
inconsistent here. If the sp_off before an insn is -16, and that insn
pre_decs sp, then the after-insn sp_off should be -20.
PR target/116429
* lra.cc (setup_sp_offset): Start with sp_offset from
before the new sequence, not from after.
|
|
This is part of making m68k work with LRA. See PR116374.
m68k has the property that sometimes the elimination offset
between %sp and %argptr is zero. During setup of the elimination
infrastructure it is changes between sp_offset and previous_offset
that feed into insns_with_changed_offsets, which ultimately drives
the later processing of the instructions so marked.
But the initial values for sp_offset and previous_offset are
also zero. So if the target's INITIAL_ELIMINATION_OFFSET (called
in update_reg_eliminate) is zero then nothing changes, the
instructions in question don't get into the list to consider and
the sp_offset tracking goes wrong.
Solve this by initializing those members with -1 instead of zero.
An initial offset of that value seems very unlikely, as it's
in word-sized increments. This then also reveals a problem in
eliminate_regs_in_insn where it always uses sp_offset-previous_offset
as offset adjustment, even in the first_p pass. That was harmless
when previous_offset was uninitialized as zero. But all the other
code uses a different idiom of checking for first_p (or rather
update_p which is !replace_p&&!first_p), and using sp_offset directly.
So use that as well in eliminate_regs_in_insn.
PR target/116374
* lra-eliminations.cc (init_elim_table): Use -1 as initializer.
(update_reg_eliminate): Accept -1 as not-yet-used marker.
(eliminate_regs_in_insn): Use previous_sp_offset only when
not first_p.
|
|
When experimenting with m68k plus LRA, one of the
changes in the backend is to accept ASHIFTs (not only
MULT) as scale code for address indices. When not
turning on LRA but using reload, those addresses are
presented to reload, which chokes on them. While reload is
going away, the change to make them work doesn't really hurt
(and generally seems useful, as MULT and ASHIFT really are
no different). So just add it.
PR target/116413
* final.cc (walk_alter_subreg): Recurse on ASHIFT.
|
|
* ipa-devirt.cc (odr_equivalent_or_derived_p): New.
* ipa-utils.h (odr_equivalent_or_derived_p): Declare.
* tree-eh.cc (same_or_derived_type): New.
(match_lp): Use it.
|
|
This includes uncommenting the atomic_flag non-member functions, which
were added by PR libstdc++/103934.
Also generate a hint for std::ignore, which was recently tweaked to be
more generally useful by P2968R2, which r15-2324 implemented.
gcc/cp/ChangeLog:
* cxxapi-data.csv: Add C++20 and C++23 names from <chrono>,
<format>, <generator>, <iterator>, <print>, and <stdfloat>.
Set cxx11 dialect for std::ignore in <tuple>. Uncomment
atomic_flag functions from <atomic>.
* std-name-hint.gperf: Regenerate.
* std-name-hint.h: Regenerate.
|
|
This ensures the generated output says something like 2022-2024 rather
than just 2024.
gcc/cp/ChangeLog:
* gen-cxxapi-file.py: Fix copyright dates in generated output.
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/pr108757-1.c: Fixed dg-comment.
* gcc.dg/pr71071.c: Likewise.
* gcc.dg/tree-ssa/noreturn-1.c: Likewise.
* gcc.dg/tree-ssa/pr56727.c: Likewise.
* gcc.target/arc/loop-2.cpp: Likewise.
* gcc.target/arc/loop-3.c: Likewise.
* gcc.target/arc/pr9001107555.c: Likewise.
* gcc.target/arm/armv8_1m-fp16-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
* gcc.target/i386/amxint8-asmatt-1.c: Likewise.
* gcc.target/i386/amxint8-asmintel-1.c: Likewise.
* gcc.target/i386/avx512bw-vpermt2w-1.c: Likewise.
* gcc.target/i386/avx512vbmi-vpermt2b-1.c: Likewise.
* gcc.target/i386/endbr_immediate.c: Likewise.
* gcc.target/i386/pr96539.c: Likewise.
* gcc.target/i386/sse2-pr98461-2.c: Likewise.
* gcc.target/m68k/pr39726.c: Likewise.
* gcc.target/m68k/pr52076-1.c: Likewise.
* gcc.target/m68k/pr52076-2.c: Likewise.
* gcc.target/nvptx/v2si-vec-set-extract.c: Likewise.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
It XPASSes after recent commit 5a3387938d4d95717cac29eecd0ba53e0ef9094d
"testsuite: Add -fwrapv to signbit-5.c".
gcc/testsuite/
* gcc.dg/signbit-5.c: Un-XFAIL for GCN.
|
|
This patch fixes gcc.c-torture/compile/opout.c for m68k with LRA
enabled. The test has:
...
z (a, b)
{
return (int) &a + (int) &b + (int) x + (int) z;
}
so it adds the addresses of two incoming arguments. This ends up
being treated as an LEA in which the "index" is the incoming
argument pointer, which the LEA multiplies by 2. The incoming
argument pointer is then eliminated, leading to:
(plus:SI (plus:SI (ashift:SI (plus:SI (reg/f:SI 24 %argptr)
(const_int -4 [0xfffffffffffffffc]))
(const_int 1 [0x1]))
(reg/f:SI 41 [ _6 ]))
(const_int 20 [0x14]))
In the address_info scheme, the innermost plus has to be treated
as the index "term", since that's the thing that's subject to
index_reg_class.
gcc/
PR middle-end/116413
* rtl.h (address_info): Update commentary.
* rtlanal.cc (valid_base_or_index_term_p): New function, split
out from...
(get_base_term, get_index_term): ...here. Handle elimination PLUSes.
|
|
The sequence of events in this PR is that:
- the function has many addresses in which only a single hard base
register is acceptable. Let's call the hard register H.
- IRA allocates that register to one of the pseudo base registers.
Let's call the pseudo register P.
- Some of the other addresses that require H occur when P is still live.
- LRA therefore has to spill P.
- When it reallocates P, LRA chooses to use FRAME_POINTER_REGNUM,
which has been eliminated to the stack pointer. (This is ok,
since the frame register is free.)
- Spilling P causes LRA to reprocess the instruction that uses P.
- When reprocessing the address that has P as its base, LRA first
applies the new allocation, to get FRAME_POINTER_REGNUM,
and then applies the elimination, to get the stack pointer.
The last step seems wrong: the elimination should only apply to
pre-existing uses of FRAME_POINTER_REGNUM, not to uses that result
from allocating pseudos. Applying both means that we get the wrong
register number, and therefore the wrong class.
The PR is about an existing testcase that fails with LRA on m68k.
gcc/
PR middle-end/116321
* lra-constraints.cc (get_hard_regno): Only apply eliminations
to existing hard registers.
(get_reg_class): Likewise.
|
|
We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions. Fixed
by making the parameters DECL_ARTIFICIAL.
PR c++/116482
gcc/cp/ChangeLog:
* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr116482.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
The following avoids removing stmts with defs that might still have
uses in the IL before calling simple_dce_from_worklist which might
remove those as that will wreck debug stmt generation. Instead first
perform use-based DCE and then remove stmts which may have uses in
code that CFG cleanup will remove. This requires tracking stmts
in to_remove by their SSA def so we can check whether it was removed
before without running into the issue that PHIs can be ggc_free()d
upon removal. So this adds to_remove_defs in addition to to_remove
which has to stay to track GIMPLE_NOPs we want to elide.
PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): First do
simple_dce_from_worklist and then remove stmts in to_remove.
Track defs to be removed in to_remove_defs.
* g++.dg/torture/pr116460.C: New testcase.
|
|
This new test was reported to be still failing on sparc targets.
Here the number of DW_AT_ranges dropped to zero.
The test should pass on this architecture with -Os, -O2 and -O3.
I also tried to improve other known problematic targets,
where only one subroutine had DW_AT_ranges:
those are armhf (arm with hard float), powerpc and powerpc64.
The best option is to use -Os: so far the only one where
both inline instances in this test had two DW_AT_ranges.
gcc/testsuite/ChangeLog:
PR other/116462
* gcc.dg/debug/dwarf2/inline7.c: Switch to -Os optimization.
|
|
This patch would like to allow an IMM for operand 1 of the ussub pattern.
Aka .SAT_SUB(x, 22) as in the example below.
Form 2:
#define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x) \
{ \
return x >= (T)IMM ? x - (T)IMM : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)
It is almost the same as supporting an imm for operand 0 of the ussub
pattern, but allows the second operand to be an imm instead of the first.
The following test suites passed with this patch:
1. The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the
second operand, aka y in parameter.
* config/riscv/riscv.md (ussub<mode>3): Allow const_int for operand 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
After importing a header unit we learn about and setup any header
modules that we transitively depend on. However, this causes
'set_filename' to fail an assertion if we then come across this header
as an #include and attempt to translate it into a module. We still need
to do this translation so that libcpp learns that this is a header unit,
but we shouldn't error just because we've already seen it as an import.
Instead this patch merely checks and errors to handle the case of a
broken mapper implementation which supplies a different CMI path from
the one we already got.
As a drive-by fix, also make failing to find the CMI for a module be a
fatal error: any further errors in the TU are unlikely to be helpful.
PR c++/99243
gcc/cp/ChangeLog:
* module.cc (module_state::set_filename): Handle repeated calls
to 'set_filename' as long as the CMI path matches.
(maybe_translate_include): Adjust comment.
gcc/testsuite/ChangeLog:
* g++.dg/modules/map-2.C: Prune additional fatal error message.
* g++.dg/modules/inc-xlate-4_a.H: New test.
* g++.dg/modules/inc-xlate-4_b.H: New test.
* g++.dg/modules/inc-xlate-4_c.H: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
Currently the handling of include translation is confusing to read,
using a tri-state integer without much clarity on what different states
mean. This patch cleans this up to use explicit enumerators indicating
the different possible states instead, and fixes a bug where the option
'-flang-info-include-translate' ended up being accidentally unusable.
PR c++/110980
gcc/cp/ChangeLog:
* module.cc (maybe_translate_include): Clean up.
gcc/testsuite/ChangeLog:
* g++.dg/modules/inc-xlate-2_a.H: New test.
* g++.dg/modules/inc-xlate-2_b.H: New test.
* g++.dg/modules/inc-xlate-3.h: New test.
* g++.dg/modules/inc-xlate-3_a.H: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
PR115883
The first of the late-combine passes propagates some of the copies
made during the (in-time-)combine pass in make_more_copies into the
users of the "original" pseudo registers and removes the "old"
pseudos. That effectively removes attributes such as REG_POINTER,
which matter to LRA. The quoted PR is for an ICE-manifesting bug that
was exposed by the late-combine pass and went back to hiding with this
patch until commit r15-2937-g3673b7054ec2, the fix for PR116236, when
it was actually fixed. To wit, this patch is only incidentally
related to that bug.
In other words, the REG_POINTER attribute should not be required for
LRA to work correctly. This patch merely restores the state for those
propagated register-uses to what it was before late-combine.
For reasons not investigated, this fixes a failing test
"FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3"
for x86_64-linux-gnu.
PR middle-end/115883
* combine.cc (make_more_copies): Copy attributes from the original
pseudo to the new copy.
|