Age | Commit message (Collapse) | Author | Files | Lines |
|
On Thu, Aug 29, 2024 at 06:58:12PM -0400, David Malcolm wrote:
> The following patch rewrites the internals of pp_format.
> The tokens and token lists are allocated on the chunk_obstack, and so
> there's no additional heap activity required, with the memory reclaimed
> when the chunk_obstack is freed after phase 3 of formatting.
> +static void *
> +allocate_object (size_t sz, obstack &s)
> +{
> + /* We must not be half-way through an object. */
> + gcc_assert (obstack_base (&s) == obstack_next_free (&s));
> +
> + obstack_grow (&s, obstack_base (&s), sz);
> + void *buf = obstack_finish (&s);
> + return buf;
> }
I think this is wrong. I hoped it would be the reason of the
unexpected libstdc++ warnings on certain architectures after
seeing
==4027220== Source and destination overlap in memcpy(0x4627154, 0x4627154, 12)
==4027220== at 0x404B93E: memcpy (vg_replace_strmem.c:1123)
==4027220== by 0xAAD5618: allocate_object(unsigned int, obstack&) (pretty-print.cc:1183)
==4027220== by 0xAAD8C0E: operator new (pretty-print.cc:1210)
==4027220== by 0xAAD8C0E: make (pretty-print-format-impl.h:305)
==4027220== by 0xAAD8C0E: format_phase_1 (pretty-print.cc:1659)
==4027220== by 0xAAD8C0E: pretty_printer::format(text_info&) (pretty-print.cc:1618)
==4027220== by 0xAAA840E: pp_format (pretty-print.h:583)
==4027220== by 0xAAA840E: diagnostic_context::report_diagnostic(diagnostic_info*) (diagnostic.cc:1260)
==4027220== by 0xAAA8703: diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, char**, diagnostic_t) (diagnostic.cc:1404)
==4027220== by 0xAAB8682: warning(diagnostic_option_id, char const*, ...) (diagnostic-global-context.cc:166)
==4027220== by 0x97725F5: warn_deprecated_use(tree_node*, tree_node*) (tree.cc:12485)
==4027220== by 0x8B6694B: mark_used(tree_node*, int) (decl2.cc:6121)
==4027220== by 0x8C9E25E: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:21626)
==4027220== by 0x8C9E5E6: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20935)
==4027220== by 0x8C9E1D7: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20424)
==4027220== by 0x8C9DF2E: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20496)
==4027220==
etc. valgrind warnings, unfortunately it is not, but still
I think this is a bug.
If the obstack has enough space in it, i.e. if obstack_room (&s) >= sz,
then obstack_grow from obstack_base will copy uninitialized bytes
through memcpy (obstack_base (&s), obstack_base (&s), sz);
(which pedantically isn't valid due to the overlap, and so
the reason why valgrind complains, but in reality I think most
implementations can handle it fine, after all, we also use it for
structure assignments which could have full or no overlap but never
partial).
If obstack_room (&s) < sz, then obstack_grow will first
_obstack_newchunk (&s, sz); which will allocate new memory and
copy the existing data of the object (but the above assertion
guartantees it will copy 0 bytes) and then the memcpy copies
sz bytes from the old base to the new (if unlucky, that could crash
as there could be end of page and unmapped next page in between).
I think we should use obstack_blank instead of obstack_grow, which
does everything obstack_grow does, except for the memcpy of the
uninitialized data.
2024-09-25 Jakub Jelinek <jakub@redhat.com>
* pretty-print.cc (allocate_object): Use obstack_blank rather than
obstack_grow.
|
|
gcc/testsuite/ChangeLog:
* g++.dg/modules/reparent-1_c.C: Fix whitespace around '-' in dg directive.
* gfortran.dg/initialization_25.f90: Ditto.
|
|
Doing this to avoid FPs from grepping but also to avoid the potential
for people learning bad habits.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/caf.exp: Fix 'dg-do-run' typo.
* lib/gfortran-dg.exp: Ditto.
* lib/gm2-dg.exp: Ditto.
* lib/go-dg.exp: Ditto.
|
|
Binutils 2.16 is 13 years old; no need to specifically refer to it as a
requirement.
gcc:
PR target/69374
* doc/install.texi (Specific) <*-*-mingw32>: Remove note regarding
binutils 2.16.
|
|
gcc/ChangeLog:
* match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
gcc/testsuite/ChangeLog:
* g++.dg/absvect.C: New test.
* gcc.dg/tree-ssa/absfloat16.c: New test.
Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
This patch enables vectorization of the popcount operation for V2QI, V4QI,
V8QI, V2HI, V4HI, and V2SI modes.
gcc/ChangeLog:
* config/i386/mmx.md:
(VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode.
(popcount<mode>2): New pattern for popcount of V2QI/V4QI/V8QI mode.
(popcount<mode>2): New pattern for popcount of V2HI/V4HI mode.
(popcountv2si2): New pattern for popcount of V2SI mode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/part-vect-popcount-1.c: New test.
|
|
gcc/ChangeLog:
* config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro.
gcc/testsuite/ChangeLog:
* gcc.dg/rtl/x86_64/vector_eq.c: New test.
|
|
r15-3878 exposed a mistake in the testcase, probably from an older
version of the dumping logic.
Apart from the slightly different syntax for the dump line, also check
for importing the type_decl rather than the const_decl (we need the type
anyway and importing the type also brings along the enumerators so it
would be unnecessary to seed an import for them as well).
PR c++/116846
gcc/testsuite/ChangeLog:
* g++.dg/modules/indirect-1_b.C: Fix testcase.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
|
|
Form 3:
#define DEF_VEC_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T sum; \
bool overflow = __builtin_add_overflow (x, y, &sum); \
out[i] = overflow ? x < 0 ? MIN : MAX : sum; \
} \
}
DEF_VEC_SAT_S_ADD_FMT_3 (int8_t, uint8_t, INT8_MIN, INT8_MAX)
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 3 of the vector signed
integer .SAT_ADD. Aka below example:
Form 3:
#define DEF_VEC_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T sum; \
bool overflow = __builtin_add_overflow (x, y, &sum); \
out[i] = overflow ? x < 0 ? MIN : MAX : sum; \
} \
}
DEF_VEC_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
Before this patch:
40 │ # ivtmp.7_34 = PHI <0(3), ivtmp.7_30(7)>
41 │ _26 = op_1_12(D) + ivtmp.7_34;
42 │ x_29 = MEM[(int8_t *)_26];
43 │ _1 = op_2_14(D) + ivtmp.7_34;
44 │ y_24 = MEM[(int8_t *)_1];
45 │ _9 = .ADD_OVERFLOW (y_24, x_29);
46 │ _7 = IMAGPART_EXPR <_9>;
47 │ if (_7 != 0)
48 │ goto <bb 6>; [50.00%]
49 │ else
50 │ goto <bb 5>; [50.00%]
51 │ ;; succ: 6
52 │ ;; 5
53 │
54 │ ;; basic block 5, loop depth 1
55 │ ;; pred: 4
56 │ _42 = REALPART_EXPR <_9>;
57 │ _2 = out_17(D) + ivtmp.7_34;
58 │ MEM[(int8_t *)_2] = _42;
59 │ ivtmp.7_27 = ivtmp.7_34 + 1;
60 │ if (_13 != ivtmp.7_27)
61 │ goto <bb 7>; [89.00%]
62 │ else
63 │ goto <bb 8>; [11.00%]
64 │ ;; succ: 7
65 │ ;; 8
66 │
67 │ ;; basic block 6, loop depth 1
68 │ ;; pred: 4
69 │ _38 = x_29 < 0;
70 │ _39 = (signed char) _38;
71 │ _40 = -_39;
72 │ _41 = _40 ^ 127;
73 │ _33 = out_17(D) + ivtmp.7_34;
74 │ MEM[(int8_t *)_33] = _41;
75 │ ivtmp.7_25 = ivtmp.7_34 + 1;
76 │ if (_13 != ivtmp.7_25)
After this patch:
77 │ _94 = .SELECT_VL (ivtmp_92, POLY_INT_CST [16, 16]);
78 │ vect_x_13.9_81 = .MASK_LEN_LOAD (vectp_op_1.7_79, 8B, { -1, ... }, _94, 0);
79 │ vect_y_15.12_85 = .MASK_LEN_LOAD (vectp_op_2.10_83, 8B, { -1, ... }, _94, 0);
80 │ vect_patt_49.13_86 = .SAT_ADD (vect_x_13.9_81, vect_y_15.12_85);
81 │ .MASK_LEN_STORE (vectp_out.14_88, 8B, { -1, ... }, _94, 0, vect_patt_49.13_86);
82 │ vectp_op_1.7_80 = vectp_op_1.7_79 + _94;
83 │ vectp_op_2.10_84 = vectp_op_2.10_83 + _94;
84 │ vectp_out.14_89 = vectp_out.14_88 + _94;
85 │ ivtmp_93 = ivtmp_92 - _94;
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add optional nop_convert for signed SAT_ADD case 4.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
PR testsuite/116701 shows that left-behind files from
unnamed gfortran open statements (named unit.N, where N =
unit number) can interfere with the result of a subsequent
run. While that's unlikely to happen for a "real" fortran
target or a test with a deleting close-statement, test-cases
should not rely on previous test-cases passing and not
execute along different execution paths depending on earlier
runs, even if the difference is benevolent.
Most but not all fortran test-cases go through
gfortran-dg-runtest (gfortran.dg) or fortran-torture-execute
(gfortran.fortran-torture). However, the exceptions, with
more complex framework and call-chains, either don't run or
don't have open-statements, so a more complex solution
doesn't seem worthwhile. If test-cases with open-statements
are added later to those parts of the test-suite, calls to
fortran-delete-unit-files at the right spot may be added or
worst case, "manual" cleanup-calls added, like:
! { dg-final { remote_file target delete "fort.10" } }
Put the new proc in fortran-modules.exp since that's where other
common fortran-testsuite dejagnu-library functions are located.
PR testsuite/116701
* lib/fortran-modules.exp (fortran-delete-unit-files): New proc.
* lib/gfortran-dg.exp (gfortran-dg-runtest): Call
fortran-delete-unit-files after executing test.
* lib/fortran-torture.exp (fortran-torture-execute): Ditto.
|
|
Mark the newly typo-fixed dg-final bits as XFAIL until investigated.
gcc/testsuite/ChangeLog:
PR c++/116846
* g++.dg/modules/indirect-1_b.C: Add XFAIL.
|
|
Fix typos in dejagnu 'dg-*' directives with erroneous underscores like
'dg_'.
gcc/testsuite/ChangeLog:
PR debug/30161
PR c++/91826
PR c++/116846
* g++.dg/debug/dwarf2/template-func-params-7.C: Fix errant underscore.
Cleanup whitespace in directives too.
* g++.dg/lookup/pr91826.C: Fix errant underscore.
* g++.dg/modules/indirect-1_b.C: Ditto.
* gcc.target/powerpc/vsx-builtin-msum.c: Ditto.
|
|
The documentation of gfortran options uses @code wrappings for arguments
to @opindex. This is superfluous, as 'op' index is a texinfo 'code' index,
that is it already implicitly formats its arguments as if in a @code block.
The superfluous wrapping has the effect of creating a nested
<code class="..."> tag inside the regular automatic <code> tag, in the
option index HTML page, preventing the recognition of the corresponding
option by the option URL generation script.
This change removes those superfluous @code wrappings. Additionally,
variables appearing as separate argument in index are removed, permitting
a few more URL recognition. Finally, the URL files are regenerated with the
new URLs recognized on the updated HTML files.
By the way, a spurious 'option' is removed from the label of the std= option
in the index, without any effect on URL recognition.
PR other/116801
gcc/fortran/ChangeLog:
* invoke.texi: Remove @code wrapping in arguments to @opindex.
(std=): Remove spurious 'option' in index.
(idirafter, imultilib, iprefix, isysroot, iquote, isystem,
fintrinsic-modules-path): Remove variable from index.
* lang.opt.urls: Regenerate.
gcc/ada/ChangeLog:
* gcc-interface/lang.opt.urls: Regenerate.
gcc/c-family/ChangeLog:
* c.opt.urls: Regenerate.
gcc/ChangeLog:
* common.opt.urls: Regenerate.
gcc/d/ChangeLog:
* lang.opt.urls: Regenerate.
gcc/go/ChangeLog:
* lang.opt.urls: Regenerate.
gcc/m2/ChangeLog:
* lang.opt.urls: Regenerate.
gcc/rust/ChangeLog:
* lang.opt.urls: Regenerate.
|
|
The following patch adds GENERIC and GIMPLE folders for various
x86 min/max builtins.
As discussed, these builtins have effectively x < y ? x : y
(or x > y ? x : y) behavior.
The GENERIC folding is done if all the (relevant) arguments are
constants (such as VECTOR_CST for vectors) and is done because
the GIMPLE folding can't easily handle masking, rounding and the
ss/sd cases (in a way that it would be pattern recognized back to the
corresponding instructions). The GIMPLE folding is also done just
for TARGET_SSE4 or later when optimizing, otherwise it is apparently
not matched back.
2024-09-25 Jakub Jelinek <jakub@redhat.com>
PR target/116738
* config/i386/i386.cc (ix86_fold_builtin): Handle
IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*.
(ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*.
* gcc.target/i386/avx512f-pr116738-1.c: New test.
* gcc.target/i386/avx512f-pr116738-2.c: New test.
|
|
Address override only applies to the (reg32) part in the thread address
fs:(reg32). Don't rewrite thread address like
(set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ])
(mem/c:SI (plus:SI (plus:SI (unspec:SI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:SI 107))
(const:SI (unspec:SI [
(symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>)
] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32])))
if address override is used to avoid the invalid memory operand like
cmpl %fs:previous_emax@dtpoff(%eax), %r12d
gcc/
PR target/116839
* config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it
static. Return if TLS address is thread register plus an integer
register.
gcc/testsuite/
PR target/116839
* gcc.target/i386/pr116839.c: New file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
libtool defaults to filtering flags passed at link-time.
This brings the filtering in GCC's 'fork' of libtool into sync with
upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.
In particular, this now allows some harmless diagnostic flags (especially
useful for things like -Werror=odr), more optimization flags, and some
Clang-specific options.
GCC's -flto documentation mentions:
> To use the link-time optimizer, -flto and optimization options should be
> specified at compile time and during the final link. It is recommended
> that you compile all the files participating in the same link with the
> same options and also specify those options at link time.
This allows compliance with that.
* ltmain.sh (func_mode_link): Allow various flags through filter.
|
|
These dg-bogus directives were bogus as they missed a closing brace.
```
+PASS: 23_containers/array/capacity/empty.cc -std=gnu++17 (test for bogus messages, line 54)
PASS: 23_containers/array/capacity/empty.cc -std=gnu++17 (test for excess errors)
PASS: 23_containers/array/capacity/empty.cc -std=gnu++17 execution test
+PASS: 23_containers/array/capacity/max_size.cc -std=gnu++17 (test for bogus messages, line 54)
PASS: 23_containers/array/capacity/max_size.cc -std=gnu++17 (test for excess errors)
PASS: 23_containers/array/capacity/max_size.cc -std=gnu++17 execution test
+PASS: 23_containers/array/capacity/size.cc -std=gnu++17 (test for bogus messages, line 54)
```
libstdc++-v3/ChangeLog:
PR libstdc++/101831
* testsuite/23_containers/array/capacity/empty.cc: Add missing brace.
* testsuite/23_containers/array/capacity/max_size.cc: Ditto.
* testsuite/23_containers/array/capacity/size.cc: Ditto.
|
|
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_25.f90: Change KIND=16 to KIND=8.
|
|
While looking into improving phiprop, I noticed that
the current pr70740.c testcase was being optimized almost
all the way before phiprop because the addresses were considered
the same; the arrays were all zero in size.
This adds an alternative testcase which changes the array sizes to be 1
and phiprop can and will act on this testcase now and the fix which was
being tested is actually tested now.
Tested on x86_64-linux-gnu.
PR tree-optimization/70740
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr70740-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
For generic, `a != 0 ? a * b : 0` would match where `b` would be an expression
which trap (in the case of the testcase, it was an integer division but it could be any).
This adds a new helper function, expr_no_side_effects_p which tests if there is no side effects
and the expression is not trapping which might be used in other locations.
Changes since v1:
* v2: Add move check to helper function instead of inlining it.
PR middle-end/116772
gcc/ChangeLog:
* generic-match-head.cc (expr_no_side_effects_p): New function
* gimple-match-head.cc (expr_no_side_effects_p): New function
* match.pd (`a != 0 ? a / b : 0`): Check expr_no_side_effects_p.
(`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116772-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Seems we already allow the partial specializations the way the DR clarifies,
so this patch just adds a testcase which verifies that.
2024-09-25 Jakub Jelinek <jakub@redhat.com>
* g++.dg/DRs/dr2874.C: New test.
|
|
Seems we already handle it the way the DR clarifies, if double/long double
and std::float64_t have the same mode, foo has long double type (while
x + y would be _Float64 in C23), so this patch just adds a testcase which
verifies that.
2024-09-25 Jakub Jelinek <jakub@redhat.com>
* g++.dg/DRs/dr2836.C: New test.
|
|
Seems we already handle delete expressions the way the DR clarifies,
so this patch just adds a testcase which verifies that.
2024-09-25 Jakub Jelinek <jakub@redhat.com>
* g++.dg/DRs/dr2728.C: New test.
|
|
In expressions like (a != b || ((a ^ b) & c) == d) and
(a != b || (a ^ b) == c), (a ^ b) is folded to false.
In the equivalent expressions (((a ^ b) & c) == d || a != b) and
((a ^ b) == c || a != b) this is not happening.
This patch adds the following simplifications in match.pd:
((a ^ b) & c) cmp d || a != b --> 0 cmp d || a != b
(a ^ b) cmp c || a != b --> 0 cmp c || a != b
PR tree-optimization/114326
gcc/ChangeLog:
* match.pd: Add two patterns to fold a ^ b to 0, when a == b.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fold-xor-and-or.c: New test.
* gcc.dg/tree-ssa/fold-xor-or.c: New test.
Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
|
|
When min != max we know min ^ max != 0.
* value-range.cc (get_bitmask_from_range): Remove redundant
compare of xorv with zero.
|
|
wide_int_storage shows up high in the profile for the testcase in
PR114855 where the apparent issue is that the conditional jump
on 'precision' after the (inlined) memcpy stalls the pipeline due
to the data dependence and required store-to-load forwarding. We
can add scheduling freedom by instead testing precision as from the
source which speeds up the function by 30%. I've applied the
same logic to the copy CTOR.
* wide-int.h (wide_int_storage::wide_int_storage): Branch
on source precision to avoid data dependence on memcpy
destination.
(wide_int_storage::operator=): Likewise.
|
|
While futzing around with PR116416 I noticed that we can use
the _SLOT and _INITIAL macros to make the code more readable.
gcc/c-family/ChangeLog:
* c-pretty-print.cc (c_pretty_printer::primary_expression): Use
TARGET_EXPR accessors.
(c_pretty_printer::expression): Likewise.
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Use TARGET_EXPR accessors.
(finish_co_yield_expr): Likewise.
(register_awaits): Likewise.
(tmp_target_expr_p): Likewise.
(flatten_await_stmt): Likewise.
* error.cc (dump_expr): Likewise.
* semantics.cc (finish_omp_target_clauses): Likewise.
* tree.cc (bot_manip): Likewise.
(cp_tree_equal): Likewise.
* typeck.cc (cxx_mark_addressable): Likewise.
(cp_build_compound_expr): Likewise.
(cp_build_modify_expr): Likewise.
(check_return_expr): Likewise.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
The following function:
int foo(int *a, int j)
{
int k = j - 1;
return a[j - 1] == a[k];
}
does not fold to `return 1;` using -O2 or higher. The cause of this is that
the expression `4 * j + (-4)` for the index computation is not folded to
`4 * (j - 1)`. Existing simplifications that handle similar cases are applied
when A == C, which is not the case in this instance.
A previous attempt to address this issue is
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html
This patch adds the following simplification in match.pd:
(A * B) + (-C) -> (B - C/A) * A, if C a multiple of A
which also handles cases where the index is j - 2, j - 3, etc.
Bootstrapped for all languages and regression tested on x86-64 and aarch64.
PR tree-optimization/109393
gcc/ChangeLog:
* match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A.
gcc/testsuite/ChangeLog:
* gcc.dg/pr109393.c: New test.
Tested-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
|
|
The reassoc pass currently walks dominators in a recursive way where
I ran into a stack overflow with. The following replaces it with
worklists following patterns used elsewhere.
* tree-ssa-reassoc.cc (break_up_subtract_bb): Remove recursion.
(reassociate_bb): Likewise.
(do_reassoc): Implement worklist based dominator walks for
both break_up_subtract_bb and reassociate_bb.
|
|
Remove some ad-hoc simplification code in the forward threader, as the
call into the ranger in m_simplifier->simplify() will handle anything we
can do manually in simplify_control_stmt_condition_1.
In PR114855, DOM time is reduced from 120s to 92s (-23%) and overall
compilation time from 235s to 205s (-12%). The total thread count at -O1 is
unchanged for the testcase.
In our bootstrap .ii benchmark suite, I see we thread 3 threads less
over all files at -O1. At -O2, the backward threader picks up one more,
for no difference over all.
PR tree-optimization/114855
gcc/ChangeLog:
* tree-ssa-threadedge.cc: Remove unneeded recursion.
|
|
In r15-3714-gd3a7302ec5985a I added -Wsystem-headers to the libstdc++ build
flags to help catch problems in the library. This patch takes a different
approach, of disabling the #pragma system_header unless _GLIBCXX_SYSHDR is
defined. As a result, the testsuites will treat them as non-system-headers
to get better warning coverage during regression testing of both gcc and
libstdc++, not just when building the library.
My rationale for the #ifdef instead of just removing the #pragma is the
three G++ tests that want to test libstdc++ system header behavior, so we
need a way to select it.
This doesn't affect installed libraries, as they get their
system-header status from the lookup path. But testsuite_flags
--build-includes gives -I directives rather than -isystem.
This patch doesn't change the headers in config/ because I'm not compiling
with most of them, so won't see any warnings that need fixing. Adjusting
them could happen later, or we can not bother.
libstdc++-v3/ChangeLog:
* acinclude.m4 (WARN_FLAGS): Remove -Wsystem-headers.
* configure: Regenerate.
* include/bits/algorithmfwd.h: #ifdef out #pragma GCC system_header.
* include/bits/atomic_base.h
* include/bits/atomic_futex.h
* include/bits/atomic_timed_wait.h
* include/bits/atomic_wait.h
* include/bits/basic_ios.h
* include/bits/basic_string.h
* include/bits/boost_concept_check.h
* include/bits/char_traits.h
* include/bits/charconv.h
* include/bits/chrono.h
* include/bits/chrono_io.h
* include/bits/codecvt.h
* include/bits/concept_check.h
* include/bits/cpp_type_traits.h
* include/bits/elements_of.h
* include/bits/enable_special_members.h
* include/bits/erase_if.h
* include/bits/forward_list.h
* include/bits/functional_hash.h
* include/bits/gslice.h
* include/bits/gslice_array.h
* include/bits/hashtable.h
* include/bits/indirect_array.h
* include/bits/invoke.h
* include/bits/ios_base.h
* include/bits/iterator_concepts.h
* include/bits/locale_classes.h
* include/bits/locale_facets.h
* include/bits/locale_facets_nonio.h
* include/bits/localefwd.h
* include/bits/mask_array.h
* include/bits/max_size_type.h
* include/bits/memory_resource.h
* include/bits/memoryfwd.h
* include/bits/move_only_function.h
* include/bits/node_handle.h
* include/bits/ostream_insert.h
* include/bits/out_ptr.h
* include/bits/parse_numbers.h
* include/bits/postypes.h
* include/bits/quoted_string.h
* include/bits/range_access.h
* include/bits/ranges_base.h
* include/bits/refwrap.h
* include/bits/sat_arith.h
* include/bits/semaphore_base.h
* include/bits/slice_array.h
* include/bits/std_abs.h
* include/bits/std_function.h
* include/bits/std_mutex.h
* include/bits/std_thread.h
* include/bits/stl_iterator_base_funcs.h
* include/bits/stl_iterator_base_types.h
* include/bits/stl_tree.h
* include/bits/stream_iterator.h
* include/bits/streambuf_iterator.h
* include/bits/stringfwd.h
* include/bits/this_thread_sleep.h
* include/bits/unique_lock.h
* include/bits/uses_allocator_args.h
* include/bits/utility.h
* include/bits/valarray_after.h
* include/bits/valarray_array.h
* include/bits/valarray_before.h
* include/bits/version.h
* include/c_compatibility/fenv.h
* include/c_compatibility/inttypes.h
* include/c_compatibility/stdint.h
* include/decimal/decimal.h
* include/experimental/bits/net.h
* include/experimental/bits/shared_ptr.h
* include/ext/aligned_buffer.h
* include/ext/alloc_traits.h
* include/ext/atomicity.h
* include/ext/concurrence.h
* include/ext/numeric_traits.h
* include/ext/pod_char_traits.h
* include/ext/pointer.h
* include/ext/stdio_filebuf.h
* include/ext/stdio_sync_filebuf.h
* include/ext/string_conversions.h
* include/ext/type_traits.h
* include/ext/vstring.h
* include/ext/vstring_fwd.h
* include/ext/vstring_util.h
* include/parallel/algorithmfwd.h
* include/parallel/numericfwd.h
* include/tr1/functional_hash.h
* include/tr1/hashtable.h
* include/tr1/random.h
* libsupc++/exception.h
* libsupc++/hash_bytes.h
* include/bits/basic_ios.tcc
* include/bits/basic_string.tcc
* include/bits/fstream.tcc
* include/bits/istream.tcc
* include/bits/locale_classes.tcc
* include/bits/locale_facets.tcc
* include/bits/locale_facets_nonio.tcc
* include/bits/ostream.tcc
* include/bits/sstream.tcc
* include/bits/streambuf.tcc
* include/bits/string_view.tcc
* include/bits/version.tpl
* include/experimental/bits/string_view.tcc
* include/ext/pb_ds/detail/resize_policy/hash_prime_size_policy_imp.hpp
* include/ext/random.tcc
* include/ext/vstring.tcc
* include/tr2/bool_set.tcc
* include/tr2/dynamic_bitset.tcc
* include/bits/c++config
* include/c/cassert
* include/c/cctype
* include/c/cerrno
* include/c/cfloat
* include/c/ciso646
* include/c/climits
* include/c/clocale
* include/c/cmath
* include/c/csetjmp
* include/c/csignal
* include/c/cstdarg
* include/c/cstddef
* include/c/cstdio
* include/c/cstdlib
* include/c/cstring
* include/c/ctime
* include/c/cuchar
* include/c/cwchar
* include/c/cwctype
* include/c_global/cassert
* include/c_global/ccomplex
* include/c_global/cctype
* include/c_global/cerrno
* include/c_global/cfenv
* include/c_global/cfloat
* include/c_global/cinttypes
* include/c_global/ciso646
* include/c_global/climits
* include/c_global/clocale
* include/c_global/cmath
* include/c_global/csetjmp
* include/c_global/csignal
* include/c_global/cstdalign
* include/c_global/cstdarg
* include/c_global/cstdbool
* include/c_global/cstddef
* include/c_global/cstdint
* include/c_global/cstdio
* include/c_global/cstdlib
* include/c_global/cstring
* include/c_global/ctgmath
* include/c_global/ctime
* include/c_global/cuchar
* include/c_global/cwchar
* include/c_global/cwctype
* include/c_std/cassert
* include/c_std/cctype
* include/c_std/cerrno
* include/c_std/cfloat
* include/c_std/ciso646
* include/c_std/climits
* include/c_std/clocale
* include/c_std/cmath
* include/c_std/csetjmp
* include/c_std/csignal
* include/c_std/cstdarg
* include/c_std/cstddef
* include/c_std/cstdio
* include/c_std/cstdlib
* include/c_std/cstring
* include/c_std/ctime
* include/c_std/cuchar
* include/c_std/cwchar
* include/c_std/cwctype
* include/debug/array
* include/debug/bitset
* include/debug/deque
* include/debug/forward_list
* include/debug/list
* include/debug/map
* include/debug/set
* include/debug/string
* include/debug/unordered_map
* include/debug/unordered_set
* include/debug/vector
* include/decimal/decimal
* include/experimental/algorithm
* include/experimental/any
* include/experimental/array
* include/experimental/buffer
* include/experimental/chrono
* include/experimental/contract
* include/experimental/deque
* include/experimental/executor
* include/experimental/filesystem
* include/experimental/forward_list
* include/experimental/functional
* include/experimental/internet
* include/experimental/io_context
* include/experimental/iterator
* include/experimental/list
* include/experimental/map
* include/experimental/memory
* include/experimental/memory_resource
* include/experimental/net
* include/experimental/netfwd
* include/experimental/numeric
* include/experimental/propagate_const
* include/experimental/ratio
* include/experimental/regex
* include/experimental/scope
* include/experimental/set
* include/experimental/socket
* include/experimental/string
* include/experimental/string_view
* include/experimental/synchronized_value
* include/experimental/system_error
* include/experimental/timer
* include/experimental/tuple
* include/experimental/type_traits
* include/experimental/unordered_map
* include/experimental/unordered_set
* include/experimental/vector
* include/ext/algorithm
* include/ext/cmath
* include/ext/functional
* include/ext/iterator
* include/ext/memory
* include/ext/numeric
* include/ext/random
* include/ext/rb_tree
* include/ext/rope
* include/parallel/algorithm
* include/std/algorithm
* include/std/any
* include/std/array
* include/std/atomic
* include/std/barrier
* include/std/bit
* include/std/bitset
* include/std/charconv
* include/std/chrono
* include/std/codecvt
* include/std/complex
* include/std/concepts
* include/std/condition_variable
* include/std/coroutine
* include/std/deque
* include/std/execution
* include/std/expected
* include/std/filesystem
* include/std/format
* include/std/forward_list
* include/std/fstream
* include/std/functional
* include/std/future
* include/std/generator
* include/std/iomanip
* include/std/ios
* include/std/iosfwd
* include/std/iostream
* include/std/istream
* include/std/iterator
* include/std/latch
* include/std/limits
* include/std/list
* include/std/locale
* include/std/map
* include/std/memory
* include/std/memory_resource
* include/std/mutex
* include/std/numbers
* include/std/numeric
* include/std/optional
* include/std/ostream
* include/std/print
* include/std/queue
* include/std/random
* include/std/ranges
* include/std/ratio
* include/std/regex
* include/std/scoped_allocator
* include/std/semaphore
* include/std/set
* include/std/shared_mutex
* include/std/span
* include/std/spanstream
* include/std/sstream
* include/std/stack
* include/std/stacktrace
* include/std/stdexcept
* include/std/streambuf
* include/std/string
* include/std/string_view
* include/std/syncstream
* include/std/system_error
* include/std/text_encoding
* include/std/thread
* include/std/tuple
* include/std/type_traits
* include/std/typeindex
* include/std/unordered_map
* include/std/unordered_set
* include/std/utility
* include/std/valarray
* include/std/variant
* include/std/vector
* include/std/version
* include/tr1/array
* include/tr1/cfenv
* include/tr1/cinttypes
* include/tr1/cmath
* include/tr1/complex
* include/tr1/cstdbool
* include/tr1/cstdint
* include/tr1/cstdio
* include/tr1/cstdlib
* include/tr1/cwchar
* include/tr1/cwctype
* include/tr1/functional
* include/tr1/memory
* include/tr1/random
* include/tr1/regex
* include/tr1/tuple
* include/tr1/type_traits
* include/tr1/unordered_map
* include/tr1/unordered_set
* include/tr1/utility
* include/tr2/bool_set
* include/tr2/dynamic_bitset
* include/tr2/type_traits
* libsupc++/atomic_lockfree_defines.h
* libsupc++/compare
* libsupc++/cxxabi.h
* libsupc++/cxxabi_forced.h
* libsupc++/cxxabi_init_exception.h
* libsupc++/exception
* libsupc++/initializer_list
* libsupc++/new
* libsupc++/typeinfo: Likewise.
* testsuite/20_util/ratio/operations/ops_overflow_neg.cc
* testsuite/23_containers/array/tuple_interface/get_neg.cc
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc
* testsuite/24_iterators/operations/prev_neg.cc
* testsuite/ext/type_traits/add_unsigned_floating_neg.cc
* testsuite/ext/type_traits/add_unsigned_integer_neg.cc
* testsuite/ext/type_traits/remove_unsigned_floating_neg.cc
* testsuite/ext/type_traits/remove_unsigned_integer_neg.cc: Adjust
line numbers.
gcc/testsuite/ChangeLog
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers-default.C
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers-no.C
* g++.dg/diagnostic/disable.C: #define _GLIBCXX_SYSHDR.
|
|
The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to
-Wdeprecated-declarations warnings in some parallel/ headers.
libstdc++-v3/ChangeLog:
* include/parallel/base.h: Suppress -Wdeprecated-declarations.
* include/parallel/multiseq_selection.h: Likewise.
|
|
The following makes us use bitmap tree view for the always-executed-BBs
bitmap as computed by IPA utils find_always_executed_bbs and used by
IPA modref (where it shows up in the profile for PR114855.
* ipa-utils.cc (find_always_executed_bbs): Switch result
bitmap to tree view.
|
|
Older versions of the OpenMP specification were not clear about what counted
as device usage. Newer (like TR13) are rather clear. Hence, this commit adds
GCC's target-used flag also when a 'declare target' or an 'interop' are
encountered. (The latter only to Fortran as C/C++ parsing support is still
missing.) TR13 also lists 'dispatch' as target-used construct (as it has the
device clause) and 'device_safesync' as clause with global requirement
property, but both are not yet supported in GCC.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_declare_target): Set target-used bit
in omp_requires_mask.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_declare_target): Set target-used bit
in omp_requires_mask.
gcc/fortran/ChangeLog:
* parse.cc (decode_omp_directive): Set target-used bit of
omp_requires_mask when encountering the declare_target or interop
directive.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/interop-1.f90: Add dg-error for missing
omp requires requirement and declare_variant usage.
* gfortran.dg/gomp/requires-8.f90: Likewise.
|
|
Some print code for debugging is committed by mistake, remove them
from the test header file.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/scalar_sat_binary_run_xxx.h: Remove printf
code for debugging.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For the testcase in PR114855 at -O1 add_store_equivs shows up as the
main sink for bitmap_set_bit because it uses a bitmap to mark all
seen insns by UID to make sure the forward walk in memref_used_between_p
will find the insn in question. Given we do have a CFG here the
functions operation is questionable, given memref_used_between_p
together with the walk of all insns is obviously quadratic in the
worst case that whole thing should be re-done ... but, for the
testcase, using a sbitmap of size get_max_uid () + 1 gets
bitmap_set_bit off the profile and improves IRA time from 15.58s (8%)
to 3.46s (2%).
Now, given above quadraticness I wonder whether we should instead
gate add_store_equivs on optimize > 1 or flag_expensive_optimizations.
PR rtl-optimization/114855
* ira.cc (add_store_equivs): Use sbitmap for tracking
visited insns.
|
|
IRAs add_store_equivs is quadratic in the size of the function worst
case, disable it when -fno-expensive-optimizations which means at
-O1 and -Og.
* ira.cc (ira): Gate add_store_equivs on flag_expensive_optimizations.
|
|
For the testcase in PR114855 VRP takes 320.41s (23%) (after mitigating
backwards threader slowness). This is mostly due to the bitmap check
in equiv_oracle::find_equiv_dom. The following turns this bitmap
to tree view, trading the linear search for a O(log N) one which
improves VRP time to 54.54s (5%).
PR tree-optimization/114855
* value-relation.cc (equiv_oracle::equiv_oracle): Switch
m_equiv_set to tree view.
|
|
Take scan-assembler-times for vnclip insn check instead of function body,
as we only care about if we can generate the fixed point insn vnclip.
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Take scan-assembler-times for vssub insn check instead of function body,
as we only care about if we can generate the fixed point insn vssub.
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Take scan-assembler-times for vsadd insn check instead of function body,
as we only care about if we can generate the fixed point insn vsadd.
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
gcc/ChangeLog:
* config/i386/i386.opt: Update the features included in apxf.
|
|
The testcase decare-variant-duplicates.c added in commit
96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 failed on 32-bit x86
because on that target "i386" is defined as a preprocessor macro
and cannot be used as an identifier. Fixed by rewriting that test
not to do that.
gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-variant-duplicates.c: Avoid using
"i386" as an identifier.
|
|
|
|
This patch adds random number support for UNSIGNED, plus fixes
two bugs, with array I/O where the type used to be set to BT_INTEGER,
and for division with the divisor being a constant.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_random_number): Adjust for unsigned.
* iresolve.cc (gfc_resolve_random_number): Handle unsigned.
* trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
* trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
* gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.
libgfortran/ChangeLog:
* gfortran.map: Add _gfortran_random_m1, _gfortran_random_m2,
_gfortran_random_m4, _gfortran_random_m8 and _gfortran_random_m16.
* intrinsics/random.c (random_m1): New function.
(random_m2): New function.
(random_m4): New function.
(random_m8): New function.
(random_m16): New function.
(arandom_m1): New function.
(arandom_m2): New function.
(arandom_m4): New function.
(arandom_m8): New funciton.
(arandom_m16): New function.
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_30.f90: New test.
|
|
gcc/fortran/ChangeLog:
* check.cc (gfc_check_transf_bit_intrins): Handle unsigned.
* gfortran.texi: Docment IANY, IALL and IPARITY for unsigned.
* iresolve.cc (gfc_resolve_iall): Set flag to use integer
if type is BT_UNSIGNED.
(gfc_resolve_iany): Likewise.
(gfc_resolve_iparity): Likewise.
* simplify.cc (do_bit_and): Adjust asserts for BT_UNSIGNED.
(do_bit_ior): Likewise.
(do_bit_xor): Likewise
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_29.f90: New test.
|
|
Forgot to regenerate URLs for the C++23 P2718R0 patch.
2024-09-24 Jakub Jelinek <jakub@redhat.com>
* c.opt.urls: Regenerate.
|
|
gcc/fortran/ChangeLog:
* gfortran.texi: Document SUM and PRODUCT.
* iresolve.cc (resolve_transformational): New argument,
use_integer, to translate calls to unsigned to calls to
integer.
(gfc_resolve_product): Use it
(gfc_resolve_sum): Use it.
* simplify.cc (init_result_expr): Handle BT_UNSIGNED.
libgfortran/ChangeLog:
* generated/product_c10.c: Regenerated.
* generated/product_c16.c: Regenerated.
* generated/product_c17.c: Regenerated.
* generated/product_c4.c: Regenerated.
* generated/product_c8.c: Regenerated.
* generated/product_i1.c: Regenerated.
* generated/product_i16.c: Regenerated.
* generated/product_i2.c: Regenerated.
* generated/product_i4.c: Regenerated.
* generated/product_i8.c: Regenarated.
* generated/product_r10.c: Regenerated.
* generated/product_r16.c: Regenerated.
* generated/product_r17.c: Regenerated.
* generated/product_r4.c: Regenerated.
* generated/product_r8.c: Regenarated.
* generated/sum_c10.c: Regenerated.
* generated/sum_c16.c: Regenerated.
* generated/sum_c17.c: Regenerated.
* generated/sum_c4.c: Regenerated.
* generated/sum_c8.c: Regenerated.
* generated/sum_i1.c: Regenerated.
* generated/sum_i16.c: Regenerated.
* generated/sum_i2.c: Regenerated.
* generated/sum_i4.c: Regenerated.
* generated/sum_i8.c: Regenerated.
* generated/sum_r10.c: Regenerated.
* generated/sum_r16.c: Regenerated.
* generated/sum_r17.c: Regenerated.
* generated/sum_r4.c: Regenerated.
* generated/sum_r8.c: Regenerated.
* m4/ifunction.m4: Whitespace fix.
* m4/product.m4: If type is integer, change to unsigned.
* m4/sum.m4: Likewise.
|