path: root/gcc
Age | Commit message | Author | Files | Lines
2024-10-11 | middle-end/117086 - fixup vec_cond simplifications | Richard Biener | 2 | -21/+36
The following adds missing checks for a vector result type to simplifications that end up creating a vec_cond.

    PR middle-end/117086
    * match.pd ((op (vec_cond ...) ..) -> (vec_cond ...)): Add missing checks for VECTOR_TYPE_P (type).

    * gcc.dg/torture/pr117086.c: New testcase.
2024-10-11 | RISC-V: Add testcases for form 8 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 8:

    #define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN > x || x >= (WT)NT_MAX \
        ? x < 0 ? NT_MIN : NT_MAX \
        : trunc; \
    }

The following test passed with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | RISC-V: Add testcases for form 7 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 7:

    #define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN >= x || x >= (WT)NT_MAX \
        ? x < 0 ? NT_MIN : NT_MAX \
        : trunc; \
    }

The following test passed with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | RISC-V: Add testcases for form 6 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 6:

    #define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN >= x || x > (WT)NT_MAX \
        ? x < 0 ? NT_MIN : NT_MAX \
        : trunc; \
    }

The following test passed with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | RISC-V: Add testcases for form 5 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 5:

    #define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN > x || x > (WT)NT_MAX \
        ? x < 0 ? NT_MIN : NT_MAX \
        : trunc; \
    }

The following test passed with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | RISC-V: Add testcases for form 4 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 4:

    #define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN <= x && x < (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-4-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | Match: Support form 4 for scalar signed integer SAT_TRUNC | Pan Li | 1 | -0/+1
This patch adds support for form 4 of the scalar signed integer SAT_TRUNC, as in the example below:

Form 4:

    #define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN <= x && x < (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

    DEF_SAT_S_TRUNC_FMT_4(int8_t, int16_t, INT8_MIN, INT8_MAX)

Before this patch:

    __attribute__((noinline))
    int8_t sat_s_trunc_int16_t_to_int8_t_fmt_4 (int16_t x)
    {
      int8_t trunc;
      unsigned short x.0_1;
      unsigned short _2;
      int8_t _3;
      _Bool _7;
      signed char _8;
      signed char _9;
      signed char _10;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      x.0_1 = (unsigned short) x_4(D);
      _2 = x.0_1 + 128;
      if (_2 > 254)
        goto <bb 4>; [50.00%]
      else
        goto <bb 3>; [50.00%]
    ;; succ: 4
    ;;       3

    ;; basic block 3, loop depth 0
    ;; pred: 2
      trunc_5 = (int8_t) x_4(D);
      goto <bb 5>; [100.00%]
    ;; succ: 5

    ;; basic block 4, loop depth 0
    ;; pred: 2
      _7 = x_4(D) < 0;
      _8 = (signed char) _7;
      _9 = -_8;
      _10 = _9 ^ 127;
    ;; succ: 5

    ;; basic block 5, loop depth 0
    ;; pred: 3
    ;;       4
      # _3 = PHI <trunc_5(3), _10(4)>
      return _3;
    ;; succ: EXIT

    }

After this patch:

    __attribute__((noinline))
    int8_t sat_s_trunc_int16_t_to_int8_t_fmt_4 (int16_t x)
    {
      int8_t _3;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      _3 = .SAT_TRUNC (x_4(D)); [tail call]
      return _3;
    ;; succ: EXIT

    }

The following test suites passed with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

    * match.pd: Add case 4 matching pattern for signed SAT_TRUNC.

Signed-off-by: Pan Li <pan2.li@intel.com>
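The "before" dump above encodes the form 4 range test as one unsigned comparison and builds the saturated value without a branch. A minimal C sketch of those two GIMPLE idioms follows; the function names are hypothetical and only the arithmetic is taken from the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "_2 = x.0_1 + 128; if (_2 > 254)": biasing x by 128 maps the
   in-range interval [-128, 126] onto [0, 254], so a single unsigned
   compare detects everything outside it.  */
static int needs_saturation_fmt_4 (int16_t x)
{
  uint16_t biased = (uint16_t) ((uint16_t) x + 128u);
  return biased > 254;
}

/* Mirrors "_9 = -_8; _10 = _9 ^ 127": -(x < 0) is 0 or -1, and
   0 ^ 127 == 127 (INT8_MAX) while -1 ^ 127 == -128 (INT8_MIN).  */
static int8_t saturated_value (int16_t x)
{
  int8_t is_negative = (int8_t) (x < 0);
  return (int8_t) (-is_negative ^ 127);
}

/* Exhaustively compare both idioms with the source-level form 4 test.  */
static void check_fmt_4_idioms (void)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    {
      int16_t x = (int16_t) v;
      int in_range = (int16_t) INT8_MIN <= x && x < (int16_t) INT8_MAX;
      assert (needs_saturation_fmt_4 (x) == !in_range);
      assert (saturated_value (x) == (x < 0 ? INT8_MIN : INT8_MAX));
    }
}
```

Note that x == 127 takes the saturating path under form 4, which still yields 127, so the result is unchanged.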
2024-10-11 | RISC-V: Add testcases for form 3 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 3:

    #define DEF_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN < x && x <= (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-3-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-3-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-3-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-3-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-3-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-3-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-3-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | Match: Support form 3 for scalar signed integer SAT_TRUNC | Pan Li | 1 | -0/+3
This patch adds support for form 3 of the scalar signed integer SAT_TRUNC, as in the example below:

Form 3:

    #define DEF_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN < x && x <= (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

    DEF_SAT_S_TRUNC_FMT_3(int8_t, int16_t, INT8_MIN, INT8_MAX)

Before this patch:

    __attribute__((noinline))
    int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
    {
      signed char _1;
      signed char _2;
      int8_t _3;
      __complex__ signed char _6;
      _Bool _8;
      signed char _9;
      signed char _10;
      signed char _11;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
      _2 = IMAGPART_EXPR <_6>;
      if (_2 != 0)
        goto <bb 4>; [50.00%]
      else
        goto <bb 3>; [50.00%]
    ;; succ: 4
    ;;       3

    ;; basic block 3, loop depth 0
    ;; pred: 2
      _1 = REALPART_EXPR <_6>;
      goto <bb 5>; [100.00%]
    ;; succ: 5

    ;; basic block 4, loop depth 0
    ;; pred: 2
      _8 = x_4(D) < 0;
      _9 = (signed char) _8;
      _10 = -_9;
      _11 = _10 ^ 127;
    ;; succ: 5

    ;; basic block 5, loop depth 0
    ;; pred: 3
    ;;       4
      # _3 = PHI <_1(3), _11(4)>
      return _3;
    ;; succ: EXIT

    }

After this patch:

    __attribute__((noinline))
    int8_t sat_s_trunc_int16_t_to_int8_t_fmt_3 (int16_t x)
    {
      int8_t _3;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      _3 = .SAT_TRUNC (x_4(D)); [tail call]
      return _3;
    ;; succ: EXIT

    }

The following test suites passed with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

    * match.pd: Add case 3 matching pattern for signed SAT_TRUNC.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | RISC-V: Add testcases for form 2 of scalar signed SAT_TRUNC | Pan Li | 13 | -0/+271
Form 2:

    #define DEF_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_2 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN < x && x < (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

The following test passed with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/sat_arith.h: Add test helper macros.
    * gcc.target/riscv/sat_s_trunc-2-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-2-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-2-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-2-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-2-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-2-i64-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i16-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i32-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i32-to-i8.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i64-to-i16.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i64-to-i32.c: New test.
    * gcc.target/riscv/sat_s_trunc-run-2-i64-to-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-10-11 | Match: Support form 2 for scalar signed integer SAT_TRUNC | Pan Li | 1 | -8/+13
This patch adds support for form 2 of the scalar signed integer SAT_TRUNC, as in the example below:

Form 2:

    #define DEF_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX) \
    NT __attribute__((noinline)) \
    sat_s_trunc_##WT##_to_##NT##_fmt_2 (WT x) \
    { \
      NT trunc = (NT)x; \
      return (WT)NT_MIN < x && x < (WT)NT_MAX \
        ? trunc \
        : x < 0 ? NT_MIN : NT_MAX; \
    }

    DEF_SAT_S_TRUNC_FMT_2(int8_t, int16_t, INT8_MIN, INT8_MAX)

Before this patch:

    __attribute__((noinline))
    int8_t sat_s_trunc_int16_t_to_int8_t_fmt_2 (int16_t x)
    {
      int8_t trunc;
      unsigned short x.0_1;
      unsigned short _2;
      int8_t _3;
      _Bool _7;
      signed char _8;
      signed char _9;
      signed char _10;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      x.0_1 = (unsigned short) x_4(D);
      _2 = x.0_1 + 127;
      if (_2 > 253)
        goto <bb 4>; [50.00%]
      else
        goto <bb 3>; [50.00%]
    ;; succ: 4
    ;;       3

    ;; basic block 3, loop depth 0
    ;; pred: 2
      trunc_5 = (int8_t) x_4(D);
      goto <bb 5>; [100.00%]
    ;; succ: 5

    ;; basic block 4, loop depth 0
    ;; pred: 2
      _7 = x_4(D) < 0;
      _8 = (signed char) _7;
      _9 = -_8;
      _10 = _9 ^ 127;
    ;; succ: 5

    ;; basic block 5, loop depth 0
    ;; pred: 3
    ;;       4
      # _3 = PHI <trunc_5(3), _10(4)>
      return _3;
    ;; succ: EXIT

    }

After this patch:

    __attribute__((noinline))
    int8_t sat_s_trunc_int16_t_to_int8_t_fmt_2 (int16_t x)
    {
      int8_t _3;

    ;; basic block 2, loop depth 0
    ;; pred: ENTRY
      _3 = .SAT_TRUNC (x_4(D)); [tail call]
      return _3;
    ;; succ: EXIT

    }

The following test suites passed with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

    * match.pd: Add case 2 matching pattern for signed SAT_TRUNC.

Signed-off-by: Pan Li <pan2.li@intel.com>
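The numbered forms differ only in whether each bound comparison is strict. A minimal sketch, assuming the int16_t to int8_t instantiation and using hypothetical function names (only the comparison bodies come from the commit messages), shows why strict and non-strict bounds give the same result as a plain clamp: when x equals NT_MIN or NT_MAX, truncation and saturation produce the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Reference behaviour: clamp to the int8_t range, then truncate.
   This helper is illustrative and not part of the patches.  */
static int8_t sat_trunc_ref (int16_t x)
{
  if (x < INT8_MIN)
    return INT8_MIN;
  if (x > INT8_MAX)
    return INT8_MAX;
  return (int8_t) x;
}

/* Form 2 body: both bounds strict.  */
static int8_t sat_trunc_fmt_2 (int16_t x)
{
  int8_t trunc = (int8_t) x;
  return (int16_t) INT8_MIN < x && x < (int16_t) INT8_MAX
    ? trunc
    : x < 0 ? INT8_MIN : INT8_MAX;
}

/* Form 4 body: lower bound inclusive, upper bound strict.  */
static int8_t sat_trunc_fmt_4 (int16_t x)
{
  int8_t trunc = (int8_t) x;
  return (int16_t) INT8_MIN <= x && x < (int16_t) INT8_MAX
    ? trunc
    : x < 0 ? INT8_MIN : INT8_MAX;
}

/* Exhaustive check over every int16_t input.  */
static void check_forms_agree (void)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    {
      int16_t x = (int16_t) v;
      assert (sat_trunc_fmt_2 (x) == sat_trunc_ref (x));
      assert (sat_trunc_fmt_4 (x) == sat_trunc_ref (x));
    }
}
```

Because the forms are observably identical, it is reasonable for the matcher to fold each of them to the same .SAT_TRUNC call, as the "after" dump shows.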
2024-10-11 | i386: Fix up spaceship expanders for -mtune=i[45]86 [PR117053] | Jakub Jelinek | 2 | -38/+113
The adjusted and new spaceship expanders ICE with -mtune=i486 or -mtune=i586. The problem is that in that case TARGET_ZERO_EXTEND_WITH_AND is true and zero_extendqisi2 isn't allowed in that case, and we can't use the replacement AND, because that clobbers flags and we want to use them again.

The following patch fixes that by using in those cases roughly what we want to expand it to after peephole2 optimizations, i.e. xor before the comparison, *setcc_qi_slp and sbbl $0 (or for signed int case xoring of 2 regs, two *setcc_qi_slp, subl). For *setcc_qi_slp, it uses the setcc_si_slp hacks with UNSPEC that were in use for the floating point jp case (so such code is IMHO undesirable for the !TARGET_ZERO_EXTEND_WITH_AND case as we want to give combiner more liberty in that case).

2024-10-11  Jakub Jelinek  <jakub@redhat.com>

    PR target/117053
    * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Handle TARGET_ZERO_EXTEND_WITH_AND differently.
    (ix86_expand_int_spaceship): Likewise.

    * g++.target/i386/pr116896-3.C: New test.
2024-10-11 | tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP | Richard Biener | 2 | -1/+20
The following temporarily reverts the support of permuted .MASK_LOAD for the case of non-grouped accesses.

    PR tree-optimization/117050
    * tree-vect-slp.cc (vect_build_slp_tree_2): Do not support permutes of non-grouped .MASK_LOAD.

    * gcc.dg/vect/pr117050.c: New testcase.
2024-10-11 | Fix possible wrong-code with masked store-lanes | Richard Biener | 1 | -10/+20
When we're doing masked store-lanes one mask element applies to all loads of one struct element. This requires uniform masks for all of the SLP lanes, something we already compute into STMT_VINFO_SLP_VECT_ONLY but fail to check when doing SLP store-lanes. The following corrects this.

The following also adjusts the store-lane heuristic to properly check for masked or non-masked optab support.

    * tree-vect-slp.cc (vect_slp_prefer_store_lanes_p): Allow passing in of vectype, pass in whether the stores are masked and query the correct optab.
    (vect_build_slp_instance): Guard store-lanes query with ! STMT_VINFO_SLP_VECT_ONLY, guaranteeing a uniform mask.
2024-10-11 | i386: Fix some patterns' mem attribute. | Hu, Lin1 | 1 | -10/+12
Hi, all

This is another patch to modify some patterns' type attr from ssemov to ssemov2. Some ssemov patterns' mem attr should be load when their operand 2 is a memory operand.

Bootstrapped and regtested on x86-64-linux-pc, OK for trunk?

BRs,
Lin

gcc/ChangeLog:

    * config/i386/sse.md (sse_movhlps): Change type attr from ssemov to ssemov2.
    (sse_loadhps): Ditto.
    (*vec_concat<mode>): Ditto.
    (vec_setv2df_0): Ditto.
    (sse_loadlps): Change attr from ssemov to ssemov2 except for 2, 3.
    (sse2_loadhps): Change attr from ssemov to ssemov2 except for 0, 1.
    (sse2_loadlpd): Change attr from ssemov to ssemov2 except for 0, 1, 2.
    (sse2_movsd_<mode>): Change attr from ssemov to ssemov2 except for 5.
    (vec_concatv2df): Change attr from ssemov to ssemov2 except for 0, 1, 2.
    (*vec_concat<mode>): Change attr from ssemov to ssemov2 for 3, 4.
    (vec_concatv2di): Change attr from ssemov to ssemov2 except for 0, 1, 2, 3, 4, 5.
2024-10-11 | Daily bump. | GCC Administrator | 3 | -1/+142
2024-10-10 | aarch64: Alter pr116258.c test to correct for big endian. | Richard Ball | 1 | -1/+2
The test at pr116258.c fails on big-endian targets because it checks that the index of a floating point multiply is 0, which is correct only for little endian.

gcc/testsuite/ChangeLog:

    PR tree-optimization/116258
    * gcc.target/aarch64/pr116258.c: Alter test to add big-endian support.
2024-10-10 | Fix PR116650: check all regs in regrename targets | Michael Matz | 1 | -6/+19
(this came up for m68k vs. LRA, but is a generic problem)

Regrename wants to use new registers for certain def-use chains. For validity of replacements it needs to check that the selected candidates are unused up to then. That's done in check_new_reg_p. But if it so happens that the new register needs more hardregs than the old register (which happens if the target allows inter-bank moves and the mode is something like a DFmode that needs to be placed into a SImode reg-pair), then check_new_reg_p only checks the first of those registers for free-ness.

This is caused by that function looking up the number of necessary hardregs only in terms of the old hardreg number. It of course needs to do that in terms of the new candidate regnumber. The symptom is that regrename sometimes clobbers the higher numbered registers of such a regrename target pair. This patch fixes that problem.

(In the particular case of the bug report it was LRA that left over an inter-bank move instruction that triggers regrename, ultimately causing the mis-compile. Reload didn't do that, but in general we of course can't rely on such moves not happening if the target allows them.)

This also shows a general confusion in that function and the target hook interface here:

    for (i = nregs - 1; i >= 0; --i)
      ...
      || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))

it uses nregs in a way that requires it to be the same between old and new register. The problem is that the target hook only gets register numbers, when it instead should get a mode and register numbers and would be called only for the first but not for subsequent registers. I've looked at a number of definitions of that target hook and I think that this is currently harmless in the sense that it would merely rule out some potential reg-renames that would in fact be okay to do. So I'm not changing the target hook interface here and hence that problem remains unfixed.

    PR rtl-optimization/116650
    * regrename.cc (check_new_reg_p): Calculate nregs in terms of the new candidate register.
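The failure mode described above can be sketched with a toy register file. Everything in the following C sketch is an assumption made for illustration (the register layout and all helper names); it is not the regrename.cc code, only a model of sizing the free-ness check by the old register versus the new candidate.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy target, assumed for illustration only: registers 0-7 are 32-bit
   integer registers and registers 8-15 are 64-bit FP registers.  */
enum { NUM_REGS = 16, FIRST_FP_REG = 8 };

/* Hard registers a 64-bit value occupies when it starts at REGNO:
   one FP register, or a pair of integer registers.  */
static int toy_nregs_for_64bit (int regno)
{
  return regno >= FIRST_FP_REG ? 1 : 2;
}

/* True if NREGS consecutive registers starting at FIRST are all unused.  */
static bool all_free (const bool in_use[NUM_REGS], int first, int nregs)
{
  for (int i = nregs - 1; i >= 0; --i)
    if (first + i >= NUM_REGS || in_use[first + i])
      return false;
  return true;
}

/* Behaviour described as buggy: size the check by the old register.  */
static bool candidate_ok_by_old_reg (const bool in_use[NUM_REGS],
                                     int old_reg, int new_reg)
{
  return all_free (in_use, new_reg, toy_nregs_for_64bit (old_reg));
}

/* Behaviour described as the fix: size the check by the new candidate.  */
static bool candidate_ok_by_new_reg (const bool in_use[NUM_REGS], int new_reg)
{
  return all_free (in_use, new_reg, toy_nregs_for_64bit (new_reg));
}

/* Renaming FP register 8 to the integer pair (2, 3) while register 3
   is live: only the second check notices the conflict.  */
static void demo_pair_clobber (void)
{
  bool in_use[NUM_REGS] = { false };
  in_use[3] = true;
  assert (candidate_ok_by_old_reg (in_use, 8, 2));
  assert (!candidate_ok_by_new_reg (in_use, 2));
}
```

In this model the old-register sizing accepts a candidate whose second register is live, which corresponds to the clobbered upper register of the pair mentioned in the commit message.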
2024-10-10 | phiopt: Remove candorest variable return instead | Andrew Pinski | 1 | -6/+1
After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest with the break can just change to a return since this is inside a lambda now.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

    * tree-ssa-phiopt.cc (pass_phiopt::execute): Remove candorest and return instead of setting candorest.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-10-10 | RISC-V: Bugfix for C++ code compilation failure with rv32imafc_zve32f [pr116883] | Li Xu | 2 | -1/+21
From: xuli <xuli1@eswincomputing.com>

Example as follows:

    int main()
    {
      unsigned long arraya[128], arrayb[128], arrayc[128];
      for (int i = 0; i < 128; i++)
        {
          arraya[i] = arrayb[i] + arrayc[i];
        }
      return 0;
    }

Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation issue:

    riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
       40 | #pragma riscv intrinsic "vector"
          |                         ^~~~~~~~
    riscv_vector.h:40:25: note: old declaration 'vint64m1_t __riscv_vle64(vbool64_t, const long long int*, unsigned int)'

With zvl=32b, vbool16_t is registered in init_builtins() with type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].

Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t is not registered in init_builtins(), meaning vbool64_t=null.

In order to implement __attribute__((target("arch=+v"))), we must register all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered by default with zvl=128b in reinit_builtins(), resulting in type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].

We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t), calculated using type_common.precision, resulting in 2. Since vbool16_t and vbool64_t have the same element type (boolean_type), the compiler treats them as the same type, leading to a re-declaration conflict.

After all types and intrinsics have been registered, processing __attribute__((target("arch=+v"))) will update the parameters option and init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose zvl=4096b for the null type reinit_builtins().

    command option zvl=32b
    type         nunits
    vbool64_t => null
    vbool32_t => [1,1]
    vbool16_t => [2,2]
    vbool8_t  => [4,4]
    vbool4_t  => [8,8]
    vbool2_t  => [16,16]
    vbool1_t  => [32,32]

    reinit zvl=128b   vbool64_t => [2,2]    conflict with zvl32b vbool16_t => [2,2]
    reinit zvl=256b   vbool64_t => [4,4]    conflict with zvl32b vbool8_t  => [4,4]
    reinit zvl=512b   vbool64_t => [8,8]    conflict with zvl32b vbool4_t  => [8,8]
    reinit zvl=1024b  vbool64_t => [16,16]  conflict with zvl32b vbool2_t  => [16,16]
    reinit zvl=2048b  vbool64_t => [32,32]  conflict with zvl32b vbool1_t  => [32,32]
    reinit zvl=4096b  vbool64_t => [64,64]  zvl=4096b is ok

Signed-off-by: xuli <xuli1@eswincomputing.com>

    PR target/116883

gcc/ChangeLog:

    * config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute): Choose zvl4096b to initialize null type.

gcc/testsuite/ChangeLog:

    * g++.target/riscv/rvv/base/pr116883.C: New test.
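The table above follows a simple pattern: the nunits listed for vboolN_t equals the zvl value divided by N. The following C sketch, which assumes that relation (inferred from the table, not taken from the GCC sources) and uses hypothetical helper names, reproduces the search for the first reinit zvl whose vbool64_t count matches none of the zvl=32b mask types.

```c
#include <assert.h>
#include <stdbool.h>

/* nunits of vboolN_t at a given zvl, as read off the table above
   (for example zvl=32b, vbool16_t => 2).  Assumed relation.  */
static unsigned vbool_nunits (unsigned zvl_bits, unsigned n)
{
  return zvl_bits / n;
}

/* True if vbool64_t registered at REINIT_ZVL has the same nunits as one
   of the mask types vbool32_t .. vbool1_t registered at zvl=32b.  */
static bool vbool64_conflicts (unsigned reinit_zvl)
{
  unsigned nunits64 = vbool_nunits (reinit_zvl, 64);
  for (unsigned n = 32; n >= 1; n /= 2)
    if (vbool_nunits (32, n) == nunits64)
      return true;
  return false;
}

/* Double the reinit zvl from 128b until no conflict remains.  */
static unsigned smallest_conflict_free_zvl (void)
{
  unsigned zvl = 128;
  while (vbool64_conflicts (zvl))
    zvl *= 2;
  return zvl;
}
```

Under these assumptions the search stops at 4096, in line with the last row of the table.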
2024-10-10 | vect: Avoid divide by zero for permutes of extern VLA vectors | Richard Sandiford | 1 | -3/+12
My recent VLA SLP patches caused a regression with cross compilers in gcc.dg/torture/neon-sve-bridge.c. There we have a VEC_PERM_EXPR created from two BIT_FIELD_REFs, with the child node being an external VLA vector:

    note: node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
    note: op: VEC_PERM_EXPR
    note:   stmt 0 val1Return_9 = BIT_FIELD_REF <sveReturn_8, 64, 0>;
    note:   stmt 1 val2Return_10 = BIT_FIELD_REF <sveReturn_8, 64, 64>;
    note:   lane permutation { 0[0] 0[1] }
    note:   children 0x3704b08
    note: node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
    note:   { }

For this kind of external node, the SLP_TREE_LANES is normally the total number of lanes in the vector, but it is zero if the vector has variable length:

    auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
    unsigned HOST_WIDE_INT const_nunits;
    if (nunits.is_constant (&const_nunits))
      SLP_TREE_LANES (vnode) = const_nunits;

This led to division by zero in:

    /* Check whether the output has N times as many lanes per vector.  */
    else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
                                  SLP_TREE_LANES (child) * nunits,
                                  &this_unpack_factor)
             && (i == 0 || unpack_factor == this_unpack_factor))
      unpack_factor = this_unpack_factor;

No repetition takes place for this kind of external node, so this patch goes with Richard's suggestion to check for external nodes that have no scalar statements.

This didn't show up for my native testing since division by zero doesn't trap on AArch64.

gcc/
    * tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p to false if we have an external node for a pre-existing vector.
2024-10-10 | tree-optimization/117060 - fix oversight in vect_build_slp_tree_1 | Richard Biener | 2 | -2/+24
We are failing to match call vs. non-call when dealing with matching loads or stores.

    PR tree-optimization/117060
    * tree-vect-slp.cc (vect_build_slp_tree_1): When comparing calls also fail if the first isn't a call.

    * gfortran.dg/pr117060.f90: New testcase.
2024-10-10 | match.pd: Check trunc_mod vector optab before folding. | Jennifer Schmitz | 2 | -2/+17
This patch guards the simplification x / y * y == x -> x % y == 0 in match.pd by a check for:
1) Non-vector mode of x OR
2) Lack of support for vector division OR
3) Support of vector modulo

The patch was bootstrapped and tested with no regression on aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
    PR tree-optimization/116831
    * match.pd: Guard simplification to trunc_mod with check for mod optab support.

gcc/testsuite/
    PR tree-optimization/116831
    * gcc.dg/torture/pr116831.c: New test.
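For scalar integers the two sides of this simplification are interchangeable, which a small check can confirm. The sketch below uses hypothetical function names and covers only the scalar identity; the patch itself is about when the fold is applied to vector types, which this example does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Left-hand side of the simplification: x / y * y == x.  */
static bool divisible_via_div_mul (int x, int y)
{
  return x / y * y == x;
}

/* Right-hand side of the simplification: x % y == 0.  */
static bool divisible_via_mod (int x, int y)
{
  return x % y == 0;
}

/* C defines x == (x / y) * y + x % y for y != 0, so both tests agree.
   Check a small range including negative operands.  */
static void check_div_mod_equivalence (void)
{
  for (int x = -50; x <= 50; x++)
    for (int y = -7; y <= 7; y++)
      if (y != 0)
        assert (divisible_via_div_mul (x, y) == divisible_via_mod (x, y));
}
```

On a target with vector division but no vector modulo, the right-hand side would need operations the target lacks, which is the situation the three-part guard is meant to exclude.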
2024-10-10 | Allow SLP store of mixed external and constant | Richard Biener | 4 | -21/+26
vect_build_slp_tree_1 rejected this during SLP discovery because it ran into the rhs code comparison code for stores. The following skips that completely for loads and stores as those are handled later anyway.

This needs a heuristic adjustment in vect_get_and_check_slp_defs to avoid fallout with regard to BB vectorization and splitting of a store group vs. demoting one operand to external.

gcc.dg/Wstringop-overflow-47.c needs adjustment given we now have vast improvements for code generation. gcc.dg/strlenopt-32.c needs adjustment because the strlen pass doesn't handle

    _11 = {0, b_6(D)};
    __builtin_memcpy (&a, "foo.bar", 8);
    MEM <vector(2) char> [(char *)&a + 3B] = _11;
    _9 = strlen (&a);

I have opened PR117057 for this.

    * tree-vect-slp.cc (vect_build_slp_tree_1): Do not compare RHS codes for loads or stores.
    (vect_get_and_check_slp_defs): Only demote operand to external in case there is more than one operand.

    * gcc.dg/vect/slp-57.c: New testcase.
    * gcc.dg/Wstringop-overflow-47.c: Adjust.
    * gcc.dg/strlenopt-32.c: XFAIL parts.
2024-10-10 | Add a new tune avx256_avoid_vec_perm for SRF. | liuhongt | 4 | -2/+43
According to Intel SOM[1], for Crestmont, most 256-bit Intel AVX2 instructions can be decomposed into two independent 128-bit micro-operations, except for a subset of Intel AVX2 instructions, known as cross-lane operations, which can only compute the result for an element by utilizing one or more sources belonging to other elements.

The 256-bit instructions listed below use more operand sources than can be natively supported by a single reservation station within these microarchitectures. They are decomposed into two μops, where the first μop resolves a subset of operand dependencies across two cycles. The dependent second μop executes the 256-bit operation by using a single 128-bit execution port for two consecutive cycles with a five-cycle latency for a total latency of seven cycles.

    VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
    VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
    VPERMPD ymm1, ymm2/m256, imm8
    VPERMPS ymm1, ymm2, ymm3/m256
    VPERMD ymm1, ymm2, ymm3/m256
    VPERMQ ymm1, ymm2/m256, imm8

Instead of setting tune avx128_optimal for SRF, the patch adds a new tune avx256_avoid_vec_perm for it, so by default the vectorizer still uses 256-bit VF if the cost is profitable, but lowers to 128-bit whenever a 256-bit vec_perm is needed for auto-vectorization. Without vec_perm, performance of 256-bit vectorization should be similar to 128-bit (some benchmark results show it is even better than 128-bit vectorization since it enables more parallelism for convert cases).

[1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

gcc/ChangeLog:

    * config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs): Add new member m_num_avx256_vec_perm.
    (ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
    (ix86_vector_costs::finish_cost): Prevent vectorization for TARGET_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm instruction.
    * config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New Macro.
    * config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add m_CORE_ATOM.
    (X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/avx256_avoid_vec_perm.c: New test.
2024-10-10 | Add new microarchitecture tune for SRF/GRR/CWF. | liuhongt | 4 | -12/+34
For Crestmont, 4-operand vex blendv instructions come from MSROM and are slower than the 3-instruction sequence (op1 & mask) | (op2 & ~mask). The legacy blendv instruction can still be handled by the decoder.

The patch adds a new tune which is enabled for all processors except for SRF/CWF. It will use vpand + vpandn + vpor instead of vpblendvb (similarly for vblendvps/vblendvpd) for SRF/CWF.

gcc/ChangeLog:

    * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard instruction blendv generation under new tune.
    * config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
    * config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV): New tune.
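The three-operation sequence quoted above is an ordinary bitwise select. A minimal scalar sketch in C, treating a 32-bit word as four byte lanes and assuming each mask lane is either 0x00 or 0xFF (the names and the lane model are illustrative, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* (op1 & mask) | (op2 & ~mask): the and / andn / or sequence.  */
static uint32_t select_and_andn_or (uint32_t op1, uint32_t op2, uint32_t mask)
{
  return (op1 & mask) | (op2 & ~mask);
}

/* Lane-by-lane reference: take the byte from op1 where the mask lane is
   set and from op2 otherwise.  Assumes every mask lane is 0x00 or 0xFF.  */
static uint32_t select_per_lane (uint32_t op1, uint32_t op2, uint32_t mask)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t lane_bits = 0xFFu << (8 * lane);
      uint32_t source = (mask & lane_bits) ? op1 : op2;
      result |= source & lane_bits;
    }
  return result;
}

/* The two formulations agree for well-formed masks.  */
static void check_select_equivalence (void)
{
  uint32_t op1 = 0x11223344u, op2 = 0xAABBCCDDu;
  uint32_t masks[] = { 0x00000000u, 0xFFFFFFFFu, 0xFF00FF00u, 0x0000FFFFu };
  for (int i = 0; i < 4; i++)
    assert (select_and_andn_or (op1, op2, masks[i])
            == select_per_lane (op1, op2, masks[i]));
}
```

The equivalence relies on all-ones or all-zeros mask lanes, which is the shape vector comparisons normally produce.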
2024-10-10 | x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction | Levy Hsu | 3 | -7/+83
gcc/ChangeLog:

    * config/i386/i386.md: Rewrite insn truncsfbf2.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/truncsfbf-1.c: New test.
    * gcc.target/i386/truncsfbf-2.c: New test.
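The commit title suggests that, under fast-math, a float is narrowed to BF16 with a logical right shift. The following C sketch shows that idea under stated assumptions: a 32-bit IEEE-754 float, a BF16 value that is simply the upper 16 bits, and no rounding. The function names are hypothetical and the actual insn pattern is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep the sign, the 8 exponent bits and the top 7 mantissa bits by
   shifting the float's bit pattern right by 16 (truncation only).  */
static uint16_t float_to_bf16_truncate (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Widen back by placing the 16 bits in the upper half of a float.  */
static float bf16_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Values whose low 16 mantissa bits are zero survive the round trip.  */
static void check_bf16_round_trip (void)
{
  assert (bf16_to_float (float_to_bf16_truncate (1.5f)) == 1.5f);
  assert (bf16_to_float (float_to_bf16_truncate (-2.0f)) == -2.0f);
}
```

Plain truncation drops the low mantissa bits instead of rounding to nearest, which is presumably why the title ties it to fast-math.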
2024-10-09 | diagnostics: move text output member functions to correct file | David Malcolm | 3 | -87/+71
No functional change intended.

gcc/ChangeLog:

    * diagnostic-format-text.cc (diagnostic_text_output_format::after_diagnostic): Replace call to show_any_path with body, taken from diagnostic.cc.
    (diagnostic_text_output_format::build_prefix): Move here from diagnostic.cc, updating to use get_diagnostic_kind_text and diagnostic_get_color_for_kind.
    (diagnostic_text_output_format::file_name_as_prefix): Move here from diagnostic.cc
    (diagnostic_text_output_format::append_note): Likewise.
    * diagnostic-format-text.h (diagnostic_text_output_format::show_any_path): Drop decl.
    * diagnostic.cc (diagnostic_text_output_format::file_name_as_prefix): Move to diagnostic-format-text.cc.
    (diagnostic_text_output_format::build_prefix): Likewise.
    (diagnostic_text_output_format::show_any_path): Move to body of diagnostic_text_output_format::after_diagnostic.
    (diagnostic_text_output_format::append_note): Move to diagnostic-format-text.cc.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-09diagnostics: mark the JSON output format as deprecatedDavid Malcolm1-264/+2
The bulk of the documentation for -fdiagnostics-format= is taken up by a description of the "json" format added in r9-4156-g478dd60ddcf177. I don't plan to add any extra features to the "json" format; all my future work on machine-readable GCC diagnostics is likely to be on the SARIF output format (https://gcc.gnu.org/wiki/SARIF). Hence users seeking machine-readable output from GCC should use SARIF. This patch removes the long documentation of the format and describes it as deprecated. gcc/ChangeLog: * doc/invoke.texi (fdiagnostics-format): Describe "json" et al as deprecated, and remove the long description of the output format. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-09lto: reimplement print_lto_docs_link [PR116613]David Malcolm1-13/+4
gcc/ChangeLog: PR other/116613 * lto-wrapper.cc (print_lto_docs_link): Use a format string rather than building the string manually. Fix memory leak of "url" by using label_text. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-10-10Daily bump.GCC Administrator5-1/+233
2024-10-10Adjust testcase after relax O2 vectorization.liuhongt21-37/+23
gcc/testsuite/ChangeLog: * gcc.dg/fstack-protector-strong.c: Adjust scan-assembler-times. * gcc.dg/graphite/scop-6.c: Refine the testcase to avoid array out of bounds. * gcc.dg/graphite/scop-9.c: Ditto. * gcc.dg/tree-ssa/ivopts-lt-2.c: Add -fno-tree-vectorize. * gcc.dg/tree-ssa/ivopts-lt.c: Ditto. * gcc.dg/tree-ssa/loop-16.c: Ditto. * gcc.dg/tree-ssa/loop-28.c: Ditto. * gcc.dg/tree-ssa/loop-bound-2.c: Ditto. * gcc.dg/tree-ssa/loop-bound-4.c: Ditto. * gcc.dg/tree-ssa/loop-bound-6.c: Ditto. * gcc.dg/tree-ssa/predcom-4.c: Ditto. * gcc.dg/tree-ssa/predcom-5.c: Ditto. * gcc.dg/tree-ssa/scev-11.c: Ditto. * gcc.dg/tree-ssa/scev-9.c: Ditto. * gcc.dg/tree-ssa/split-path-11.c: Ditto. * gcc.dg/unroll-8.c: Ditto. * gcc.dg/var-expand1.c: Ditto. * gcc.dg/vect/vect-cost-model-6.c: Removed. * gcc.target/i386/pr86270.c: Ditto. * gcc.target/i386/pr86722.c: Ditto. * gcc.target/x86_64/abi/callabi/leaf-2.c: Ditto.
2024-10-10Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.liuhongt2-10/+7
gcc/ChangeLog: * tree-vect-loop.cc (vect_analyze_loop_costing): Enable vectorization for LOOP_VINFO_PEELING_FOR_NITER in very cheap cost model. (vect_analyze_loop): Disable epilogue vectorization in very cheap cost model. * doc/invoke.texi: Adjust documents for very-cheap cost model.
2024-10-09RISC-V: Optimize branches with shifted immediate operandsJovan Vukic4-3/+63
After the valuable feedback I received, it’s clear to me that the oversight was in the tests showing the benefits of the patch. In the test file, I added functions f5 and f6, which now generate more efficient code with fewer instructions.
Before the patch:
f5:
    li a4,2097152
    addi a4,a4,-2048
    li a5,1167360
    and a0,a0,a4
    addi a5,a5,-2048
    beq a0,a5,.L4
f6:
    li a5,3407872
    addi a5,a5,-2048
    and a0,a0,a5
    li a5,1114112
    beq a0,a5,.L7
After the patch:
f5:
    srli a5,a0,11
    andi a5,a5,1023
    li a4,569
    beq a5,a4,.L5
f6:
    srli a5,a0,11
    andi a5,a5,1663
    li a4,544
    beq a5,a4,.L9
PR target/115921 gcc/ChangeLog: * config/riscv/iterators.md (any_eq): New code iterator. * config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro. (SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto. * config/riscv/riscv.md (*branch<ANYI:mode>_shiftedarith_<optab>_shifted): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/branch-1.c: Additional tests.
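The arithmetic behind the shorter sequences can be checked with a small C model. The constants are read off the assembly in the message (2097152 - 2048 = 2095104 as the f5 mask, 1167360 - 2048 = 1165312 as the f5 comparison value, shift 11); the helper names are hypothetical and the source of f5/f6 in branch-1.c is not reproduced here. The rewrite is valid only when the mask and the comparison value share at least `shift` trailing zero bits.

```c
#include <stdint.h>

/* Original form: compare the masked value against a large constant.  */
int branch_masked_eq(uint64_t x, uint64_t mask, uint64_t val)
{
    return (x & mask) == val;
}

/* Shifted form: drop the common trailing zeros first, so the mask and
   the comparison value shrink to small immediates.  Precondition: the
   low `shift` bits of both mask and val are zero.  */
int branch_shifted_eq(uint64_t x, uint64_t mask, uint64_t val, unsigned shift)
{
    return ((x >> shift) & (mask >> shift)) == (val >> shift);
}
```

For f5, 2095104 >> 11 is 1023 and 1165312 >> 11 is 569, matching the `andi` and `li` operands after the patch; for f6, 3405824 >> 11 is 1663 and 1114112 >> 11 is 544.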
2024-10-09Revert "RISC-V: Add implication for M extension."Jeff Law1-2/+0
This reverts commit 0a193466f2e87acef9b86e0d086bc6f6017518b0.
2024-10-09Revert "RISC-V: Enable builtin __riscv_mul with Zmmul extension."Jeff Law1-1/+1
This reverts commit 2990f5802a727cbd717587c3a345fa940193049f.
2024-10-09Fix LTO bootstrap failure with -Werror=lto-type-mismatchEric Botcazou4-14/+12
In GNAT's implementation model, using convention C (or C_Pass_By_Copy) has no effect on the internal representation of types since the representation is identical to that of C by default. It's even counter-productive given the implementation advice listed in B.3(63-71) so the interface between the front-end and gigi does not use it and instead uses structurally identical types on both sides. gcc/ada PR ada/117038 * fe.h (struct c_array): Add 'const' to declaration of pointer. (C_Source_Buffer): Use consistent formatting. * par-ch3.adb (P_Component_Items): Properly set Aliased_Present on access definition. * sinput.ads: Remove clause for Interfaces.C. (C_Array): Change type of Length to Integer and make both components aliased. Remove Convention aspect. (C_Source_Buffer): Remove all aspects. * sinput.adb (C_Source_Buffer): Adjust to above change.
2024-10-09Remove support for HP-UX 10Eric Botcazou6-2384/+0
gcc/ada * Makefile.rtl: Remove HP-UX 10 section. * libgnarl/s-osinte__hpux-dce.ads: Delete. * libgnarl/s-osinte__hpux-dce.adb: Likewise. * libgnarl/s-taprop__hpux-dce.adb: Likewise. * libgnarl/s-taspri__hpux-dce.ads: Likewise. * libgnat/s-oslock__hpux-dce.ads: Likewise.
2024-10-09c++: more modules and -MJason Merrill1-2/+5
In r15-4119-gc877a27f04f648 I told preprocess_file to use the directives-only scan with modules, but it seems that I also need to set the cpp_option so that communication between _cpp_handle_directive and scan_translation_unit_directives_only works properly in c-c++-common/cpp/embed-6.c. gcc/c-family/ChangeLog: * c-ppoutput.cc (preprocess_file): Set directives_only flag.
2024-10-09testsuite: arm: use effective-target for mod* testsTorbjörn SVENSSON2-2/+2
This fixes a typo introduced in r15-4200-gcf08dd297ca that was reported at https://linaro.atlassian.net/browse/GNU-1369. gcc/testsuite/ChangeLog * gcc.target/arm/mod_2.c: Corrected effective-target to arm_cpu_cortex_a57_ok. * gcc.target/arm/mod_256.c: Likewise. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-10-09aarch64: Fix SVE ACLE gimple folds for C++ LTO [PR116629]Richard Sandiford2-1/+382
The SVE ACLE code has two ways of handling overloaded functions. One, used by C, is to define a single dummy function for each unique overloaded name, with resolve_overloaded_builtin then resolving calls to real non-overloaded functions. The other, used by C++, is to define a separate function for each individual overload. The builtins harness assigns integer function codes programmatically. However, LTO requires it to use the same assignment for every translation unit, regardless of language. This means that C++ TUs need to create (unused) slots for the C overloads and that C TUs need to create (unused) slots for the C++ overloads. In many ways, it doesn't matter whether the LTO frontend itself uses the C approach or the C++ approach to defining overloaded functions, since the LTO frontend never has to resolve source-level overloading. However, the C++ approach of defining a separate function for each overload means that C++ calls never need to be redirected to a different function. Calls to an overload can appear in the LTO dump and survive until expand. In contrast, calls to C's dummy overload functions are resolved by the front end and never survive to LTO (or expand). Some optimisations work by moving between sibling functions, such as _m to _x. If the source function is an overload, the expected destination function is too. The LTO frontend needs to define C++ overloads if it wants to do this optimisation properly for C++. The PR is about a tree checking failure caused by trying to use a stubbed-out C++ overload in LTO. Dealing with that by detecting the stub (rather than changing which overloads are defined) would have turned this from an ice-on-valid to a missed optimisation. In future, it would probably make sense to redirect overloads to non-overloaded functions during gimple folding, in case that exposes more CSE opportunities. 
But it'd probably be of limited benefit, since it should be rare for code to mix overloaded and non-overloaded uses of the same operation. It also wouldn't be suitable for backports. gcc/ PR target/116629 * config/aarch64/aarch64-sve-builtins.cc (function_builder::function_builder): Use direct overloads for LTO. gcc/testsuite/ PR target/116629 * gcc.target/aarch64/sve/acle/general/pr106326_2.c: New test.
2024-10-09testsuite: Make check-function-bodies work with LTORichard Sandiford1-8/+16
This patch tries to make check-function-bodies automatically choose between reading the regular assembly file and reading the LTO assembly file. There should only ever be one right answer, since check-function-bodies doesn't make sense on slim LTO output. Maybe this will turn out to be impossible to get right, but I'd like to try at least. gcc/testsuite/ * lib/scanasm.exp (check-function-bodies): Look in ltrans0.ltrans.s if the test appears to be using LTO.
2024-10-09libstdc++: Make std::construct_at support arrays (LWG 3436)Jonathan Wakely1-0/+1
The issue was approved at the recent St. Louis meeting, requiring support for bounded arrays, but only without arguments to initialize the array elements. libstdc++-v3/ChangeLog: * include/bits/stl_construct.h (construct_at): Support array types (LWG 3436). * testsuite/20_util/specialized_algorithms/construct_at/array.cc: New test. * testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc: New test. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-opt1.C: Adjust for different diagnostics from std::construct_at by adding -fconcepts-diagnostics-depth=2.
2024-10-09Clear DR_GROUP_NEXT_ELEMENT upon group dissolvingRichard Biener1-0/+3
I've tried to sanitize DR_GROUP_NEXT_ELEMENT accesses but there are too many so the following instead makes sure DR_GROUP_NEXT_ELEMENT is never non-NULL for !STMT_VINFO_GROUPED_ACCESS. * tree-vect-data-refs.cc (vect_analyze_data_ref_access): When cancelling a DR group also clear DR_GROUP_NEXT_ELEMENT.
2024-10-09tree-optimization/117041 - fix load classification of former grouped loadRichard Biener2-2/+14
When we first detect a grouped load but later dis-associate it we only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a STMT_VINFO_GROUPED_ACCESS but leave DR_GROUP_NEXT_ELEMENT set. This causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type to go wrong, indicating a load isn't single_element_p when it actually is, leading to wrong classification and an ICE. PR tree-optimization/117041 * tree-vect-stmts.cc (get_group_load_store_type): Only check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS. * gcc.dg/torture/pr117041.c: New testcase.
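The failure mode described above can be modelled with a toy structure. Everything here is a hypothetical simplification: the struct and function names are invented for illustration, and the real check in get_group_load_store_type involves more state than this. The sketch only shows why a stale "next" pointer misleads an unguarded test and how either guarding the read or clearing the pointer avoids it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-statement vectorizer info.  */
struct stmt_info {
    struct stmt_info *group_first;  /* models DR_GROUP_FIRST_ELEMENT */
    struct stmt_info *group_next;   /* models DR_GROUP_NEXT_ELEMENT */
};

/* Models STMT_VINFO_GROUPED_ACCESS: grouped iff a first element is set.  */
bool grouped_access_p(const struct stmt_info *s)
{
    return s->group_first != NULL;
}

/* Unguarded check: trusts group_next even for non-grouped statements.  */
bool single_element_unguarded(const struct stmt_info *s)
{
    return s->group_next == NULL;
}

/* Guarded check: only consults group_next for grouped accesses.  */
bool single_element_guarded(const struct stmt_info *s)
{
    return !grouped_access_p(s) || s->group_next == NULL;
}

/* Dissolve a group completely, leaving no stale link behind.  */
void dissolve_group(struct stmt_info *s)
{
    s->group_first = NULL;
    s->group_next = NULL;
}
```

If only `group_first` is cleared, the unguarded check still sees the old link and reports a multi-element group; the guarded check, or a full dissolve, gives the expected answer.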
2024-10-09testsuite: arm: use effective-target for vsel*, mod* and pr65647.c testsTorbjörn SVENSSON20-37/+58
Update test cases to use -mcpu=unset/-march=unset feature introduced in r15-3606-g7d6c6a0d15c. gcc/testsuite/ChangeLog * gcc.target/arm/pr65647.c: Use effective-target arm_arch_v6m. Removed unneeded dg-skip-if. * gcc.target/arm/mod_2.c: Use effective-target arm_cpu_cortex_a57. * gcc.target/arm/mod_256.c: Likewise. * gcc.target/arm/vseleqdf.c: Likewise. * gcc.target/arm/vseleqsf.c: Likewise. * gcc.target/arm/vselgedf.c: Likewise. * gcc.target/arm/vselgesf.c: Likewise. * gcc.target/arm/vselgtdf.c: Likewise. * gcc.target/arm/vselgtsf.c: Likewise. * gcc.target/arm/vselledf.c: Likewise. * gcc.target/arm/vsellesf.c: Likewise. * gcc.target/arm/vselltdf.c: Likewise. * gcc.target/arm/vselltsf.c: Likewise. * gcc.target/arm/vselnedf.c: Likewise. * gcc.target/arm/vselnesf.c: Likewise. * gcc.target/arm/vselvcdf.c: Likewise. * gcc.target/arm/vselvcsf.c: Likewise. * gcc.target/arm/vselvsdf.c: Likewise. * gcc.target/arm/vselvssf.c: Likewise. * lib/target-supports.exp: Define effective-target arm_cpu_cortex_a57. Update effective-target arm_v8_1_lob_ok to use -mcpu=unset. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-10-09Enable LRA for ia64René Rebe3-8/+5
This was tested by bootstrapping GCC natively on ia64-t2-linux-gnu and running the testsuite (based on 236116068151bbc72aaaf53d0f223fe06f7e3bac): https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html For comparison, the same with just 236116068151bbc72aaaf53d0f223fe06f7e3bac: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html gcc/ * config/ia64/ia64.cc: Enable LRA for ia64. * config/ia64/ia64.md: Likewise. * config/ia64/predicates.md: Likewise. Signed-off-by: René Rebe <rene@exactcode.de>
2024-10-09Remove ia64*-*-linux from the list of obsolete targetsRené Rebe1-1/+1
The following un-deprecates ia64*-*-linux for GCC 15, since we plan to support this for some years to come. gcc/ * config.gcc: Only list ia64*-*-(hpux|vms|elf) in the list of obsoleted targets. contrib/ * config-list.mk (LIST): no --enable-obsolete for ia64-linux. Signed-off-by: René Rebe <rene@exactcode.de>
2024-10-09tree-optimization/116974 - Handle single-lane SLP for OMP scan storeRichard Biener2-26/+60
The following massages the GIMPLE matching way of handling scan stores to work with single-lane SLP. I do not fully understand all the cases that can happen and the stmt matching at vectorizable_store time is less than ideal - but the following gets me all the testcases to pass with and without forced SLP. Long term we want to perform the matching at SLP discovery time, properly chaining the various SLP instances the current state ends up with. PR tree-optimization/116974 * tree-vect-stmts.cc (check_scan_store): Pass in the SLP node instead of just a flag. Allow single-lane scan stores. (vectorizable_store): Adjust. * tree-vect-loop.cc (vect_analyze_loop_2): Empty scan_map before re-trying.
2024-10-09tree-optimization/116575 - handle SLP of permuted masked loadsRichard Biener2-4/+56
The following handles SLP discovery of permuted masked loads which was prohibited (because wrongly handled) for PR114375. In particular with single-lane SLP at the moment all masked group loads appear permuted and we fail to use masked load lanes as well. The following addresses parts of the issues, starting with doing correct basic discovery - namely discover an unpermuted mask load followed by a permute node. In particular groups with gaps do not support masking yet (and didn't before w/o SLP IIRC). There are still issues with how we represent masked load/store-lanes I think, but I first have to get my hands on a good testcase. PR tree-optimization/116575 PR tree-optimization/114375 * tree-vect-slp.cc (vect_build_slp_tree_2): Do not reject permuted mask loads without gaps but instead discover a node for the full unpermuted load and permute that with a VEC_PERM node. * gcc.dg/vect/vect-pr114375.c: Expect vectorization now with avx2.