path: root/gcc
Age | Commit message | Author | Files | Lines (-/+)
2023-04-24Daily bump.GCC Administrator3-1/+83
2023-04-23modula2: Add -lnsl -lsocket libraries to gcc/testsuite/lib/gm2.expGaius Mulley1-0/+4
Solaris requires -lnsl -lsocket (present in the driver) but not when running the testsuite. This patch tests target for *-*-solaris2 and conditionally appends the above libraries. gcc/testsuite/ChangeLog: * lib/gm2.exp (gm2_target_compile_default): Conditionally append -lnsl -lsocket to ldflags. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-04-23aarch64: Annotate fcvtn pattern for vec_concat with zeroesKyrylo Tkachov2-1/+33
Using the define_substs in aarch64-simd.md this is a straightforward annotation to remove a redundant fmov insn. So the codegen goes from:
    foo_d:
        fcvtn   v0.2s, v0.2d
        fmov    d0, d0
        ret
to the simple:
    foo_d:
        fcvtn   v0.2s, v0.2d
        ret
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_): Rename to...
    (aarch64_float_truncate_lo_<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/float_truncate_zero.c: New test.
2023-04-23aarch64: Add vect_concat with zeroes annotation to addp patternKyrylo Tkachov2-7/+12
Similar to others, the addp pattern can be safely annotated with <vczle><vczbe> to create the implicit vec_concat-with-zero variants. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (aarch64_addp<mode>): Rename to... (aarch64_addp<mode><vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add testing for vpadd intrinsics.
2023-04-23[xstormy16] Update xstormy16_rtx_costs.Roger Sayle2-11/+163
This patch provides an improved rtx_costs target hook on xstormy16. The current implementation has the unfortunate property that it claims that zero_extendhisi2 is very cheap, even though the machine description doesn't provide that instruction/pattern. Doh! Rewriting the xstormy16_rtx_costs function has additional benefits, including making more use of the (short) "mul" instruction when optimizing for size with -Os. 2023-04-23 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/stormy16/stormy16.cc (xstormy16_rtx_costs): Rewrite to provide reasonable values for common arithmetic operations and immediate operands (in several machine modes). gcc/testsuite/ChangeLog * gcc.target/xstormy16/mulhi.c: New test case.
2023-04-23[xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.mdRoger Sayle4-7/+46
This patch adds a pair of define_insn patterns to the xstormy16 machine description that provide extendhisi2 and zero_extendhisi2, i.e. 16-bit to 32-bit sign- and zero-extension respectively. This functionality is already synthesized during RTL expansion, but providing patterns allows the semantics to be exposed to the RTL optimizers.

To simplify things, this patch introduces a new %h0 output format, for emitting the high_part register name of a double-word (SImode) register pair. The actual code generated is identical to before. Whilst there, I also fixed the instruction lengths and formatting of the zero_extendqihi2 pattern. Then, mostly for documentation purposes as the 'T' constraint isn't yet implemented, I've added an "and Rx,#255" alternative to zero_extendqihi2 that takes advantage of its efficient instruction encoding.

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h format specifier to output high_part register name of SImode reg.
    * config/stormy16/stormy16.md (extendhisi2): New define_insn.
    (zero_extendqihi2): Fix lengths, consistent formatting and add "and Rx,#255" alternative, for documentation purposes.
    (zero_extendhisi2): New define_insn.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/extendhisi2.c: New test case.
    * gcc.target/xstormy16/zextendhisi2.c: Likewise.
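As an illustration, a hypothetical source fragment of the kind these patterns now describe to the optimizers (not taken from the new test cases) is plain widening from the 16-bit int to the 32-bit long of this target:

    /* Sign-extension: matches the new extendhisi2 define_insn.  */
    long sext (int x) { return x; }
    /* Zero-extension: matches the new zero_extendhisi2 define_insn.  */
    unsigned long zext (unsigned int x) { return x; }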
2023-04-23[xstormy16] Improved SImode shifts by two bits.Roger Sayle2-0/+35
Currently on xstormy16 SImode shifts by a single bit require two instructions, and shifts by other non-zero integer immediate constants require five instructions. This patch implements the obvious optimization that shifts by two bits can be done in four instructions, by using two single-bit sequences.

Hence, ashift_2 was previously generated as:
    mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
    ret
and with this patch we now generate:
    shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
    ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_output_shift): Implement SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/shiftsi.c: New test case.
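For reference, a minimal source function of the affected shape (hypothetical, not the committed test case) is simply a 32-bit shift by two, SImode being long on this 16-bit target:

    long ashift_2 (long x) { return x << 2; }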
2023-04-23Handle NANs in frange::operator== [PR109593]Aldy Hernandez1-0/+10
The earlier commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015 ("Return true from operator== for two identical ranges containing NAN.") removed the check for NANs, which caused us to read from m_min and m_max, which are undefined for NANs.

gcc/ChangeLog:
    PR tree-optimization/109593
    * value-range.cc (frange::operator==): Handle NANs.
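A minimal standalone sketch of the idea (simplified types, not the actual value-range.cc code): test the NAN state first so the endpoints, which are meaningless for a NAN range, are never read.

    // Simplified stand-in for frange: min/max are undefined when the
    // range represents a NAN, so equality must not inspect them then.
    struct float_range
    {
      bool is_nan;
      double min, max;   // only meaningful when !is_nan

      bool operator== (const float_range &o) const
      {
        if (is_nan || o.is_nan)
          return is_nan == o.is_nan;       // never read min/max of a NAN range
        return min == o.min && max == o.max;
      }
    };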
2023-04-23Adjust testcases after better RA decision.liuhongt5-204/+790
After the optimization for RA, a memory op is not propagated into instructions (>1), and that makes the testcases not generate vxorps, since the memory is loaded into the dest and the dest is no longer unused. So rewrite the testcases to make the codegen more stable.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite testcase to make the codegen more stable.
    * gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.
2023-04-23Use NO_REGS in cost calculation when the preferred register class is not known yet.liuhongt2-1/+20
gcc/ChangeLog:
    PR rtl-optimization/108707
    * ira-costs.cc (scan_one_insn): Use NO_REGS instead of GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/pr108707.c: New test.
2023-04-23Daily bump.GCC Administrator4-1/+100
2023-04-22PHIOPT: Improve readability of tree_ssa_phiopt_workerAndrew Pinski1-25/+21
This small patch just changes around the code slightly to make it easier to understand that the cases were handling diamond shaped BB for both do_store_elim/do_hoist_loads. There is no effect on code output at all since all of the checks are the same still. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Change the code around slightly to move diamond handling for do_store_elim/do_hoist_loads out of the big if/else.
2023-04-22PHIOPT: Improve minmax diamond detection for phiopt1Andrew Pinski2-7/+6
For diamond bb phi node detection, there is a check to make sure bb1 is not empty. But in the case where bb1 is empty except for a predicate, empty_block_p will still return true, yet the minmax code already handles that case, so there is no reason to check if the basic block is empty. This patch removes that check and removes some xfails.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove check on empty_block_p.

gcc/testsuite/ChangeLog:
    * gcc.dg/tree-ssa/phi-opt-5.c: Remove some xfails.
2023-04-22[Committed] Move new test case to gcc.target/avr/mmcu/pr54816.cRoger Sayle1-0/+0
AVR test cases that specify a specific -mmcu option need to be placed in the gcc.target/avr/mmcu subdirectory. Moved thusly. 2023-04-22 Roger Sayle <roger@nextmovesoftware.com> gcc/testsuite/ChangeLog PR target/54816 * gcc.target/avr/pr54816.c: Move to... * gcc.target/avr/mmcu/pr54816.c: ... here.
2023-04-22Fortran: function results never have the ALLOCATABLE attribute [PR109500]Harald Anlauf2-0/+48
Fortran 2018 8.5.3 (ALLOCATABLE attribute) explains in Note 1 that the result of referencing a function whose result variable has the ALLOCATABLE attribute is a value that does not itself have the ALLOCATABLE attribute.

gcc/fortran/ChangeLog:
    PR fortran/109500
    * interface.cc (gfc_compare_actual_formal): Reject allocatable functions being used as an actual argument for an allocatable dummy.

gcc/testsuite/ChangeLog:
    PR fortran/109500
    * gfortran.dg/allocatable_function_11.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2023-04-22testsuite: Fix up pr109011-*.c tests for powerpc [PR109572]Jakub Jelinek5-18/+18
As reported, pr109011-{4,5}.c tests fail on powerpc. I thought they should have the same counts as the corresponding -{2,3}.c tests, the only difference is that -{2,3}.c are int while -{4,5}.c are long long. But there are 2 issues. One is that in the foo function the vectorization costs comparison triggered in, while in -{2,3}.c we use vectorization factor 4 and it was found beneficial, when using long long it was just vf 2 and the scalar cost of doing p[i] = __builtin_ctzll (q[i]) twice looked smaller than the vectorizated statements. I could disable the cost model, but instead chose to add some further arithmetics to those functions to make it beneficial even with vf 2. After that change, pr109011-4.c still failed; I was expecting 4 .CTZ calls there on power9, 3 vectorized and one in scalar code, but for some reason the scalar one didn't trigger. As I really want to count just the vectorized calls, I've added the vect prefix on the variables to ensure I'm only counting vectorized calls and decreased the 4 counts to 3. 2023-04-22 Jakub Jelinek <jakub@redhat.com> PR testsuite/109572 * gcc.dg/vect/pr109011-1.c: In scan-tree-dump-times regexps match also vect prefix to make sure we only count vectorized calls. * gcc.dg/vect/pr109011-2.c: Likewise. On powerpc* expect just count 3 rather than 4. * gcc.dg/vect/pr109011-3.c: In scan-tree-dump-times regexps match also vect prefix to make sure we only count vectorized calls. * gcc.dg/vect/pr109011-4.c: Likewise. On powerpc* expect just count 3 rather than 4. (foo): Add 2 further arithmetic ops to the loop to make it appear worthwhile for vectorization heuristics on powerpc. * gcc.dg/vect/pr109011-5.c: In scan-tree-dump-times regexps match also vect prefix to make sure we only count vectorized calls. (foo): Add 2 further arithmetic ops to the loop to make it appear worthwhile for vectorization heuristics on powerpc.
2023-04-22Fix up bootstrap with GCC 4.[89] after RAII auto_mpfr and auto_mpz [PR109589]Jakub Jelinek2-0/+6
On Tue, Apr 18, 2023 at 03:39:41PM +0200, Richard Biener via Gcc-patches wrote:
> The following adds two RAII classes, one for mpz_t and one for mpfr_t
> making object lifetime management easier.  Both formerly require
> explicit initialization with {mpz,mpfr}_init and release with
> {mpz,mpfr}_clear.

This unfortunately broke bootstrap when using GCC 4.8.x or 4.9.x, as it uses deleted friends, which weren't supported until PR62101 fixed them in 2014 for GCC 5.

The following patch adds a workaround, not deleting those friends for those old versions. While it means that if people add those mp*_{init{,2},clear} calls on auto_mp* objects they won't notice when doing non-bootstrap builds using very old system compilers, people should be bootstrapping their changes and it will be caught during bootstraps even when starting with those old compilers; plus most people actually use much newer compilers when developing.

2023-04-22  Jakub Jelinek  <jakub@redhat.com>

    PR bootstrap/109589
    * system.h (class auto_mpz): Workaround PR62101 bug in GCC 4.8 and 4.9.
    * realmpfr.h (class auto_mpfr): Likewise.
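An illustrative sketch of the shape of the workaround (simplified; not the exact system.h code, and the version guard shown here is an assumption):

    #include <gmp.h>

    class auto_mpz
    {
    public:
      auto_mpz () { mpz_init (m_mpz); }
      ~auto_mpz () { mpz_clear (m_mpz); }
      operator mpz_t & () { return m_mpz; }
    #if GCC_VERSION >= 5000
      // Deleted friends catch accidental manual init/clear on the RAII
      // object, but only compilers that accept deleted friends get them.
      friend void mpz_init (auto_mpz &) = delete;
      friend void mpz_clear (auto_mpz &) = delete;
    #endif
    private:
      mpz_t m_mpz;
    };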
2023-04-22Adjust rx movsicc testsJeff Law9-94/+142
The rx port has target specific test movsicc which is naturally meant to verify that if-conversion is happening on the expected cases. Unfortunately the test is poorly written. The core problem is there are 8 distinct tests and each of those tests is expected to generate a specific sequence. Unfortunately, various generic bits might turn an equality test into an inequality test or make other similar changes. The net result is the assembly matching patterns may find a particular sequence, but it may be for a different function than was originally intended. ie, test1's output may match the expected assembly for test5. Ugh! This patch breaks the movsicc test down into 8 distinct tests and adjusts the patterns they match. The nice thing is all these tests are supposed to have branches that use a bCC 1f form. So we can make them a bit more robust by ignoring the actual condition code used. So if we change eq to ne, as long as we match the movsicc pattern, we're OK. And the 1f style is only used by the movsicc pattern. With the tests broken down it's a lot easier to diagnose why one test fails after the recent changes to if-conversion. movsicc-3 fails because of the profitability test. It's more expensive than the other cases because of its use of (const_int 10) rather than (const_int 0). (const_int 0) naturally has a smaller cost. It looks to me like in this context (const_int 10) should have the same cost as (const_int 0). But I'm nowhere near well versed in the cost model for the rx port. So I'm just leaving the test as xfailed. If someone cares enough, they can dig into it further. gcc/testsuite * gcc.target/rx/movsicc.c: Broken down into ... * gcc.target/rx/movsicc-1.c: Here. * gcc.target/rx/movsicc-2.c: Here. * gcc.target/rx/movsicc-3.c: Here. xfail one test. * gcc.target/rx/movsicc-4.c: Here. * gcc.target/rx/movsicc-5.c: Here. * gcc.target/rx/movsicc-6.c: Here. * gcc.target/rx/movsicc-7.c: Here. * gcc.target/rx/movsicc-8.c: Here.
2023-04-22match.pd: Fix fneg/fadd optimization [PR109583]Jakub Jelinek2-1/+27
The following testcase ICEs on x86, foo function since my r14-22 improvement, but bar already since r13-4122. The problem is the same, in the if expression related_vector_mode is called and that starts with gcc_assert (VECTOR_MODE_P (vector_mode)); but nothing in the fneg/fadd match.pd pattern actually checks if the VEC_PERM type has VECTOR_MODE_P (vec_mode). In this case it has BLKmode and so it ICEs. The following patch makes sure we don't ICE on it. 2023-04-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109583 * match.pd (fneg/fadd simplify): Don't call related_vector_mode if vec_mode is not VECTOR_MODE_P. * gcc.dg/pr109583.c: New test.
2023-04-22Update loop estimate after header duplicationJan Hubicka6-38/+178
Loop header copying implements partial loop peeling. If all exits of the loop are peeled (which is the common case), the number of iterations decreases by 1. Without noting this, for loops iterating zero times, we end up re-peeling them later in the loop peeling pass, which is wasteful.

This patch commonizes the code for the estimate update and adds logic to detect when all (likely) exits were peeled by loop-ch. We are still wrong about the update of the estimate, however: if the exits behave randomly with a given probability, loop peeling does not decrease expected iteration counts, it just decreases the probability that the loop will be executed. In this case we thus incorrectly decrease any_estimate. Doing so, however, at least helps us not to peel or optimize the loop too hard later.

If the loop iterates precisely the estimated number of iterations, the estimate decreases, but we are wrong about decreasing the header frequency. We already have logic that tries to prove that a loop exit will not be taken in peeled-out iterations, and it may make sense to special case this.

I also fixed a problem where we had an off-by-one error in iteration count updating. It makes perfect sense to expect a loop to have 0 iterations. However, if the bound drops to negative, we lose info about the loop behaviour (since we have no profile data reaching the loop body).

Bootstrapped/regtested x86_64-linux, committed.

Honza

gcc/ChangeLog:
2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
	    Ondrej Kubanek  <kubanek0ondrej@gmail.com>
    * cfgloopmanip.h (adjust_loop_info_after_peeling): Declare.
    * tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix updating of loop profile and bounds after header duplication.
    * tree-ssa-loop-ivcanon.cc (adjust_loop_info_after_peeling): Break out from try_peel_loop; fix handling of 0 iterations.
    (try_peel_loop): Use adjust_loop_info_after_peeling.

gcc/testsuite/ChangeLog:
2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
	    Ondrej Kubanek  <kubanek0ondrej@gmail.com>
    * gcc.dg/tree-ssa/peel1.c: Decrease number of peels by 1.
    * gcc.dg/unroll-8.c: Decrease loop iteration estimate.
    * gcc.dg/tree-prof/peel-2.c: New test.
2023-04-22Daily bump.GCC Administrator5-1/+287
2023-04-21Do not fold ADDR_EXPR conditions leading to builtin_unreachable early.Andrew MacLeod2-1/+27
Ranges can not represent &var globally yet, so we cannot fold these expressions early or we lose the __builtin_unreachable information. PR tree-optimization/109546 gcc/ * tree-vrp.cc (remove_unreachable::remove_and_update_globals): Do not fold conditions with ADDR_EXPR early. gcc/testsuite/ * gcc.dg/pr109546.c: New.
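A hypothetical example of the kind of condition involved (illustrative only, not the committed testcase): an address comparison guarding __builtin_unreachable (), whose fact about p would be lost if the condition were folded away early.

    int g;
    int f (int *p)
    {
      if (p != &g)
        __builtin_unreachable ();   // encodes the global fact p == &g
      return *p;
    }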
2023-04-21c++: fix 'unsigned typedef-name' extension [PR108099]Jason Merrill4-12/+60
In the comments for PR108099 Jakub provided some testcases that demonstrated that even before the regression noted in the patch we were getting the semantics of this extension wrong: in the unsigned case we weren't producing the corresponding standard unsigned type but another distinct one of the same size, and in the signed case we were just dropping it on the floor and not actually returning a signed type at all. The former issue is fixed by using c_common_signed_or_unsigned_type instead of unsigned_type_for, and the latter issue by adding a (signed_p && typedef_decl) case. This patch introduces a failure on std/ranges/iota/max_size_type.cc due to the latter issue, since the testcase expects 'signed rep_t' to do something sensible, and previously we didn't. Now that we do, it exposes a bug in the __max_diff_type::operator>>= handling of sign extension: when we evaluate -1000 >> 2 in __max_diff_type we keep the MSB set, but leave the second-most-significant bit cleared. PR c++/108099 gcc/cp/ChangeLog: * decl.cc (grokdeclarator): Don't clear typedef_decl after 'unsigned typedef' pedwarn. Use c_common_signed_or_unsigned_type. Also handle 'signed typedef'. gcc/testsuite/ChangeLog: * g++.dg/ext/int128-8.C: Remove xfailed dg-bogus markers. * g++.dg/ext/unsigned-typedef2.C: New test. * g++.dg/ext/unsigned-typedef3.C: New test.
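A hypothetical illustration of the GNU extension being fixed ('unsigned'/'signed' applied to a typedef-name; not one of the committed testcases):

    typedef long long myint;
    unsigned myint u = -1;   // should now be the standard type unsigned long long,
                             // not a distinct unsigned type of the same size
    signed myint s = -2;     // 'signed' is now honoured instead of being dropped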
2023-04-21gcc/m2: Drop references to $(P)Arsen Arsenović2-3/+3
$(P) seems to have been a workaround for some old, proprietary make implementations that we no longer support. It was removed in r0-31149-gb8dad04b688e9c. gcc/m2/ChangeLog: * Make-lang.in: Remove references to $(P). * Make-maintainer.in: Ditto.
2023-04-21Adjust x86 testsuite for recent if-conversion cost checkingJeff Law1-1/+4
gcc/testsuite PR testsuite/109549 * gcc.target/i386/cmov6.c: No longer expect this test to generate 'cmov' instructions.
2023-04-21aarch64: Emit single-instruction for smin (x, 0) and smax (x, 0)Kyrylo Tkachov3-15/+97
Motivated by https://reviews.llvm.org/D148249, we can expand to a single instruction for the SMIN (x, 0) and SMAX (x, 0) cases using the combined AND/BIC and ASR operations. Given that we already have well-fitting TARGET_CSSC patterns and expanders for the min/max codes in the backend this patch does some minor refactoring to ensure we emit the right SMAX/SMIN RTL codes for TARGET_CSSC, fall back to the generic expanders or emit a simple SMIN/SMAX with 0 RTX for !TARGET_CSSC that is now matched by a separate pattern. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64.md (aarch64_umax<mode>3_insn): Delete. (umax<mode>3): Emit raw UMAX RTL instead of going through gen_ function for umax. (<optab><mode>3): New define_expand for MAXMIN_NOUMAX codes. (*aarch64_<optab><mode>3_zero): Define. (*aarch64_<optab><mode>3_cssc): Likewise. * config/aarch64/iterators.md (maxminand): New code attribute. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sminmax-asr_1.c: New test.
2023-04-21PR target/108779 aarch64: Implement -mtp= optionKyrylo Tkachov11-1/+94
A user has requested that we support the -mtp= option in aarch64 GCC for changing the TPIDR register to read for TLS accesses. I'm not a big fan of the option name, but we already support it in the arm port and Clang supports it for AArch64 already, where it accepts the 'el0', 'el1', 'el2', 'el3' values. This patch implements the same functionality in GCC. Bootstrapped and tested on aarch64-none-linux-gnu. Confirmed with godbolt that the sequences and options are the same as what Clang accepts/generates. gcc/ChangeLog: PR target/108779 * config/aarch64/aarch64-opts.h (enum aarch64_tp_reg): Define. * config/aarch64/aarch64-protos.h (aarch64_output_load_tp): Define prototype. * config/aarch64/aarch64.cc (aarch64_tpidr_register): Declare. (aarch64_override_options_internal): Handle the above. (aarch64_output_load_tp): New function. * config/aarch64/aarch64.md (aarch64_load_tp_hard): Call aarch64_output_load_tp. * config/aarch64/aarch64.opt (aarch64_tp_reg): Define enum. (mtp=): New option. * doc/invoke.texi (AArch64 Options): Document -mtp=. gcc/testsuite/ChangeLog: PR target/108779 * gcc.target/aarch64/mtp.c: New test. * gcc.target/aarch64/mtp_1.c: New test. * gcc.target/aarch64/mtp_2.c: New test. * gcc.target/aarch64/mtp_3.c: New test. * gcc.target/aarch64/mtp_4.c: New test.
2023-04-21aarch64: PR target/99195 Add scheme to optimise away vec_concat with zeroes on 64-bit Advanced SIMD opsKyrylo Tkachov3-6/+87
I finally got around to trying out the define_subst approach for PR target/99195. The problem we have is that many Advanced SIMD instructions have 64-bit vector variants that clear the top half of the 128-bit Q register. This would allow the compiler to avoid generating explicit zeroing instructions to concat the 64-bit result with zeroes for code like:
    vcombine_u16(vadd_u16(a, b), vdup_n_u16(0))
We've been getting user reports of GCC missing this optimisation in real world code, so it's worth doing something about it.

The straightforward approach that we've been taking so far is adding extra patterns in aarch64-simd.md that match the 64-bit result in a vec_concat with zeroes. Unfortunately for big-endian the vec_concat operands to match have to be the other way around, so we would end up adding two extra define_insns. This would lead to too much bloat in aarch64-simd.md.

This patch defines a pair of define_subst constructs that allow us to annotate patterns in aarch64-simd.md with the <vczle> and <vczbe> subst_attrs and the compiler will automatically produce the vec_concat widening patterns, properly gated for BYTES_BIG_ENDIAN when needed. This seems like the least intrusive way to describe the extra zeroing semantics.

I've had a look at the generated insn-*.cc files in the build directory and it seems that define_subst does what we want it to do when applied multiple times on a pattern in terms of insn conditions and modes.

This patch adds the define_subst machinery and adds the annotations to some of the straightforward binary and unary integer operations. Many more such annotations are possible and I aim to add them in future patches if this approach is acceptable.

Bootstrapped and tested on aarch64-none-linux-gnu and on aarch64_be-none-elf.

gcc/ChangeLog:
    PR target/99195
    * config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Define.
    (add_vec_concat_subst_be): Likewise.
    (vczle): Likewise.
    (vczbe): Likewise.
    (add<mode>3): Rename to...
    (add<mode>3<vczle><vczbe>): ... This.
    (sub<mode>3): Rename to...
    (sub<mode>3<vczle><vczbe>): ... This.
    (mul<mode>3): Rename to...
    (mul<mode>3<vczle><vczbe>): ... This.
    (and<mode>3): Rename to...
    (and<mode>3<vczle><vczbe>): ... This.
    (ior<mode>3): Rename to...
    (ior<mode>3<vczle><vczbe>): ... This.
    (xor<mode>3): Rename to...
    (xor<mode>3<vczle><vczbe>): ... This.
    * config/aarch64/iterators.md (VDZ): Define.

gcc/testsuite/ChangeLog:
    PR target/99195
    * gcc.target/aarch64/simd/pr99195_1.c: New test.
2023-04-21c++, tree: optimize walk_tree_1 and cp_walk_subtreesPatrick Palka2-118/+119
These functions currently repeatedly dereference tp during the subtree walks, dereferences which the compiler can't CSE because it can't guarantee that the subtree walking doesn't modify *tp. But we already implicitly require that TREE_CODE (*tp) remains the same throughout the subtree walks, so it doesn't seem to be a huge leap to strengthen that to requiring *tp remains the same. So this patch manually CSEs the dereferences of *tp. This means that a callback function can no longer replace *tp with another tree (of the same TREE_CODE) when walking one of its subtrees, but that doesn't sound like a useful capability anyway. gcc/cp/ChangeLog: * tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp. <case DECLTYPE_TYPE>: Use cp_unevaluated and WALK_SUBTREE. <case ALIGNOF_EXPR etc>: Likewise. gcc/ChangeLog: * tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp and type_p.
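A generic illustration of the manual CSE described above (simplified stand-in types, not the actual tree.cc code): load *tp once and reuse the cached pointer, which is valid now that callbacks may not replace *tp itself.

    struct node { int code; node *kid[2]; };

    template <typename Fn>
    void walk (node **tp, Fn callback)
    {
      node *t = *tp;            // single dereference, reused below
      callback (t);
      for (node *&k : t->kid)   // previously: (*tp)->kid, reloading *tp
        if (k)
          walk (&k, callback);
    }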
2023-04-21Fix boostrap failure in tree-ssa-loop-ch.ccJan Hubicka1-7/+9
I managed to mix up patch and its WIP version in previous commit. This patch adds the missing edge iterator and also fixes a side case where new loop header would have multiple latches. gcc/ChangeLog: * tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix previous commit.
2023-04-21expansion: make layout of x_shift*cost[][][] more efficientVineet Gupta1-14/+13
When debugging expmed.[ch] for PR/108987, I saw that some of the cost arrays have a less than ideal layout as follows:
    x_shift*cost[0..63][speed][modes]
We would want speed to be the first index since a typical compile will have that fixed, followed by mode and then the shift values. It should be non-functional from a compiler semantics pov, except executing slightly faster due to better locality of shift values for a given speed and mode. And also a bit more intuitive when debugging.

gcc/ChangeLog:
    * expmed.h (x_shift*_cost): Convert to int [speed][mode][shift].
    (shift*_cost_ptr ()): Access x_shift*_cost array directly.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
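A sketch of the layout change (names suffixed and dimension sizes chosen purely for illustration; the real arrays live in expmed.h):

    const int NUM_MODES = 130;                    // stand-in for the mode count
    int x_shift_cost_before[64][2][NUM_MODES];    // [shift][speed][mode]
    int x_shift_cost_after[2][NUM_MODES][64];     // [speed][mode][shift]

With the new layout the index that is fixed for a whole compilation (speed) comes first, and consecutive shift amounts for one speed/mode stay adjacent in memory.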
2023-04-21[aarch64] Use force_reg instead of copy_to_mode_reg.Prathamesh Kulkarni1-6/+6
Use force_reg instead of copy_to_mode_reg in aarch64_simd_dup_constant and aarch64_expand_vector_init to avoid creating pseudo if original value is already in a register. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_simd_dup_constant): Use force_reg instead of copy_to_mode_reg. (aarch64_expand_vector_init): Likewise.
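A rough sketch of the behavioural difference being relied on (assuming the usual GCC RTL headers; not the real emit-rtl.cc implementation): force_reg hands back an operand that is already a register, while copy_to_mode_reg always copies into a fresh pseudo.

    static rtx
    force_reg_sketch (machine_mode mode, rtx x)
    {
      if (REG_P (x))
        return x;                         // reuse the existing register
      return copy_to_mode_reg (mode, x);  // otherwise copy into a new pseudo
    }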
2023-04-21i386: Remove REG_OK_FOR_INDEX/REG_OK_FOR_BASE and their derivativesUros Bizjak3-50/+34
x86 was converted to TARGET_LEGITIMATE_ADDRESS_P long ago. Remove remnants of the conversion. Also, cleanup the remaining macros a bit by introducing INDEX_REGNO_P macro. No functional change. gcc/ChangeLog: 2023-04-21 Uroš Bizjak <ubizjak@gmail.com> * config/i386/i386.h (REG_OK_FOR_INDEX_P, REG_OK_FOR_BASE_P): Remove. (REG_OK_FOR_INDEX_NONSTRICT_P, REG_OK_FOR_BASE_NONSTRICT_P): Ditto. (REG_OK_FOR_INDEX_STRICT_P, REG_OK_FOR_BASE_STRICT_P): Ditto. (FIRST_INDEX_REG, LAST_INDEX_REG): New defines. (LEGACY_INDEX_REG_P, LEGACY_INDEX_REGNO_P): New macros. (INDEX_REG_P, INDEX_REGNO_P): Ditto. (REGNO_OK_FOR_INDEX_P): Use INDEX_REGNO_P predicates. (REGNO_OK_FOR_INDEX_NONSTRICT_P): New macro. (EG_OK_FOR_BASE_NONSTRICT_P): Ditto. * config/i386/predicates.md (index_register_operand): Use REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros. * config/i386/i386.cc (ix86_legitimate_address_p): Use REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_BASE_NONSTRICT_P, REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros.
2023-04-21Fix latent bug in loop header copying which forgets to update the loop header pointerJan Hubicka1-0/+13
gcc/ChangeLog:
2023-04-21  Jan Hubicka  <hubicka@ucw.cz>
	    Ondrej Kubanek  <kubanek0ondrej@gmail.com>
    * tree-ssa-loop-ch.cc (ch_base::copy_headers): Update loop header and latch.
2023-04-21Add safe_is_aRichard Biener1-0/+13
The following adds safe_is_a, an is_a check handling nullptr gracefully. * is-a.h (safe_is_a): New.
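A minimal sketch of what such a helper looks like (the actual is-a.h definition may differ in details): behave like is_a <T> but return false for a null pointer instead of requiring the caller to check first.

    template <typename T, typename U>
    inline bool
    safe_is_a (U *p)
    {
      return p ? is_a <T> (p) : false;
    }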
2023-04-21Add operator* to gimple_stmt_iterator and gphi_iteratorRichard Biener1-0/+4
This allows STL style iterator dereference. It's the same as gsi_stmt () or .phi (). * gimple-iterator.h (gimple_stmt_iterator::operator*): Add. (gphi_iterator::operator*): Likewise.
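A simplified sketch of the idea (stand-in structs, not the exact gimple-iterator.h code): the iterators already hold the statement pointer, so operator* simply hands it back, mirroring gsi_stmt () and gphi_iterator::phi ().

    struct gimple;
    struct gphi;

    struct gimple_stmt_iterator_sketch
    {
      gimple *ptr;
      gimple *operator* () const { return ptr; }
    };

    struct gphi_iterator_sketch
    {
      gphi *ptr;
      gphi *operator* () const { return ptr; }
    };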
2023-04-21Stabilize inlinerJan Hubicka1-13/+70
The Fibonacci heap can change its behaviour quite significantly for no good reasons when multiple edges with same key occurs. This is quite common for small functions. This patch stabilizes the order by adding edge uids into the info. Again I think this is good idea regardless of the incremental WPA project since we do not want random changes in inline decisions. gcc/ChangeLog: 2023-04-21 Jan Hubicka <hubicka@ucw.cz> Michal Jires <michal@jires.eu> * ipa-inline.cc (class inline_badness): New class. (edge_heap_t, edge_heap_node_t): Use inline_badness for badness instead of sreal. (update_edge_key): Update. (lookup_recursive_calls): Likewise. (recursive_inlining): Likewise. (add_new_edges_to_heap): Likewise. (inline_small_functions): Likewise.
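A sketch of the tie-breaking idea (member and type names assumed, not the exact ipa-inline.cc class): keep the badness as the primary key and use the call graph edge uid as a deterministic tie-breaker, so equal badness values no longer leave the ordering to Fibonacci heap internals.

    struct inline_badness_sketch
    {
      long badness;   // stand-in for the sreal badness value
      int  uid;       // uid of the call graph edge

      bool operator< (const inline_badness_sketch &o) const
      {
        if (badness != o.badness)
          return badness < o.badness;
        return uid < o.uid;   // stable order for equal keys
      }
    };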
2023-04-21Cleanup odr_types_equivalent_pJan Hubicka1-6/+9
gcc/ChangeLog: 2023-04-21 Jan Hubicka <hubicka@ucw.cz> * ipa-devirt.cc (odr_types_equivalent_p): Cleanup warned checks.
2023-04-21PR modula2/109586 cc1gm2 ICE when compiling large source files.Gaius Mulley1-2/+2
The function m2block_RememberConstant calls m2tree_IsAConstant. However, IsAConstant does not recognise TREE_CODE(t) == CONSTRUCTOR as a constant. Without this patch, CONSTRUCTOR constants are garbage collected (and not preserved), resulting in a corrupt tree and a crash.

gcc/m2/ChangeLog:
    PR modula2/109586
    * gm2-gcc/m2tree.cc (m2tree_IsAConstant): Add (TREE_CODE (t) == CONSTRUCTOR) to expression.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-04-21tree-optimization/109573 - avoid ICEing on unexpected live defRichard Biener2-3/+95
The following relaxes the assert in vectorizable_live_operation where we catch currently unhandled cases to also allow an intermediate copy as it happens here but also relax the assert to checking only. PR tree-optimization/109573 * tree-vect-loop.cc (vectorizable_live_operation): Allow unhandled SSA copy as well. Demote assert to checking only. * g++.dg/vect/pr109573.cc: New testcase.
2023-04-21Use correct CFG orders for DF worklist processingRichard Biener1-16/+20
This adjusts the remaining three RPO computes in DF. The DF_FORWARD problems should use an RPO on the forward graph, the DF_BACKWARD problems should use an RPO on the inverted graph. Conveniently, inverted_rev_post_order_compute now computes an RPO. We still use post_order_compute and reverse its order for its side effect of deleting unreachable blocks.

This results in an overall reduction of visited blocks on cc1 files by 5.2%. Because most regions are irreducible on the reverse CFG, there are a few cases where the number of visited blocks increases. For the set of cc1 files I have, this happens for et-forest.i, graph.i, hwint.i, tree-ssa-dom.i, tree-ssa-loop-ch.i and tree-ssa-threadedge.i. For tree-ssa-dse.i it's off-noise; I've investigated more closely and figured it is really bad luck due to the irreducibility.

    * df-core.cc (df_analyze): Compute RPO on the reverse graph for DF_BACKWARD problems.
    (loop_post_order_compute): Rename to ...
    (loop_rev_post_order_compute): ... this, compute a RPO.
    (loop_inverted_post_order_compute): Rename to ...
    (loop_inverted_rev_post_order_compute): ... this, compute a RPO.
    (df_analyze_loop): Use RPO on the forward graph for DF_FORWARD problems, RPO on the inverted graph for DF_BACKWARD.
2023-04-21change inverted_post_order_compute to inverted_rev_post_order_computeRichard Biener6-44/+53
The following changes the inverted_post_order_compute API back to a plain C array interface and computing a reverse post order since that's what's always required. It will make massaging DF to use the correct iteration orders easier. Elsewhere it requires turning backward iteration over the computed order with forward iteration. * cfganal.h (inverted_rev_post_order_compute): Rename from ... (inverted_post_order_compute): ... this. Add struct function argument, change allocation to a C array. * cfganal.cc (inverted_rev_post_order_compute): Likewise. * lcm.cc (compute_antinout_edge): Adjust. * lra-lives.cc (lra_create_live_ranges_1): Likewise. * tree-ssa-dce.cc (remove_dead_stmt): Likewise. * tree-ssa-pre.cc (compute_antic): Likewise.
2023-04-21change DF to use the proper CFG order for DF_FORWARD problemsRichard Biener2-32/+34
This changes DF to use RPO on the forward graph for DF_FORWARD problems. While that naturally maps to pre_and_rev_postorder_compute we use the existing (wrong) CFG order for DF_BACKWARD problems computed by post_order_compute since that provides the required side-effect of deleting unreachable blocks. The change requires turning the inconsistent vec<int> vs int * back to consistent int *. A followup patch will change the inverted_post_order_compute API and change the DF_BACKWARD problem to use the correct RPO on the backward graph together with statistics I produced last year for the combined effect. * df.h (df_d::postorder_inverted): Change back to int *, clarify comments. * df-core.cc (rest_of_handle_df_finish): Adjust. (df_analyze_1): Likewise. (df_analyze): For DF_FORWARD problems use RPO on the forward graph. Adjust. (loop_inverted_post_order_compute): Adjust API. (df_analyze_loop): Adjust. (df_get_n_blocks): Likewise. (df_get_postorder): Likewise.
2023-04-21RISC-V: Defer vsetvli insertion to later if possible [PR108270]Juzhe-Zhong5-3/+47
Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
    void f (void * restrict in, void * restrict out, int l, int n, int m)
    {
      for (int i = 0; i < l; i++)
        for (int j = 0; j < m; j++)
          for (int k = 0; k < n; k++)
            {
              vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
              __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
            }
    }
Compile option: -O3

Before this patch:
    mv a7,a2
    mv a6,a0
    mv t1,a1
    mv a2,a3
    vsetivli zero,17,e8,mf8,ta,ma
    ble a7,zero,.L1
    ble a4,zero,.L1
    ble a3,zero,.L1
    ...

After this patch:
    mv a7,a2
    mv a6,a0
    mv t1,a1
    mv a2,a3
    ble a7,zero,.L1
    ble a4,zero,.L1
    ble a3,zero,.L1
    add a1,a0,a4
    li a0,0
    vsetivli zero,17,e8,mf8,ta,ma
    ...

This issue is a missed optimization produced by the Phase 3 global backward demand fusion instead of LCM. This patch fixes the poor placement of the vsetvl. This point is selected not by LCM but by Phase 3 (VL/VTYPE demand info backward fusion and propagation), which I introduced into the VSETVL PASS to enhance LCM and improve vsetvl instruction performance.

This patch suppresses the too aggressive Phase 3 backward fusion and propagation to the top of the function program when there is no defining instruction of the AVL (the AVL is a 0 ~ 31 imm, since the vsetivli instruction allows an imm value instead of a reg).

You may want to ask why we need Phase 3 to do the job. Well, we have so many situations that pure LCM fails to optimize; here I can show you a simple case to demonstrate it:
    void f (void * restrict in, void * restrict out, int n, int m, int cond)
    {
      size_t vl = 101;
      for (size_t j = 0; j < m; j++)
        {
          if (cond)
            {
              for (size_t i = 0; i < n; i++)
                {
                  vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
                  __riscv_vse8_v_i8mf8 (out + i, v, vl);
                }
            }
          else
            {
              for (size_t i = 0; i < n; i++)
                {
                  vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
                  v = __riscv_vadd_vv_i32mf2 (v, v, vl);
                  __riscv_vse32_v_i32mf2 (out + i, v, vl);
                }
            }
        }
    }
You can see: the first inner loop needs vsetvli e8 mf8 for vle+vse. The second inner loop needs vsetvli e32 mf2 for vle+vadd+vse.

If we don't have Phase 3 (only handled by LCM (Phase 4)), we will end up with:
    outerloop:
      ...
      vsetvli e8mf8
      inner loop 1:
        ....
      vsetvli e32mf2
      inner loop 2:
        ....

However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of inner loop 2 into vsetvli e8 mf8, then we will end up with this result after Phase 3:
    outerloop:
      ...
      inner loop 1:
        vsetvli e32mf2
        ....
      inner loop 2:
        vsetvli e32mf2
        ....

Then, this demand information after Phase 3 will be well optimized after Phase 4 (LCM); the result after Phase 4 is:
    vsetvli e32mf2
    outerloop:
      ...
      inner loop 1:
        ....
      inner loop 2:
        ....

You can see this is the optimal codegen after the current VSETVL PASS (Phase 3: demand backward fusion and propagation + Phase 4: LCM). This was a known issue when I started to implement the VSETVL PASS.

gcc/ChangeLog:
    PR target/108270
    * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function.
    (pass_vsetvl::backward_demand_fusion): Ditto.
    * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:
    PR target/108270
    * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
    * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
    * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
2023-04-21riscv: Fix <bitmanip_insn> fallout.Robin Dapp1-1/+1
PR109582: Since r14-116 generic.md uses standard names instead of the types defined in the <bitmanip_insn> iterator (that match instruction names). Change this. gcc/ChangeLog: PR target/109582 * config/riscv/generic.md: Change standard names to insn names.
2023-04-21rs6000: xfail float128 comparison test case that fails on powerpc64.Haochen Gui1-0/+1
This patch xfails a float128 comparison test case on powerpc64 that fails due to a longstanding issue with floating-point compares. See PR58684 for more information. When float128 hardware is enabled (-mfloat128-hardware), xscmpuqp is generated for comparison which is unexpected. When float128 software emulation is enabled (-mno-float128-hardware), we still have to xfail the hardware version (__lekf2_hw) which finally generates xscmpuqp. gcc/testsuite/ PR target/108728 * gcc.dg/torture/float128-cmp-invalid.c: Add xfail.
2023-04-21testsuite: make ppc_cpu_supports_hw as effective target keyword [PR108728]Haochen Gui1-0/+1
gcc/testsuite/ PR target/108728 * lib/target-supports.exp (is-effective-target-keyword): Add ppc_cpu_supports_hw.
2023-04-21Fix LCM dataflow CFG orderRichard Biener1-23/+24
The following fixes the initial order the LCM dataflow routines process BBs. For a forward problem you want reverse postorder, for a backward problem you want reverse postorder on the inverted graph. The LCM iteration has very many other issues but this allows to turn inverted_post_order_compute into computing a reverse postorder more easily. * lcm.cc (compute_antinout_edge): Use RPO on the inverted graph. (compute_laterin): Use RPO. (compute_available): Likewise.
2023-04-21LoongArch: Fix MUSL_DYNAMIC_LINKERPeng Fan1-1/+6
Systems based on musl have no '/lib64', so change it. The "Multilib/multi-arch" section of https://wiki.musl-libc.org/guidelines-for-distributions.html describes this.

gcc/
    * config/loongarch/gnu-user.h (MUSL_DYNAMIC_LINKER): Redefine.

Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
2023-04-21RISC-V: Add local user vsetvl instruction elimination [PR109547]Juzhe-Zhong4-3/+85
This patch is to enhance optimization for auto-vectorization.

Before this patch:
    Loop:
      vsetvl a5,a2...
      vsetvl zero,a5...
      vle

After this patch:
    Loop:
      vsetvl a5,a2
      vle

gcc/ChangeLog:
    PR target/109547
    * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New function.
    (vector_insn_info::skip_avl_compatible_p): Ditto.
    (vector_insn_info::merge): Remove default value.
    (pass_vsetvl::compute_local_backward_infos): Ditto.
    (pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
    * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:
    PR target/109547
    * gcc.target/riscv/rvv/vsetvl/pr109547.c: New.
    * gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Update scan condition.