From 4a960d548b7d7d942f316c5295f6d849b74214f5 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Thu, 23 Sep 2021 10:59:24 +0200
Subject: Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump threader make it thread
more aggressively. In investigating some "regressions", I noticed that
it has always allowed threading through empty latches and across loop
boundaries. As we have discussed recently, this should be avoided until
after loop optimizations have run their course.

Note that this wasn't much of a problem before because DOM/VRP couldn't
find these opportunities, but with a smarter solver, we trip over them
more easily.

Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery. However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.

This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).

We could ideally remove the now redundant bits in profitable_path_p,
but I would prefer not to for two reasons. First, the backward threader
uses profitable_path_p as it discovers paths to avoid discovering paths
in unprofitable directions. Second, I would like to merge all the
forward cost restrictions into the profitability class in the backward
threader, not the other way around. Alas, that reshuffling will have to
wait for the next release.

As usual, there are quite a few tests that needed adjustments. It seems
we were quite happily threading improper scenarios. With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.
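To make the registration-time check concrete, here is a minimal sketch of the idea, not the code in cancel_invalid_paths. The struct toy_block, its fields, and toy_cancel_invalid_path are hypothetical names invented for illustration. The sketch assumes a path is rejected if it leaves the loop it started in or passes through an empty latch, and that nothing is rejected once loop optimizations are done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block.  GCC's real structures
   (basic_block, loop_father, latch edges) are richer than this.  */
struct toy_block
{
  int loop_id;    /* Innermost loop containing the block, 0 = no loop.  */
  bool is_latch;  /* Block is the latch of its loop.  */
  bool is_empty;  /* Block contains no statements.  */
};

/* Return true if the candidate path should be cancelled at registration
   time: it leaves the loop it started in, or it passes through an empty
   latch.  Assumption for this sketch: once loop optimizations are done
   (loops_done), nothing is rejected.  */
bool
toy_cancel_invalid_path (const struct toy_block *path, size_t n,
                         bool loops_done)
{
  assert (path != NULL || n == 0);

  if (loops_done)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      if (path[i].loop_id != path[0].loop_id)
        return true;   /* Path crosses a loop boundary.  */
      if (path[i].is_latch && path[i].is_empty)
        return true;   /* Path threads through an empty latch.  */
    }
  return false;
}
```

Because the check sits in one shared function, any threader that registers a path gets the same restriction, which is the benefit the message describes.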
Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New.
	(jt_path_registry::register_jump_thread): Call cancel_invalid_paths.
	* tree-ssa-threadupdate.h (class jt_path_registry): Add
	cancel_invalid_paths.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/20030714-2.c: Adjust.
	* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
	* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
	* gcc.dg/vect/bb-slp-16.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c        |  7 ++++---
 gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c         | 19 ++++++++++++-------
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c         |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c |  4 +++-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c  |  4 ++--
 5 files changed, 23 insertions(+), 15 deletions(-)
(limited to 'gcc/testsuite/gcc.dg/tree-ssa')

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
index eb663f2..9585ff1 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c
@@ -32,7 +32,8 @@ get_alias_set (t)
 }
 }
-/* There should be exactly three IF conditionals if we thread jumps
-   properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+   properly. There used to be 3, but one thread was crossing
+   loops.
*/ +/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index e1464e2..922a331 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */ extern int status, pt; extern int count; @@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a) pt--; } -/* There are 4 jump threading opportunities, all of which will be - realized, which will eliminate testing of FLAG, completely. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */ +/* There are 2 jump threading opportunities (which don't cross loops), + all of which will be realized, which will eliminate testing of + FLAG, completely. */ +/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */ -/* There should be no assignments or references to FLAG, verify they're - eliminated as early as possible. */ -/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */ +/* We used to remove references to FLAG by DCE2, but this was + depending on early threaders threading through loop boundaries + (which we shouldn't do). However, the late threading passes, which + run after loop optimizations , can successfully eliminate the + references to FLAG. Verify that ther are no references by the late + threading passes. */ +/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c index f9fc212..01a0f1f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c @@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) { aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. 
It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */ -/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */ +/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c index 60d4f76..2d78d04 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c @@ -21,5 +21,7 @@ condition. All the cases are picked up by VRP1 as jump threads. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */ + +/* There used to be 6 jump threads found by thread1, but they all + depended on threading through distinct loops in ethread. */ /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index e3d4b31..16abcde 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! 
aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
--
cgit v1.1

From 0288527f47cec6698b31ccb3210816415506009e Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Tue, 21 Sep 2021 10:27:53 +0200
Subject: Replace VRP threader with a hybrid forward threader.

This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.

With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into SSA imports
that the solver will understand, and let the path solver precompute
ranges and relations for the path. After this setup is done, we can use
the range_query API to solve gimple statements in the threader. The
forward threader is now engine agnostic so there are no changes to the
threader per se.

I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Most of the patch is actually test changes. I have gone through every
single one and verified that we're correct. Most were trivial dump file
name changes, but others required going through the IL and certifying
that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops. The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads. The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass). As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).

Note that these numbers are slightly different than what I originally
posted. A few correctness tweaks, plus restricting loop threads, made
the difference. That being said, I was aiming for par. A 12% gain is
just gravy ;-). When we merge the threaders, we should see even better
numbers-- and we'll have the benefit of an entire release stress
testing the solver.

As I mentioned in my introductory note, paths ending in a MEM_REF
conditional are missing. In reality, this didn't make a difference, as
it was so rare. However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.

There is a one-line change to libgomp/team.c silencing a new
used-uninitialized warning. As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading. I
expect this to be no different. I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.

Oh yeah, the new pass dump is called vrp-threader[12] to match each
VRP[12] pass. However, there's no reason for it to either be named
vrp-threader, or for it to live in tree-vrp.c.

Tested on x86-64 Linux.

OK?

p.s. "Did I say 5 weeks? My bad, I meant 5 months."

gcc/ChangeLog:

	* passes.def (pass_vrp_threader): New.
	* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
	* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
	(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
	(hybrid_jt_simplifier::simplify): New.
	(hybrid_jt_simplifier::compute_ranges_from_state): New.
	* tree-ssa-threadedge.h (class hybrid_jt_state): New.
	(class hybrid_jt_simplifier): New.
	* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump threader.
	(class hybrid_threader): New.
	(hybrid_threader::hybrid_threader): New.
	(hybrid_threader::~hybrid_threader): New.
	(hybrid_threader::before_dom_children): New.
	(hybrid_threader::after_dom_children): New.
	(execute_vrp_threader): New.
	(class pass_vrp_threader): New.
	(make_pass_vrp_threader): New.

libgomp/ChangeLog:

	* team.c: Initialize start_data.
	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
	* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr55107.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
	* gcc.dg/tree-ssa/pr21559.c: Adjust.
	* gcc.dg/tree-ssa/pr59597.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
	* gcc.dg/tree-ssa/pr71437.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
	* gcc.dg/tree-ssa/vrp106.c: Adjust.
	* gcc.dg/tree-ssa/vrp55.c: Adjust.
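As a rough illustration of why precomputed path ranges let the threader resolve a conditional, the sketch below folds a comparison against an interval known for a value along a path. It is a toy under stated assumptions: struct toy_interval, enum toy_tri and toy_fold_greater_than are hypothetical names, and none of this is the range_query API or the path solver itself.

```c
#include <assert.h>

/* Three-valued answer: the condition is known false, known true, or
   cannot be decided from the range alone.  */
enum toy_tri { TOY_FALSE, TOY_TRUE, TOY_UNKNOWN };

/* Hypothetical closed interval [lo, hi] known for a value on a path.  */
struct toy_interval
{
  long lo;
  long hi;
};

/* Try to decide `x > c` using only the interval known for x on the
   path.  A TOY_TRUE or TOY_FALSE answer means the branch outcome is
   fixed on this path, so the path is a jump threading candidate.  */
enum toy_tri
toy_fold_greater_than (struct toy_interval x, long c)
{
  assert (x.lo <= x.hi);

  if (x.lo > c)
    return TOY_TRUE;    /* Every value in the interval exceeds c.  */
  if (x.hi <= c)
    return TOY_FALSE;   /* No value in the interval exceeds c.  */
  return TOY_UNKNOWN;   /* The interval straddles c.  */
}
```

For an interval of [5, 10], comparing against 4 folds to true and against 10 folds to false, while comparing against 7 stays unknown and would leave the branch in place.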
--- gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr21559.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr59597.c | 13 +++++++++---- gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c | 10 +++++++--- gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr71437.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c | 5 +++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp106.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp55.c | 6 +++--- 18 files changed, 49 insertions(+), 39 deletions(-) (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c index 5227c87..59663dd 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -fdump-tree-vrp1" } */ +/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */ void g (int); void g1 (int); @@ -27,4 +27,4 @@ f (long a, long b, long c, long d, long x) g (a); } -/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c index eaf89bb..0c2f6e0 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c +++ 
b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -fdump-tree-vrp1" } */ +/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */ void g (void); void g1 (void); @@ -20,4 +20,4 @@ f (long a, long b, long c, long d, int x) } } -/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c index d5a1e0b..6a3d359 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -fdump-tree-vrp1" } */ +/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */ void g (void); void g1 (void); @@ -22,4 +22,4 @@ f (long a, long b, long c, long d, int x) } } -/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c index 53acabc..9bc4c6d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -fdump-tree-vrp1" } */ +/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */ void g (int); void g1 (int); @@ -37,4 +37,4 @@ f (long a, long b, long c, long d, int x) g (c + d); } -/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c index b406566..51b3b7a 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c @@ -1,5 +1,5 @@ /* { 
dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-vrp1-details" } */ +/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-vrp-thread1-details" } */ static int blocksize = 4096; @@ -39,6 +39,6 @@ void foo (void) statement. We also realize that the final bytes == 0 test is useless, and thread over it. We also know that toread != 0 is useless when entering while loop and thread over it. */ -/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c index dab16ab..2caa1f5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -fdump-tree-vrp1-details" } */ +/* { dg-options "-Ofast -fdump-tree-vrp-thread1-details" } */ typedef unsigned short u16; typedef unsigned char u8; @@ -56,6 +56,11 @@ main (int argc, char argv[]) return crc; } -/* { dg-final { scan-tree-dump-times "Registering jump thread" 3 "vrp1" } } */ -/* { dg-final { scan-tree-dump-not "joiner" "vrp1" } } */ -/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp1" } } */ +/* Previously we had 3 jump threads, but one of them crossed loops. + The reason the old threader was allowing it, was because there was + an ASSERT_EXPR getting in the way. Without the ASSERT_EXPR, we + have an empty pre-header block as the final block in the thread, + which the threader will simply join with the next block which *is* + in a different loop. 
*/ +/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */ +/* { dg-final { scan-tree-dump-not "joiner" "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c index ddc53fb..0229a82 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c @@ -1,6 +1,6 @@ /* PR tree-optimization/61839. */ /* { dg-do run } */ -/* { dg-options "-O2 -fdump-tree-vrp1 -fdisable-tree-evrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdisable-tree-evrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */ /* { dg-require-effective-target int32plus } */ __attribute__ ((noinline)) @@ -38,7 +38,11 @@ int main () } /* Scan for c = 972195717) >> [0, 1] in function foo. */ -/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp-thread1" } } */ + +/* Previously we were checking for two ?: with constant PHI arguments, + but now we collapse them into one. */ /* Scan for c = 972195717) >> [2, 3] in function bar. */ -/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 2 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 1 "vrp-thread1" } } */ + /* { dg-final { scan-tree-dump-times "486097858" 0 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c index cc322d6..7be1873 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c @@ -1,6 +1,6 @@ /* PR tree-optimization/61839. 
*/ /* { dg-do run } */ -/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */ __attribute__ ((noinline)) int foo (int a, unsigned b) @@ -22,5 +22,5 @@ int main () } /* Scan for c [12, 13] << 8 in function foo. */ -/* { dg-final { scan-tree-dump-times "3072 : 3328" 2 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "3072 : 3328" 1 "vrp-thread1" } } */ /* { dg-final { scan-tree-dump-times "3072" 0 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c index 66a5405..a2386ba 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-ffast-math -O3 -fdump-tree-vrp1-details" } */ +/* { dg-options "-ffast-math -O3 -fdump-tree-vrp-thread1-details" } */ int I = 50, J = 50; int S, L; @@ -39,4 +39,4 @@ void foo (int K) bar (LD, SD); } } -/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c index 856ab38..73969bb 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2" } */ +/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2 -fdisable-tree-vrp-thread1 " } */ static int *bb_ticks; extern void frob (void); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c index 
ffbdc98..1b677f4 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1" } */ +/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1 -fdisable-tree-vrp-thread1" } */ unsigned char validate_subreg (unsigned int offset, unsigned int isize, unsigned int osize, int zz, int qq) { diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c index 2d78d04..0246ebf 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-thread1-details -std=gnu89 --param logical-op-non-short-circuit=0" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -std=gnu89 --param logical-op-non-short-circuit=0" } */ #include "ssa-dom-thread-4.c" @@ -24,4 +24,4 @@ /* There used to be 6 jump threads found by thread1, but they all depended on threading through distinct loops in ethread. */ -/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c index b972f64..8f0a12c 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom2-stats" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */ void bla(); @@ -16,6 +16,6 @@ void thread_entry_through_header (void) /* There's a single jump thread that should be handled by the VRP jump threading pass. 
*/ -/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp1"} } */ -/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp1"} } */ +/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */ +/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c index 521754f..46e464f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */ struct bitmap_head_def; typedef struct bitmap_head_def *bitmap; typedef const struct bitmap_head_def *const_bitmap; @@ -58,4 +58,5 @@ bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b, code we missed the edge when the first conditional is false (b_elt is zero, which means the second conditional is always zero. VRP1 catches all three. 
*/ -/* { dg-final { scan-tree-dump-times "Threaded" 3 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */ +/* { dg-final { scan-tree-dump-times "Path crosses loops" 1 "vrp-thread1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c index f9152b9..8c5cc82 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ -/* { dg-additional-options "-O2 -fdump-tree-vrp-details --param logical-op-non-short-circuit=1" } */ +/* { dg-additional-options "-O2 -fdump-tree-vrp-thread1-details --param logical-op-non-short-circuit=1" } */ /* { dg-additional-options "-fdisable-tree-thread1" } */ -/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } } */ +/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp-thread1" } } */ void foo (void); void bar (void); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c index ef5611f..86d07ef 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1-details -fdelete-null-pointer-checks" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdelete-null-pointer-checks" } */ /* { dg-skip-if "" keeps_null_pointer_checks } */ void oof (void); @@ -29,5 +29,5 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent, /* ARM Cortex-M defined LOGICAL_OP_NON_SHORT_CIRCUIT to false, so skip below test. */ -/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp1" { target { ! arm_cortex_m } } } } */ +/* { dg-final { scan-tree-dump-times "Threaded" 1 "vrp-thread1" { target { ! 
arm_cortex_m } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c index e2e48d8..f25ea9c 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c @@ -1,6 +1,6 @@ /* PR tree-optimization/18046 */ -/* { dg-options "-O2 -fdump-tree-vrp1-details" } */ -/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp1" } } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-details" } */ +/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" } } */ /* During VRP we expect to thread the true arm of the conditional through the switch and to the BB that corresponds to the 7 ... 9 case label. */ extern void foo (void); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c index 8ae9b8d..a478a69 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp55.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1-blocks-vops-details -fdelete-null-pointer-checks" } */ +/* { dg-options "-O2 -fdump-tree-vrp-thread1-blocks-vops-details -fdelete-null-pointer-checks" } */ void arf (void); @@ -12,6 +12,6 @@ fu (char *p, int x) arf (); } -/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp1" { target { ! keeps_null_pointer_checks } } } } */ -/* { dg-final { scan-tree-dump-times "Threaded jump" 0 "vrp1" { target { keeps_null_pointer_checks } } } } */ +/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" { target { ! keeps_null_pointer_checks } } } } */ +/* { dg-final { scan-tree-dump-times "Threaded jump" 0 "vrp-thread1" { target { keeps_null_pointer_checks } } } } */ -- cgit v1.1 From e475ae9bbf0c85a7d60f236c5a744e163a9ef7b8 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Mon, 27 Sep 2021 16:41:01 +0200 Subject: Control all jump threading passes with -fjump-threads. 
Last year I mentioned that -fthread-jumps was being ignored by the
majority of our jump threading passes, and Jeff said he'd be in favor
of fixing this.

This patch remedies the situation, but it does change existing
behavior. Currently -fthread-jumps is only enabled for -O2, -O3, and
-Os. This means that even if we restricted all jump threading passes
with -fthread-jumps, DOM jump threading would still seep through since
it runs at -O1.

I propose this patch, but it does mean that DOM jump threading would
have to be explicitly enabled with -O1 -fthread-jumps.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
	flag_thread_jumps.
	(pass_early_thread_jumps::gate): Same.
	* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
	Return if !flag_thread_jumps.
	* tree-ssa-threadupdate.c (jt_path_registry::register_jump_thread):
	Assert that flag_thread_jumps is true.

gcc/testsuite/ChangeLog:

	* gcc.dg/auto-init-uninit-1.c: Add -fthread-jumps.
	* gcc.dg/auto-init-uninit-15.c: Same.
	* gcc.dg/guality/example.c: Same.
	* gcc.dg/loop-8.c: Same.
	* gcc.dg/strlenopt-40.c: Same.
	* gcc.dg/tree-ssa/pr18133-2.c: Same.
	* gcc.dg/tree-ssa/pr18134.c: Same.
	* gcc.dg/uninit-1.c: Same.
	* gcc.dg/uninit-pr44547.c: Same.
	* gcc.dg/uninit-pr59970.c: Same.
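The layering in the ChangeLog above (gate the passes, return early in the edge threader, assert in the registry) can be sketched roughly as follows. This is a toy model only: struct toy_options and the toy_* functions are hypothetical, and the real flag is the flag_thread_jumps global named in the ChangeLog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option state standing in for flag_thread_jumps.  */
struct toy_options
{
  bool thread_jumps;
};

/* Registration is the last line of defence: by the time a path is
   registered, every caller should already have checked the flag.  */
static int
toy_register_jump_threads (const struct toy_options *opts, int candidates)
{
  assert (opts->thread_jumps);
  return candidates;
}

/* Pass gate: the threading pass does not run unless the flag is set.  */
bool
toy_threader_gate (const struct toy_options *opts)
{
  return opts->thread_jumps;
}

/* Run the toy threader and return the number of registered threads.  */
int
toy_run_threader (const struct toy_options *opts, int candidates)
{
  if (!toy_threader_gate (opts))
    return 0;
  return toy_register_jump_threads (opts, candidates);
}
```

With the flag off, toy_run_threader returns 0 without ever reaching the assert; with it on, all candidates are registered.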
---
 gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr18134.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
(limited to 'gcc/testsuite/gcc.dg/tree-ssa')

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
index 8717640..1b40985 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized-blocks" } */
+/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized-blocks" } */

 int c, d;

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
index cd40ab2..d7f5d24 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized" } */
+/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized" } */

 int foo (int a)
 {
--
cgit v1.1

From fb8b72ebb5b0bf40f7dfef9154c42320ce46f2a7 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Tue, 28 Sep 2021 09:38:50 +0200
Subject: Return VARYING in range_on_path_entry if nothing found.

The problem here is that the solver's code solving unknown SSAs on
entry to a path was returning UNDEFINED if there were no incoming edges
to the start of the path that were not the function entry block. This
caused a cascade of pain downstream.

Tested on x86-64 Linux.

	PR tree-optimization/102511

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::range_on_path_entry):
	Return VARYING when nothing found.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr102511.c: New test.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Adjust.
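The failure mode described above can be modelled with a toy range lattice: a union that starts from UNDEFINED and never sees an incoming range stays UNDEFINED, which claims the value cannot exist at all. The sketch below assumes that simplified model. struct toy_range, toy_union and toy_range_on_path_entry are hypothetical names, not the gimple-range-path.cc code.

```c
#include <assert.h>
#include <stddef.h>

/* UNDEFINED means "no possible value", VARYING means "any value".  */
enum toy_kind { TOY_UNDEFINED, TOY_RANGE, TOY_VARYING };

struct toy_range
{
  enum toy_kind kind;
  long lo;   /* Meaningful only when kind == TOY_RANGE.  */
  long hi;   /* Meaningful only when kind == TOY_RANGE.  */
};

/* Union of two toy ranges.  UNDEFINED is the identity element and
   VARYING absorbs everything.  */
static struct toy_range
toy_union (struct toy_range a, struct toy_range b)
{
  if (a.kind == TOY_UNDEFINED)
    return b;
  if (b.kind == TOY_UNDEFINED)
    return a;
  if (a.kind == TOY_VARYING || b.kind == TOY_VARYING)
    {
      struct toy_range varying = { TOY_VARYING, 0, 0 };
      return varying;
    }
  struct toy_range merged = { TOY_RANGE,
                              a.lo < b.lo ? a.lo : b.lo,
                              a.hi > b.hi ? a.hi : b.hi };
  return merged;
}

/* Union the ranges seen on the usable incoming edges.  If there were
   none, fall back to VARYING instead of leaving the result UNDEFINED.  */
struct toy_range
toy_range_on_path_entry (const struct toy_range *incoming, size_t n)
{
  assert (incoming != NULL || n == 0);

  struct toy_range r = { TOY_UNDEFINED, 0, 0 };
  for (size_t i = 0; i < n; i++)
    r = toy_union (r, incoming[i]);

  if (n == 0)
    r.kind = TOY_VARYING;   /* Nothing found: assume any value.  */
  return r;
}
```

Falling back to VARYING is the conservative choice: downstream folding can no longer conclude anything from the range, whereas UNDEFINED would let it treat the code as unreachable.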
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(limited to 'gcc/testsuite/gcc.dg/tree-ssa')

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
index 3bc4b37..a25fe8b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c
@@ -37,5 +37,5 @@ expand_shift_1 (int code, int unsignedp, int rotate,
    we will enter the TRUE arm of the conditional and we can thread
    the test to compute the first first argument of the expand_binop
    call if we look backwards through the boolean logicals. */
-/* { dg-final { scan-tree-dump-times "Threaded" 1 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom2"} } */

--
cgit v1.1

From 5b8b1522e04adc20980f396571be1929a32d148a Mon Sep 17 00:00:00 2001
From: Richard Biener
Date: Mon, 27 Sep 2021 12:01:38 +0200
Subject: tree-optimization/100112 - VN last_vuse and redundant store elimination

This avoids the last_vuse optimization hindering redundant store
elimination by always also recording the original VUSE that was in
effect on the load.

In stage3 gcc/*.o we have 3182752 times recorded a single entry and
903409 times two entries (that's ~20% overhead).

With just recording a single entry the number of hashtable lookups done
when walking the vuse->vdef links to find an earlier access is
28961618. When recording the second entry this makes us find that
earlier for downstream redundant accesses, reducing the number of
hashtable lookups to 25401052 (that's a ~10% reduction).

2021-09-27 Richard Biener

	PR tree-optimization/100112
	* tree-ssa-sccvn.c (visit_reference_op_load): Record the reference
	into the hashtable twice in case last_vuse is different from the
	original vuse on the stmt.

	* gcc.dg/tree-ssa/ssa-fre-95.c: New testcase.
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c new file mode 100644 index 0000000..b0936be --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c @@ -0,0 +1,25 @@ +/* PR100112 and dups. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-fre1-details -fdump-tree-optimized" } */ + +int *c, *b; +void foo() +{ + int *tem = b; + *tem = 0; + int *footem = c; + c = footem; +} + +void bar() +{ + int *tem = b; + int *bartem = c; + *tem = 0; + c = bartem; +} + +/* We should elide the redundant store in foo, in bar it is not redundant since + the *tem = 0 store might alias. */ +/* { dg-final { scan-tree-dump "Deleted redundant store c = footem" "fre1" } } */ +/* { dg-final { scan-tree-dump "c = bartem" "optimized" } } */ -- cgit v1.1 From 34b1e44e166c58df20a15cb35b6cc8d4d299d415 Mon Sep 17 00:00:00 2001 From: Richard Biener Date: Tue, 28 Sep 2021 12:48:50 +0200 Subject: tree-optimization/99793 - testcase for the PR This adds a testcase for the PR which was fixed with the fix for PR100112. 2021-09-28 Richard Biener PR tree-optimization/99793 * gcc.dg/tree-ssa/pr99793.c: New testcase. 
--- gcc/testsuite/gcc.dg/tree-ssa/pr99793.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99793.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c new file mode 100644 index 0000000..9127449 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99793.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fstrict-aliasing -fdump-tree-optimized" } */ + +extern void foo(void); +static int a, *b = &a, c, *d = &c; +int main() +{ + int **e = &d; + if (!((unsigned)((*e = d) == 0) - (*b = 1))) + foo(); + return 0; +} + +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */ -- cgit v1.1 From c32f7df917b01c3636aa85916a36264e807ced9d Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Tue, 28 Sep 2021 11:33:11 +0200 Subject: Enable jump threading at -O1. My previous patch gating all jump threading by -fthread-jumps had the side effect of turning off DOM jump threading at -O1. This causes numerous -Wuninitialized false positives. This patch turns on jump threading at -O1 to minimize the disruption. gcc/ChangeLog: * cfgcleanup.c (pass_jump::execute): Check flag_expensive_optimizations. (pass_jump_after_combine::gate): Same. * doc/invoke.texi (-fthread-jumps): Enable for -O1. * opts.c (default_options_table): Enable -fthread-jumps at -O1. * tree-ssa-threadupdate.c (fwd_jt_path_registry::remove_jump_threads_including): Bail unless flag_thread_jumps. gcc/testsuite/ChangeLog: * gcc.dg/auto-init-uninit-1.c: Adjust. * gcc.dg/auto-init-uninit-15.c: Same. * gcc.dg/guality/example.c: Same. * gcc.dg/loop-8.c: Same. * gcc.dg/strlenopt-40.c: Same. * gcc.dg/tree-ssa/pr18133-2.c: Same. * gcc.dg/tree-ssa/pr18134.c: Same. * gcc.dg/uninit-1.c: Same. * gcc.dg/uninit-pr44547.c: Same. * gcc.dg/uninit-pr59970.c: Same. 
--- gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr18134.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c index 1b40985..8717640 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18133-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized-blocks" } */ +/* { dg-options "-O1 -fdump-tree-optimized-blocks" } */ int c, d; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c index d7f5d24..cd40ab2 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr18134.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O1 -fthread-jumps -fdump-tree-optimized" } */ +/* { dg-options "-O1 -fdump-tree-optimized" } */ int foo (int a) { -- cgit v1.1 From 92cdd338fdbfd33f72e645af1caaa1a357fe9839 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Tue, 14 Sep 2021 21:31:31 +0200 Subject: reassoc: Test rank biasing Add both positive and negative tests. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/reassoc-46.c: New test. * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests. * gcc.dg/tree-ssa/reassoc-47.c: New test. * gcc.dg/tree-ssa/reassoc-48.c: New test. * gcc.dg/tree-ssa/reassoc-49.c: New test. * gcc.dg/tree-ssa/reassoc-50.c: New test. * gcc.dg/tree-ssa/reassoc-51.c: New test. 
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 +++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 ++++++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++++++++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 ++++++++++ 7 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c new file mode 100644 index 0000000..97563dd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#include "reassoc-46.h" + +/* Check that the loop accumulator is added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h new file mode 100644 index 0000000..e60b490 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h @@ -0,0 +1,33 @@ +#define M 1024 +unsigned int arr1[M]; +unsigned int arr2[M]; +volatile unsigned int sink; + +unsigned int +test (void) +{ + unsigned int sum = 0; + for (int i = 0; i < M; i++) + { +#ifdef MODIFY + /* Modify the loop accumulator using a chain of operations - this should + not affect its rank biasing. */ + sum |= 1; + sum ^= 2; +#endif +#ifdef STORE + /* Save the loop accumulator into a global variable - this should not + affect its rank biasing. */ + sink = sum; +#endif +#ifdef USE + /* Add a tricky use of the loop accumulator - this should prevent its + rank biasing. */ + i = (i + sum) % M; +#endif + /* Use addends with different ranks. */ + sum += arr1[i]; + sum += arr2[((i ^ 1) + 1) % M]; + } + return sum; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c new file mode 100644 index 0000000..1b0f0fd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#include "reassoc-46.h" + +/* Check that if the loop accumulator is saved into a global variable, it's + still added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c new file mode 100644 index 0000000..13836eb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is modified using a chain of operations + other than addition, its new value is still added last. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c new file mode 100644 index 0000000..c1136a4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is both modified using a chain of + operations other than addition and stored into a global variable, its new + value is still added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c new file mode 100644 index 0000000..e35a4ff --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#define USE +#include "reassoc-46.h" + +/* Check that if the loop accumulator has multiple uses inside the loop, it's + not forced to the end of the reassociation chain. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 2 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c new file mode 100644 index 0000000..0717567 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#define STORE +#define USE +#include "reassoc-46.h" + +/* Check that if the loop accumulator has multiple uses inside the loop, it's + not forced to the end of the reassociation chain. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 2 "optimized" } } */ -- cgit v1.1 From 24e30f485bc80c823393d8fc62b65f860230e04b Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Tue, 28 Sep 2021 17:53:57 +0200 Subject: [PR102501] Adjust jump threading testcases for ppc64* and others. I really don't know what to do here. This is a bit of whack-a-mole. The IL is sufficiently different for various architectures that any tweak can cause the number of jump threads to vary.
For the pr77445-2.c testcase, we have fewer threading candidates because 2 of them now cross loop boundaries. Interestingly, this test matches "Jumps threaded", not threads registered, so the block copier can drop threads at copying time, adding further confusion. For example, we can register N threads, but the old copier can cancel N-M threads while updating the CFG for a variety of different reasons (removed edges, threading through loop exits, etc). As a result, the "Registering jump threads" count need not match the total number of threads this test checks for with "Jumps threaded". The pr66752-3.c test OTOH, is just a matter of thread4 eliminating the "if". I had erroneously thought it would always be eliminated by thread3, but we really don't care where it gets cleaned up. All we know is that DCE can't depend on the early threaders doing this work, because it may cross loop boundaries. I've chosen thread4 arbitrarily, but we could just as easily pick the ".optimized" dump. Sorry, I'm really at my wits' end here. I don't see any clean path forward, except to rewrite these tests as gimple IL. They're close to useless as they sit. gcc/testsuite/ChangeLog: PR testsuite/102501 * gcc.dg/tree-ssa/pr66752-3.c: Adjust. * gcc.dg/tree-ssa/pr77445-2.c: Adjust.
--- gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index 922a331..ba7025a 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread4" } */ extern int status, pt; extern int count; @@ -43,4 +43,4 @@ foo (int N, int c, int b, int *a) run after loop optimizations , can successfully eliminate the references to FLAG. Verify that ther are no references by the late threading passes. */ -/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */ +/* { dg-final { scan-tree-dump-not "if .flag" "thread4"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c index 01a0f1f..18f7aab 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c @@ -123,7 +123,7 @@ enum STATES FMS( u8 **in , u32 *transitions) { aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. 
*/ -/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread1" } } */ /* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */ -- cgit v1.1 From 5f9ccf17de7f7581412c6bffd4a37beca9a79836 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Fri, 1 Oct 2021 13:05:36 +0200 Subject: [PR102546] X << Y being non-zero implies X is also non-zero. This patch teaches this to range-ops. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102546 * range-op.cc (operator_lshift::op1_range): Teach range-ops that X << Y is non-zero implies X is also non-zero. --- gcc/testsuite/gcc.dg/tree-ssa/pr102546.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102546.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c new file mode 100644 index 0000000..4bd9874 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-O3 -fdump-tree-optimized" } + +static int a; +static char b, c, d; +void bar(void); +void foo(void); + +int main() { + int f = 0; + for (; f <= 5; f++) { + bar(); + b = b && f; + d = f << f; + if (!(a >= d || f)) + foo(); + c = 1; + for (; c; c = 0) + ; + } +} + +// { dg-final { scan-tree-dump-not "foo" "optimized" } } -- cgit v1.1 From 6c0dd02964a624c65859808f9a40721c3796319a Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Sat, 2 Oct 2021 16:59:26 +0200 Subject: [PR102563] Do not clobber range in operator_lshift::op1_range. We're clobbering the final range before we're done calculating it. Tested on x86-64 Linux. 
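As background for the operator_lshift::op1_range logic touched by this patch and by the PR102546 patch before it, the underlying fact can be checked exhaustively on a small type. The sketch below is not taken from range-op.cc; both function names are hypothetical, and it assumes an 8-bit unsigned operand with shift counts 0..7 and results truncated back to 8 bits. It shows that a non-zero X << Y always implies a non-zero X, while the converse fails once set bits are shifted out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exhaustively check, for every 8-bit X and every shift count 0..7,
   that a non-zero (truncated) X << Y implies a non-zero X.
   Returns false if a counterexample is found.  */
static bool
lshift_nonzero_implies_operand_nonzero (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y < 8; y++)
      {
        uint8_t shifted = (uint8_t) (x << y);
        if (shifted != 0 && x == 0)
          return false;
      }
  return true;
}

/* The converse does not hold: 0x80 is non-zero, but shifting it left
   by one pushes its only set bit out of an 8-bit result.  */
static bool
converse_has_counterexample (void)
{
  uint8_t x = 0x80;
  return x != 0 && (uint8_t) (x << 1) == 0;
}
```

This is why the inference only runs in one direction: knowing the shift result is non-zero narrows the first operand to non-zero, but knowing the operand is non-zero says nothing definite about the result.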
gcc/ChangeLog: PR tree-optimization/102563 * range-op.cc (operator_lshift::op1_range): Do not clobber range. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr102563.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr102563.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102563.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102563.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102563.c new file mode 100644 index 0000000..8871dff --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102563.c @@ -0,0 +1,16 @@ +// { dg-do compile } +// { dg-options "-O2 -w" } + +int _bdf_parse_glyphs_bp; +long _bdf_parse_glyphs_nibbles; + +void _bdf_parse_glyphs_p() +{ + long p_2; + + _bdf_parse_glyphs_nibbles = p_2 << 1; + + for (; 0 < _bdf_parse_glyphs_nibbles;) + if (1 < _bdf_parse_glyphs_nibbles) + _bdf_parse_glyphs_bp = _bdf_parse_glyphs_p; +} -- cgit v1.1 From 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 Mon Sep 17 00:00:00 2001 From: Richard Biener Date: Mon, 4 Oct 2021 10:57:45 +0200 Subject: tree-optimization/102570 - teach VN about internal functions We're now using internal functions for a lot of stuff but there's still missing VN support out of laziness. The following instantiates support and adds testcases for FRE and PRE (hoisting). 2021-10-04 Richard Biener PR tree-optimization/102570 * tree-ssa-sccvn.h (vn_reference_op_struct): Document we are using clique for the internal function code. * tree-ssa-sccvn.c (vn_reference_op_eq): Compare the internal function code. (print_vn_reference_ops): Print the internal function code. (vn_reference_op_compute_hash): Hash it. (copy_reference_ops_from_call): Record it. (visit_stmt): Remove the restriction around internal function calls. (fully_constant_vn_reference_p): Use fold_const_call and handle internal functions. (vn_reference_eq): Compare call return types. 
* tree-ssa-pre.c (create_expression_by_pieces): Handle generating calls to internal functions. (compute_avail): Remove the restriction around internal function calls. * gcc.dg/tree-ssa/ssa-fre-96.c: New testcase. * gcc.dg/tree-ssa/ssa-pre-33.c: Likewise. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c | 14 ++++++++++++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c | 15 +++++++++++++++ 2 files changed, 29 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c new file mode 100644 index 0000000..fd1d571 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-fre1" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res) +{ + _Bool t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return t==t1; +} + +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "fre1" } } */ +/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c new file mode 100644 index 0000000..3b3bd62 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res, int flag, _Bool *t) +{ + if (flag) + *t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return *t==t1; +} + +/* We should hoist the .ADD_OVERFLOW to before the check. 
*/ +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "pre" } } */ -- cgit v1.1 From ec0124e0acb556cdf5dba0e8d0ca6b69d9537fcc Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Tue, 5 Oct 2021 15:03:34 +0200 Subject: Loosen loop crossing restriction in threader. Crossing loops is generally discouraged from the threader, but we can make an exception when we don't cross the latch or enter another loop, since this is just an early exit out of the loop. In fact, the whole threaded path is logically outside the loop. This has nice secondary effects. For example, objects on the threaded path will no longer necessarily be live throughout the loop, so we can get register allocation improvements. The threaded path can physically move outside the loop resulting in better icache efficiency, etc. Tested on x86-64 Linux, and on a visium-elf cross making sure that the following tests do not have an abort in the final assembly: gcc.c-torture/execute/960218-1.c gcc.c-torture/execute/visium-pending-4.c gcc.c-torture/execute/pr58209.c gcc/ChangeLog: * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): Loosen restrictions gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-thread-valid.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c | 39 ++++++++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c (limited to 'gcc/testsuite/gcc.dg/tree-ssa') diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c new file mode 100644 index 0000000..7adca97 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c @@ -0,0 +1,39 @@ +// { dg-do compile } +// { dg-options "-O2 -fgimple -fdump-statistics" } + +// This is a collection of threadable paths. To simplify maintenance, +// there should only be one threadable path per function. 
+ +int global; + +// The thread from 3->4->5 crosses loops but is allowed because it +// never crosses the latch (BB3) and is just an early exit out of the +// loop. +int __GIMPLE (ssa) +foo1 (int x) +{ + int D_1420; + int a; + + __BB(2): + a_4 = ~x_3(D); + goto __BB4; + + // Latch. + __BB(3): + global = a_1; + goto __BB4; + + __BB(4,loop_header(1)): + a_1 = __PHI (__BB2: a_4, __BB3: 0); + if (a_1 != 0) + goto __BB3; + else + goto __BB5; + + __BB(5): + return; + +} + +// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } } -- cgit v1.1