author | Luis Machado <luis.machado@linaro.org> | 2018-05-23 16:20:30 +0000 |
---|---|---|
committer | Luis Machado <luisgpm@gcc.gnu.org> | 2018-05-23 16:20:30 +0000 |
commit | 59100dfc42bbe92caff61bca1560da4a30f99906 (patch) | |
tree | 4e9671a83dcf4509e6aa0d832c7fcf438751424a /gcc/tree-ssa-loop-prefetch.c | |
parent | cf290ea3255625513b6ad5a5c4e189c833a67a92 (diff) | |
[Patch 01/02] Introduce prefetch-minimum stride option
This patch adds a new option to control the minimum stride that a memory
reference must have before the loop prefetch pass may issue software prefetch
hints for it. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.
* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
I've noticed a considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
The default settings should guarantee no changes for existing targets. Targets
are free to tweak the settings as necessary.
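As an illustration of the two cases described above, here is a minimal sketch of one loop with a small constant stride and one with a large constant stride. The function names, `ROWS`, and `COLS` are hypothetical and not part of the patch, and the byte strides in the comments assume an 8-byte `double`. With a minimum stride of 2048 (the value the ChangeLog sets for qdf24xx), the first loop's step falls below the threshold, so the pass would leave it to the hardware prefetcher, while the second loop's step is above it and stays eligible, subject to the pass's other heuristics.

```c
#include <stddef.h>

#define ROWS 64
#define COLS 512   /* hypothetical row length: 512 doubles per row */

/* Small constant stride: the address advances by sizeof(double) per
   iteration (8 bytes on typical targets), well below a 2048-byte
   threshold.  */
double sum_unit_stride(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Large constant stride: walking one column of a row-major matrix
   advances the address by COLS * sizeof(double) per iteration
   (4096 bytes with an 8-byte double), above a 2048-byte threshold.  */
double sum_column(const double *m, size_t col)
{
    double s = 0.0;
    for (size_t r = 0; r < ROWS; r++)
        s += m[r * COLS + col];
    return s;
}
```

The threshold itself is set through the `prefetch-minimum-stride` parameter that this patch documents in doc/invoke.texi, and it only matters when the loop prefetch pass is enabled.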
gcc/ChangeLog:
2018-05-23 Luis Machado <luis.machado@linaro.org>
* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
<minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
minimum_stride field defaulting to -1.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune) <minimum_stride>: Set to 2048.
<default_opt_level>: Set to 3.
(aarch64_override_options_internal): Update to set
PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
stride is constant and is below the minimum stride threshold.
From-SVN: r260617
Diffstat (limited to 'gcc/tree-ssa-loop-prefetch.c')
-rw-r--r-- | gcc/tree-ssa-loop-prefetch.c | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 2f10db1..ac89bf7 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && abs_hwi (int_cst_value (ref->group->step))
+          < (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file,
+                 "Step for reference %u:%u (%ld) is less than the mininum "
+                 "required stride of %d\n",
+                 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+                 PREFETCH_MINIMUM_STRIDE);
+      return false;
+    }
+
   /* For now do not issue prefetches for only first few of the
      iterations.  */
   if (ref->prefetch_before != PREFETCH_ALL)
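The check added in this hunk can be modeled outside of GCC. The sketch below is a standalone approximation, not the code from the patch: `struct step_info`, `abs_i64`, and `stride_allows_prefetch` are hypothetical stand-ins for `ref->group->step`, `abs_hwi`, and the new early return in `should_issue_prefetch_p`. It also shows why the ChangeLog's default of -1 should leave existing targets unchanged: an absolute value is never less than -1, so no constant step is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the step of a memory reference group.
   In GCC the step is a tree; here it is a flag plus a byte count.  */
struct step_info {
    bool is_constant;   /* models cst_and_fits_in_hwi (step) */
    int64_t bytes;      /* models int_cst_value (step) */
};

/* Models abs_hwi.  Like the original, it is not meant to be called
   with the most negative value, whose negation overflows.  */
static int64_t abs_i64(int64_t v)
{
    return v < 0 ? -v : v;
}

/* Returns false when the step is a known constant whose magnitude is
   below the minimum stride, mirroring the early "return false" in the
   hunk.  Non-constant steps are never rejected by this check.  */
bool stride_allows_prefetch(struct step_info step, int64_t minimum_stride)
{
    if (step.is_constant && abs_i64(step.bytes) < minimum_stride)
        return false;
    return true;
}
```

Note that the comparison is strict, so a step exactly equal to the minimum stride still passes, and that negative steps (loops walking backwards through memory) are compared by magnitude.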