author: Hongyu Wang <hongyu.wang@intel.com> 2022-09-08 16:52:02 +0800
committer: Hongyu Wang <hongyu.wang@intel.com> 2022-11-14 13:38:06 +0800
commit: 071e428c24ee8c1ed062597a093708bba29509c9 (patch)
tree: 3a4eccc9a05e11746e0d9b14146c213fae270ee6 /gcc/fortran
parent: 5f2ce01a214177460e566d85bf4d44aa18432ed9 (diff)
Enable small loop unrolling for O2
Modern processors have multi-way instruction decoders. On x86,
icelake/zen3 have a 5-uop decode width, so for a small loop with <= 4
instructions (usually 3 uops, since the cmp/jmp pair can be
macro-fused), the decoder has a 2-uop bubble in each iteration and the
pipeline cannot be fully utilized.

Therefore, this patch enables loop unrolling for small loops at O2 to
fill the decoder as much as possible. It turns on rtl loop unrolling
when targetm.loop_unroll_adjust exists, and only at O2 when optimizing
for speed. In the x86 backend the default behavior is to unroll small
loops with less than 4 insns once (unroll factor 2).

This improves 548.exchange2 by 9% on icelake and 7.4% on zen3, with a
0.9% codesize increase. For other benchmarks the variations are minor,
and overall codesize increased by 0.2%.

The kernel image size increased by 0.06%, and there is no impact on
eembc.

gcc/ChangeLog:

	* common/config/i386/i386-common.cc (ix86_optimization_table):
	Enable small loop unroll at O2 by default.
	* config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
	factor if -munroll-only-small-loops enabled and
	-funroll-loops/-funroll-all-loops are disabled.
	* config/i386/i386.h (struct processor_costs): Add 2 fields
	small_unroll_ninsns and small_unroll_factor.
	* config/i386/i386.opt: Add -munroll-only-small-loops.
	* doc/gcc/gcc-command-options/machine-dependent-options/x86-options.rst:
	Document -munroll-only-small-loops.
	* doc/gcc/gcc-command-options/option-summary.rst: Likewise.
	* loop-init.cc (pass_rtl_unroll_loops::gate): Enable rtl
	loop unrolling for -O2-speed and above if target hook
	loop_unroll_adjust exists.
	(pass_rtl_unroll_loops::execute): Set UAP_UNROLL flag
	when target hook loop_unroll_adjust exists.
	* config/i386/x86-tune-costs.h: Update all processor costs
	with small_unroll_ninsns = 4 and small_unroll_factor = 2.

gcc/testsuite/ChangeLog:

	* gcc.dg/guality/loop-1.c: Add additional option
	-mno-unroll-only-small-loops.
	* gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
	* gcc.target/i386/pr93002.c: Likewise.
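
For illustration only, the unroll-factor adjustment described in the message above can be sketched in standalone C. This is not the code from config/i386/i386.cc: the struct, the function sketch_unroll_adjust, and its parameters are hypothetical names, and only the values small_unroll_ninsns = 4 and small_unroll_factor = 2 come from the ChangeLog. The sketch assumes the size check is inclusive (matching the "<= 4 instructions" wording), that the small-loop cap is skipped when -funroll-loops or -funroll-all-loops is given, and that a returned factor of 1 means the loop is left alone.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two cost fields named in the ChangeLog. */
struct small_unroll_costs {
    unsigned small_unroll_ninsns;  /* largest loop body (in insns) treated as small */
    unsigned small_unroll_factor;  /* cap on the unroll factor for small loops */
};

/* Values the ChangeLog says are set for all x86 processor cost tables. */
static const struct small_unroll_costs default_costs = { 4, 2 };

/* Sketch of the decision, under the assumptions stated in the lead-in.
   nunroll is the factor the generic unroller proposes; ninsns is the
   loop body size; explicit_unroll_requested models -funroll-loops or
   -funroll-all-loops being enabled. */
static unsigned
sketch_unroll_adjust(unsigned nunroll, unsigned ninsns,
                     bool unroll_only_small_loops,
                     bool explicit_unroll_requested,
                     const struct small_unroll_costs *costs)
{
    /* The small-loop policy applies only when it is enabled and the
       user did not ask for general unrolling. */
    if (explicit_unroll_requested || !unroll_only_small_loops)
        return nunroll;

    /* Small loop: cap the proposed factor at small_unroll_factor. */
    if (ninsns <= costs->small_unroll_ninsns)
        return nunroll < costs->small_unroll_factor
                   ? nunroll
                   : costs->small_unroll_factor;

    /* Larger loop: a factor of 1 is assumed to mean "do not unroll". */
    return 1;
}
```

Under these assumptions, a 3-insn loop with a proposed factor of 8 is capped at 2, while a 5-insn loop gets 1 and is not unrolled; with general unrolling requested, the proposed factor passes through unchanged.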
Diffstat (limited to 'gcc/fortran')
0 files changed, 0 insertions, 0 deletions