author Kewen Lin <linkw@linux.ibm.com> 2021-11-29 21:22:27 -0600
committer Kewen Lin <linkw@linux.ibm.com> 2021-11-29 21:22:27 -0600
commit aca68829d723a11f73b84d59401568015959f432
tree a230fea6c4c6db4a6a779398a6cdbb31ee925b66 /gcc/fortran/trans-expr.c
parent bcb163eee8c290a1c023f89b401ba7406dcac605
rs6000: Modify the way for extra penalized cost

This patch follows the discussions here[1][2], where Segher pointed out that the existing way of guarding the extra penalized cost for strided/elementwise loads with a magic bound does not scale.

Computing the penalty as nunits * stmt_cost can give a much exaggerated penalized cost. For example, for V16QI on P8 it is 16 * 20 = 320, which is why a bound was needed. To make it better and more readable, the penalized cost is simplified to:

    unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
    unsigned extra_cost = nunits * adjusted_cost;

For V2DI/V2DF, it uses a penalized cost of 2 for each scalar load, while for the other modes it uses 1. This is mainly concluded from the performance evaluations. One thing that might be related: the more units a vector is constructed from, the more instructions are used, and the more chances there are to schedule them well (even to run them in parallel when enough units are available at that time), so it seems reasonable not to penalize them more.

The SPEC2017 evaluations on Power8/Power9/Power10 at option sets O2-vect and Ofast-unroll show this change is neutral.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_cost_data::update_target_cost_per_stmt): Adjust the way to compute extra penalized cost.  Remove useless parameter.
	(rs6000_cost_data::rs6000_add_stmt_cost): Adjust the call to function update_target_cost_per_stmt.
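
As a rough illustration of the arithmetic in the commit message, the sketch below compares the raw nunits * stmt_cost penalty with the simplified one. It is not the code from rs6000.c: the function names old_raw_penalty and new_extra_cost are hypothetical, the magic bound applied by the previous code is deliberately left out because its value is not stated here, and the stmt_cost of 20 is just the V16QI-on-P8 figure quoted in the message.

```cpp
// Hypothetical stand-alone sketch; these helpers are NOT GCC functions.

// Previous approach as described in the message: the raw penalty before
// any bound was applied (the bound itself is omitted here).
constexpr unsigned old_raw_penalty(unsigned nunits, unsigned stmt_cost) {
  return nunits * stmt_cost;
}

// Simplified approach from the message: two-unit vectors (V2DI/V2DF)
// are charged 2 per scalar load, every other mode is charged 1.
constexpr unsigned new_extra_cost(unsigned nunits) {
  const unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}

// V16QI on P8 with stmt_cost 20, the example quoted in the message.
static_assert(old_raw_penalty(16, 20) == 320, "raw penalty for V16QI");
static_assert(new_extra_cost(16) == 16, "simplified penalty for V16QI");
// V2DI/V2DF: two units, each charged 2.
static_assert(new_extra_cost(2) == 4, "simplified penalty for V2DI/V2DF");
```

With the simplified formula the penalty grows linearly with the number of units and no longer depends on stmt_cost, so no separate bound is needed to keep it in range.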