author | Kyrylo Tkachov <ktkachov@nvidia.com> | 2024-09-20 05:11:39 -0700 |
---|---|---|
committer | Kyrylo Tkachov <ktkachov@nvidia.com> | 2024-10-04 15:15:11 +0200 |
commit | f000cb8cbc58b23a91c84d47d69481904981a1d9 (patch) | |
tree | 16e280efda76efa9351eb4d5d7162df6e7cbff89 /gcc/config/aarch64 | |
parent | e1205218936e416881c4f465d51c033b29044d79 (diff) | |
aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
generic tuning.
As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
to set the std::hardware_destructive_interference_size value, which we do not
want to be overly large when running concurrent applications on large
core-count systems.
The generic value for Armv8-A systems and the port baseline is 256 bytes
because that's what the A64FX CPU has; it is set de facto in
aarch64_override_options_internal.
As far as I know, though, no Armv9-A CPU has an L1 cache line larger than
64 bytes, so we should be able to use the smaller value here and reduce
the size of concurrent structs that use
std::hardware_destructive_interference_size to pad their fields.
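A minimal sketch of the size effect described above. PaddedCounters is a
hypothetical type made up for illustration and is not part of GCC. Real code
would typically write alignas (std::hardware_destructive_interference_size)
from <new>; the template parameter stands in for that constant here so that
the 64-byte and 256-byte cases can be compared in a single build.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical illustration type, not part of GCC: two counters that are
// each aligned to `Line` bytes so that they sit on separate cache lines
// and do not suffer false sharing.
template <std::size_t Line>
struct PaddedCounters
{
  alignas (Line) std::atomic<long> produced;
  alignas (Line) std::atomic<long> consumed;
};

// Each member occupies one full `Line`-sized slot, so the struct is
// exactly two lines long.
static_assert (sizeof (PaddedCounters<64>) == 2 * 64,
	       "64-byte padding: 128 bytes per object");
static_assert (sizeof (PaddedCounters<256>) == 2 * 256,
	       "256-byte padding: 512 bytes per object");
```

With only two small fields, every object shrinks from 512 to 128 bytes when
the padding value drops from 256 to 64, which is the saving this change is
after for arrays of such structs on large core-count systems.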
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
* config/aarch64/tuning_models/generic_armv9_a.h
(generic_armv9a_prefetch_tune): Define.
(generic_armv9_a_tunings): Use the above.
Diffstat (limited to 'gcc/config/aarch64')
-rw-r--r-- | gcc/config/aarch64/tuning_models/generic_armv9_a.h | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
index 999985e..76b3e4c 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -207,6 +207,18 @@ static const struct cpu_vector_cost generic_armv9_a_vector_cost =
   &generic_armv9_a_vec_issue_info /* issue_info */
 };
 
+/* Generic prefetch settings (which disable prefetch). */
+static const cpu_prefetch_tune generic_armv9a_prefetch_tune =
+{
+  0, /* num_slots */
+  -1, /* l1_cache_size */
+  64, /* l1_cache_line_size */
+  -1, /* l2_cache_size */
+  true, /* prefetch_dynamic_strides */
+  -1, /* minimum_stride */
+  -1 /* default_opt_level */
+};
+
 static const struct tune_params generic_armv9_a_tunings =
 {
   &cortexa76_extra_costs,
@@ -239,7 +251,7 @@ static const struct tune_params generic_armv9_a_tunings =
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */
-  &generic_prefetch_tune,
+  &generic_armv9a_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */
   AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */
 };