riscv-gnu-toolchain/gcc.git

author	Jan Hubicka <jh@suse.cz>	2022-12-22 02:16:24 +0100
committer	Jan Hubicka <jh@suse.cz>	2022-12-22 02:16:24 +0100
commit	bbe04bade0cc3b17e62c2af3d89b899367e7d2d1 (patch)
tree	b51f81927c88afeafd58c432bc618ec4c3b4257b /gcc/lto
parent	de282a2012049ea7d1236f8cb6f946385057c20f (diff)

Update znver4 costs

Update the costs for znver4, mostly based on data measured by Agner Fog. Compared to previous generations, x87 became a bit slower, which is probably not a big deal (and we have minimal benchmarking coverage for it). One interesting improvement is the reduction of FMA cost. I also updated the costs of AVX256 loads/stores based on latencies (not throughput, which is twice that of avx256).

Overall, AVX512 vectorization seems to noticeably improve some of the TSVC benchmarks, but since 512-bit vectors are internally split into 256-bit vectors it is somewhat risky and does not win in SPEC scores (mostly by regressing benchmarks with loops that have a small trip count, like x264 and exchange). So for now I am going to set the AVX256_OPTIMAL tune, but I am still playing with it. We have improved since ZNVER1 at choosing the vectorization size and also have vectorized prologues/epilogues, so it may be possible to make avx512 a small win overall.

2022-12-22  Jan Hubicka  <hubicka@ucw.cz>

	* config/i386/x86-tune-costs.h (znver4_cost): Update costs of FP and SSE moves, division, multiplication, gathers, L2 cache size, and more complex FP instructions.
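To illustrate the small-trip-count concern in the message above, here is a minimal arithmetic sketch. It is not GCC code: `split_loop` and `struct loop_split` are hypothetical names, and the model assumes a plain vector body followed by a scalar remainder, ignoring the vectorized prologues/epilogues the message mentions.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: how a loop of trip_count
   iterations splits between a full-width vector body and a scalar
   remainder. Vectorized epilogues are deliberately ignored. */
struct loop_split {
    unsigned vector_iters;  /* iterations of the full-width vector body */
    unsigned scalar_iters;  /* leftover elements handled one at a time */
};

struct loop_split split_loop(unsigned trip_count, unsigned vector_bits,
                             unsigned elem_bits)
{
    /* The vector must hold at least one element. */
    assert(elem_bits > 0 && vector_bits >= elem_bits);

    unsigned lanes = vector_bits / elem_bits;  /* elements per vector */
    struct loop_split s = { trip_count / lanes, trip_count % lanes };
    return s;
}
```

Under this simplified model, a loop over 12 32-bit elements runs one 256-bit vector iteration plus 4 scalar iterations, while at 512 bits the vector body never executes and all 12 elements fall to the remainder. That is one way a wider vector size can lose on short loops.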

Diffstat (limited to 'gcc/lto')

0 files changed, 0 insertions, 0 deletions