riscv-gnu-toolchain/gcc.git

author	Christoph Müllner <christoph.muellner@vrull.eu>	2024-12-05 20:39:25 +0100
committer	Christoph Müllner <christoph.muellner@vrull.eu>	2024-12-20 10:43:56 +0100
commit	eee2891312a9b42acabcc82739604c9fa8421757 (patch)
tree	f708a29308d0316833a83e19e81f8d130756d519 /gcc/system.h
parent	8af296c290216e03bc20e7291e64c19e0d94cfd6 (diff)
download	gcc-eee2891312a9b42acabcc82739604c9fa8421757.zip gcc-eee2891312a9b42acabcc82739604c9fa8421757.tar.gz gcc-eee2891312a9b42acabcc82739604c9fa8421757.tar.bz2

forwprop: Fix lane handling for VEC_PERM sequence blending

In PR117830 a miscompilation of 464.h264ref was reported. An analysis showed that wrong code was generated because of unsatisfied assumptions. This patch addresses these issues.

The first assumption was that we could independently analyze the two vec-perms at the start of a vec-perm-simplify sequence and use the information later for calculating a final vec-perm selector that utilizes fewer lanes. However, this information does not help much, because for changing the selector entry, we need to ensure that both elements of the operand vectors v_1 and v_2 remain equal. This is addressed by removing the function get_vect_selector_index_map and checking for this equality in the loop where we create the new selector.

The calculation of the selector vector for the blended sequence assumed that the indices of the selector vector of the narrowed sequences are increasing. This assumption does not hold in general. This was fixed by allowing a wrap-around when searching for an empty lane.

Further, there was an issue in the calculation of the selector vector entries for the second sequence. The code did not consider that the lanes of the second sequence could have been moved.

A relevant property of this patch is that it introduces a couple of nested loops, where the outer loop iterates from i=0..nelts and the inner loop iterates from j=0..i. To avoid performance concerns, a check is introduced that ensures nelts won't exceed 4 lanes.

The added test case is derived from h264ref (the other cases from the benchmark have the same structure and don't provide additional coverage).

Bootstrapped and regression-tested on x86-64 and aarch64. Further, tested on CPU 2006 h264ref and CPU 2017 x264.

PR117830

gcc/ChangeLog:

* tree-ssa-forwprop.cc (get_vect_selector_index_map): Removed.
(recognise_vec_perm_simplify_seq): Fix calculation of vec-perm selectors of narrowed sequence.
(calc_perm_vec_perm_simplify_seqs): Fix calculation of vec-perm selectors of the blended sequence.
(process_vec_perm_simplify_seq_list): Add whitespace to dump string to avoid badly formatted dump output.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vector-11.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
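
The wrap-around search for an empty lane described in the message can be illustrated with a minimal sketch. This is not the code from tree-ssa-forwprop.cc: the helper name find_free_lane, the bool array used, and the parameters nelts and start are all hypothetical and only show the idea of continuing the search at lane 0 after the last lane instead of assuming increasing indices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from tree-ssa-forwprop.cc.
   Starting at lane START, return the first lane in USED[0..NELTS-1]
   that is still free, wrapping around past the last lane.
   Return -1 if every lane is already taken.  */
static int
find_free_lane (const bool *used, unsigned nelts, unsigned start)
{
  assert (start < nelts);

  for (unsigned k = 0; k < nelts; k++)
    {
      /* The modulo provides the wrap-around: after lane NELTS-1 the
         search continues at lane 0.  */
      unsigned lane = (start + k) % nelts;
      if (!used[lane])
        return (int) lane;
    }
  return -1;
}
```

With used = { true, false, true, true } and start = 2, a search that only moves upwards would find no free lane, while the wrap-around variant reaches lane 1. Since each call visits at most nelts lanes, the cost stays small under the 4-lane limit mentioned in the message.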

Diffstat (limited to 'gcc/system.h')

0 files changed, 0 insertions, 0 deletions

