author: Christoph Müllner <christoph.muellner@vrull.eu> 2024-12-05 20:39:25 +0100
committer: Christoph Müllner <christoph.muellner@vrull.eu> 2024-12-20 10:43:56 +0100
commit: eee2891312a9b42acabcc82739604c9fa8421757
tree: f708a29308d0316833a83e19e81f8d130756d519 /gcc/system.h
parent: 8af296c290216e03bc20e7291e64c19e0d94cfd6
forwprop: Fix lane handling for VEC_PERM sequence blending

In PR117830 a miscompilation of 464.h264ref was reported. An analysis
showed that wrong code was generated because of unsatisfied assumptions.
This patch addresses these issues.

The first assumption was that we could independently analyze the two
vec-perms at the start of a vec-perm-simplify sequence and use the
information later for calculating a final vec-perm selector that
utilizes fewer lanes. However, this information does not help much,
because to change a selector entry we need to ensure that both elements
of the operand vectors v_1 and v_2 remain equal. This is addressed by
removing the function get_vect_selector_index_map and checking for this
equality in the loop where we create the new selector.

The calculation of the selector vector for the blended sequence assumed
that the indices of the selector vector of the narrowed sequences are
increasing. This assumption does not hold in general. This was fixed by
allowing a wrap-around when searching for an empty lane.

Further, there was an issue in the calculation of the selector vector
entries for the second sequence. The code did not consider that the
lanes of the second sequence could have been moved.

A relevant property of this patch is that it introduces a couple of
nested loops, where the outer loop iterates from i=0..nelts and the
inner loop iterates from j=0..i. To avoid performance concerns, a check
is introduced that ensures nelts won't exceed 4 lanes.

The added test case is derived from h264ref (the other cases from the
benchmark have the same structure and don't provide additional
coverage).

Bootstrapped and regression-tested on x86-64 and aarch64.
Further, tested on CPU 2006 h264ref and CPU 2017 x264.

	PR117830

gcc/ChangeLog:

	* tree-ssa-forwprop.cc (get_vect_selector_index_map): Removed.
	(recognise_vec_perm_simplify_seq): Fix calculation of vec-perm
	selectors of narrowed sequence.
	(calc_perm_vec_perm_simplify_seqs): Fixing calculation of
	vec-perm selectors of the blended sequence.
	(process_vec_perm_simplify_seq_list): Add whitespace to dump
	string to avoid badly formatted dump output.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/vector-11.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
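
The wrap-around search for an empty lane described in the commit message can be illustrated with a small standalone sketch. This is not the code from tree-ssa-forwprop.cc: the helper name `find_free_lane`, the `used` bitmap and the constant `kMaxLanes` are hypothetical names chosen for illustration, and the 4-lane cap only mirrors the limit the message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cap, mirroring the "nelts won't exceed 4 lanes" check
// described in the commit message.
constexpr std::size_t kMaxLanes = 4;

// Hypothetical helper (not GCC code): return the first unused lane at or
// after `start`, wrapping around to lane 0 once the last lane is passed.
// Returns -1 if no lane is free or if the lane count is outside the
// supported range.
int find_free_lane(const std::vector<bool> &used, std::size_t start)
{
  const std::size_t nelts = used.size();
  if (nelts == 0 || nelts > kMaxLanes)
    return -1;

  for (std::size_t k = 0; k < nelts; ++k)
    {
      // The modulo provides the wrap-around: a search that starts at the
      // last lane can still find a free lane with a smaller index.
      const std::size_t lane = (start + k) % nelts;
      assert(lane < nelts);
      if (!used[lane])
        return static_cast<int>(lane);
    }
  return -1;
}
```

With lanes 0, 1 and 3 occupied and a search starting at lane 3, a scan that only moves upward finds nothing, whereas the wrapping scan returns lane 2. Because the lane count is capped at 4, the extra iterations stay trivially cheap.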
Diffstat (limited to 'gcc/system.h')
0 files changed, 0 insertions, 0 deletions