riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Ju-Zhe Zhong <juzhe.zhong@rivai.ai>	2023-07-10 19:35:47 +0800
committer	Pan Li <pan2.li@intel.com>	2023-07-11 20:39:58 +0800
commit	6c96d1e4e877d045f7e0c22786244653d2e1cf99 (patch)
tree	83c6c825b50a96332cfba14e6851d4cced56693c /libgcc/floatunsixf.c
parent	4736ddd11874fe215662ac18877ce8eded1f5976 (diff)
download	gcc-6c96d1e4e877d045f7e0c22786244653d2e1cf99.zip gcc-6c96d1e4e877d045f7e0c22786244653d2e1cf99.tar.gz gcc-6c96d1e4e877d045f7e0c22786244653d2e1cf99.tar.bz2

VECT: Add COND_LEN_* operations for loop control with length targets

Hi, Richard and Richi. This patch is adding cond_len_* operations pattern for target support loop control with length. These patterns will be used in these following case: 1. Integer division: void f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n) { for (int i = 0; i < n; ++i) { a[i] = b[i] / c[i]; } } ARM SVE IR: ... max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... }); Loop: ... # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)> ... vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29); ... vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29); vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28); ... .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24); ... next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... }); ... For target like RVV who support loop control with length, we want to see IR as follows: Loop: ... # loop_len_29 = SELECT_VL ... vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29); ... vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29); vect__8.12_24 = .COND_LEN_DIV (dummp_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias); ... .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24); ... next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... }); ... Notice here, we use dummp_mask = { -1, -1, .... , -1 } 2. Integer conditional division: Similar case with (1) but with condtion: void f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n) { for (int i = 0; i < n; ++i) { if (cond[i]) a[i] = b[i] / c[i]; } } ARM SVE: ... max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... }); Loop: ... # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)> ... vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55); mask__29.10_58 = vect__4.9_56 != { 0, ... }; vec_mask_and_61 = loop_mask_55 & mask__29.10_58; ... vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61); ... vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61); vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62); ... .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68); ... next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... }); Here, ARM SVE use vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to gurantee the correct result. However, target with length control can not perform this elegant flow, for RVV, we would expect: Loop: ... loop_len_55 = SELECT_VL ... mask__29.10_58 = vect__4.9_56 != { 0, ... }; ... vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias); ... Here we expect COND_LEN_DIV predicated by a real mask which is the outcome of comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... }; and a real length which is produced by loop control : loop_len_55 = SELECT_VL 3. conditional Floating-point operations (no -ffast-math): void f (float *restrict a, float *restrict b, int32_t *restrict cond, int n) { for (int i = 0; i < n; ++i) { if (cond[i]) a[i] = b[i] + a[i]; } } ARM SVE IR: max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... }); ... # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)> ... mask__27.10_52 = vect__4.9_50 != { 0, ... }; vec_mask_and_55 = loop_mask_49 & mask__27.10_52; ... vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56); ... next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... }); ... For RVV, we would expect IR: ... loop_len_49 = SELECT_VL ... mask__27.10_52 = vect__4.9_50 != { 0, ... }; ... vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias); ... 4. Conditional un-ordered reduction: int32_t f (int32_t *restrict a, int32_t *restrict cond, int n) { int32_t result = 0; for (int i = 0; i < n; ++i) { if (cond[i]) result += a[i]; } return result; } ARM SVE IR: Loop: # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)> ... # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)> ... mask__17.11_43 = vect__4.10_41 != { 0, ... }; vec_mask_and_46 = loop_mask_40 & mask__17.11_43; ... vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37); ... next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... }); ... Epilogue: _53 = .REDUC_PLUS (vect__33.16_51); [tail call] For RVV, we expect: Loop: # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)> ... loop_len_40 = SELECT_VL ... mask__17.11_43 = vect__4.10_41 != { 0, ... }; ... vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias); ... next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... }); ... Epilogue: _53 = .REDUC_PLUS (vect__33.16_51); [tail call] I name these patterns as "cond_len_*" since I want the length operand comes after mask operand and all other operands except length operand same order as "cond_*" patterns. Such order will make life easier in the following loop vectorizer support. gcc/ChangeLog: * doc/md.texi: Add COND_LEN_* operations for loop control with length. * internal-fn.cc (cond_len_unary_direct): Ditto. (cond_len_binary_direct): Ditto. (cond_len_ternary_direct): Ditto. (expand_cond_len_unary_optab_fn): Ditto. (expand_cond_len_binary_optab_fn): Ditto. (expand_cond_len_ternary_optab_fn): Ditto. (direct_cond_len_unary_optab_supported_p): Ditto. (direct_cond_len_binary_optab_supported_p): Ditto. (direct_cond_len_ternary_optab_supported_p): Ditto. * internal-fn.def (COND_LEN_ADD): Ditto. (COND_LEN_SUB): Ditto. (COND_LEN_MUL): Ditto. (COND_LEN_DIV): Ditto. (COND_LEN_MOD): Ditto. (COND_LEN_RDIV): Ditto. (COND_LEN_MIN): Ditto. (COND_LEN_MAX): Ditto. (COND_LEN_FMIN): Ditto. (COND_LEN_FMAX): Ditto. (COND_LEN_AND): Ditto. (COND_LEN_IOR): Ditto. (COND_LEN_XOR): Ditto. (COND_LEN_SHL): Ditto. (COND_LEN_SHR): Ditto. (COND_LEN_FMA): Ditto. (COND_LEN_FMS): Ditto. (COND_LEN_FNMA): Ditto. (COND_LEN_FNMS): Ditto. (COND_LEN_NEG): Ditto. * optabs.def (OPTAB_D): Ditto.

Diffstat (limited to 'libgcc/floatunsixf.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: