author: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> 2023-08-16 16:51:44 +0530
committer: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> 2023-08-16 16:51:44 +0530
commit: a7dba4a1c05a76026d88dcccc0b519cf83bff9a2
tree: 16d14892f94dfc4edd4bbe0704429e781e2abbf9
parent: 1b7418ba1baf0d43fff6c6a68b8134813a35c1d9
Extend fold_vec_perm to handle VLA vector_cst.
The patch extends fold_vec_perm to fold VLA vector_csts.
For example:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { 0, len, ... }, npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x
res = VEC_PERM_EXPR<arg0, arg1, sel>
-->
{ arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1

Eg 2:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
sel = { 0, 1, 2, ... }, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x

For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, the runtime vector length is 2 and sel[i] will choose arg1[0];
if x > 0, the runtime vector length is > 2 and sel[i] will choose arg0[2].
So we return NULL_TREE for this case.

This leads us to define a constraint: a stepped sequence in sel should
only select a particular pattern from a particular input vector.

Eg 3:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { len, 0, 2, ... }, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x

sel contains a single pattern with the stepped sequence {0, 2, ...}.
Let a1 = the first element of the stepped part of the sequence, which is 0.
Let esel = the total number of elements in the stepped sequence.
Thus, esel = len / sel_npatterns = (4 + 4x) / 1 = 4 + 4x
Let S = the step of the sequence, which is 2 in this case.
Let ae = the last element of the stepped sequence.
Thus, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 2 = 4 + 8x

To ensure that we select elements from the same input vector, we require
a1 /trunc len == ae /trunc len.
Let q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.

However, if sel was:
sel = { len, 0, 1, ... }
the only change is S = 1.
So, ae = a1 + (esel - 2) * S = 0 + (4 + 4x - 2) * 1 = 2 + 4x
In this case, a1 / len == ae / len == 0, and the stepped sequence chooses
all elements from arg0.
Thus, res = { arg1[0], arg0[0], arg0[1], ... }

For VLA folding, sel has to conform to the constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit tests for VLA folding.

gcc/ChangeLog:

        * fold-const.cc (INCLUDE_ALGORITHM): Add Include.
        (valid_mask_for_fold_vec_perm_cst_p): New function.
        (fold_vec_perm_cst): Likewise.
        (fold_vec_perm): Adjust assert and call fold_vec_perm_cst.
        (test_fold_vec_perm_cst): New namespace.
        (test_fold_vec_perm_cst::build_vec_cst_rand): New function.
        (test_fold_vec_perm_cst::validate_res): Likewise.
        (test_fold_vec_perm_cst::validate_res_vls): Likewise.
        (test_fold_vec_perm_cst::builder_push_elems): Likewise.
        (test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise.
        (test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise.
        (test_fold_vec_perm_cst::test_all_nunits): Likewise.
        (test_fold_vec_perm_cst::test_nunits_min_2): Likewise.
        (test_fold_vec_perm_cst::test_nunits_min_4): Likewise.
        (test_fold_vec_perm_cst::test_nunits_min_8): Likewise.
        (test_fold_vec_perm_cst::test_nunits_max_4): Likewise.
        (test_fold_vec_perm_cst::is_simple_vla_size): Likewise.
        (test_fold_vec_perm_cst::test): Likewise.
        (fold_const_cc_tests): Call test_fold_vec_perm_cst::test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
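
The arithmetic in Eg 3 can be spot-checked with a small standalone sketch. This is not the code added to fold-const.cc: the function name and parameters below are hypothetical, and it assumes len has the form len_base + len_base * x (as in 4 + 4x) and evaluates the check for one concrete value of x rather than symbolically.

```cpp
#include <cstdint>

// Hypothetical helper, not part of GCC. Evaluates the commit message's
// "same input vector" condition for a single concrete x, assuming
// len = len_base + len_base * x.
//   a1            first element of the stepped part of the sel pattern
//   step          S, the step of the stepped sequence
//   sel_npatterns number of patterns in sel
bool stepped_seq_stays_in_one_input(std::uint64_t a1, std::uint64_t step,
                                    std::uint64_t sel_npatterns,
                                    std::uint64_t len_base, std::uint64_t x)
{
  const std::uint64_t len = len_base + len_base * x;
  const std::uint64_t esel = len / sel_npatterns;

  // With fewer than two elements per pattern there is no stepped part
  // to check (this guard is an assumption of the sketch).
  if (esel < 2)
    return true;

  // ae = a1 + (esel - 2) * S, the last element of the stepped sequence.
  const std::uint64_t ae = a1 + (esel - 2) * step;

  // Truncating division by len gives the index of the input vector:
  // 0 selects from arg0, 1 selects from arg1.
  return a1 / len == ae / len;
}
```

For sel = { len, 0, 2, ... } the call stepped_seq_stays_in_one_input(0, 2, 1, 4, x) is false for every x, matching q1 = 0 and qe = 1 above, while S = 1 makes it true. Sampling values of x is only a sanity check; the commit message reasons about len as a polynomial in x, which a concrete-value sketch cannot prove in general.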