author | Bill Schmidt <wschmidt@linux.vnet.ibm.com> | 2014-10-06 15:27:32 +0000 |
---|---|---|
committer | William Schmidt <wschmidt@gcc.gnu.org> | 2014-10-06 15:27:32 +0000 |
commit | cec5d8be5591842084cf656b2ef900ff85089aae (patch) | |
tree | ffd14bfdb460ec5093d708d3a41bc7b2d65053fd /gcc | |
parent | 63b9f71bb35359333efbc5a57073abea111eb496 (diff) | |
rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them.
2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (analyze_swaps commentary): Add
discussion of permutes and why we don't handle them.
From-SVN: r215951
Diffstat (limited to 'gcc')
-rw-r--r-- | gcc/ChangeLog | 5 |
-rw-r--r-- | gcc/config/rs6000/rs6000.c | 47 |
2 files changed, 52 insertions, 0 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6ef2664..13ac914 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
+	discussion of permutes and why we don't handle them.
+
 2014-10-06 Eric Botcazou <ebotcazou@adacore.com>
 
 	* config/sparc/predicates.md (int_register_operand): Delete.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 489c65e..8b35a04 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
    than deleting a swap, we convert the load/store into a permuting
    load/store (which effectively removes the swap). */
 
+/* Notes on Permutes
+
+   We do not currently handle computations that contain permutes. There
+   is a general transformation that can be performed correctly, but it
+   may introduce more expensive code than it replaces. To handle these
+   would require a cost model to determine when to perform the optimization.
+   This commentary records how this could be done if desired.
+
+   The most general permute is something like this (example for V16QI):
+
+   (vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
+                     (parallel [(const_int a0) (const_int a1)
+                                 ...
+                                (const_int a14) (const_int a15)]))
+
+   where a0,...,a15 are in [0,31] and select elements from op1 and op2
+   to produce in the result.
+
+   Regardless of mode, we can convert the PARALLEL to a mask of 16
+   byte-element selectors. Let's call this M, with M[i] representing
+   the ith byte-element selector value. Then if we swap doublewords
+   throughout the computation, we can get correct behavior by replacing
+   M with M' as follows:
+
+           { M[i+8]+8 : i < 8, M[i+8] in [0,7] U [16,23]
+   M'[i] = { M[i+8]-8 : i < 8, M[i+8] in [8,15] U [24,31]
+           { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
+           { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
+
+   This seems promising at first, since we are just replacing one mask
+   with another. But certain masks are preferable to others. If M
+   is a mask that matches a vmrghh pattern, for example, M' certainly
+   will not. Instead of a single vmrghh, we would generate a load of
+   M' and a vperm. So we would need to know how many xxswapd's we can
+   remove as a result of this transformation to determine if it's
+   profitable; and preferably the logic would need to be aware of all
+   the special preferable masks.
+
+   Another form of permute is an UNSPEC_VPERM, in which the mask is
+   already in a register. In some cases, this mask may be a constant
+   that we can discover with ud-chains, in which case the above
+   transformation is ok. However, the common usage here is for the
+   mask to be produced by an UNSPEC_LVSL, in which case the mask
+   cannot be known at compile time. In such a case we would have to
+   generate several instructions to compute M' as above at run time,
+   and a cost model is needed again. */
+
 /* This is based on the union-find logic in web.c. web_entry_base
    is defined in df.h. */
 class swap_web_entry : public web_entry_base
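
The M to M' rule in the new commentary can be checked with a small standalone model. The sketch below is not GCC code: `swap_doublewords`, `permute`, `swapped_mask` and `mask_rewrite_is_consistent` are hypothetical helper names, vectors are modelled as plain 16-byte arrays, and element 0 is assumed to be the first byte of the first doubleword. It computes M' with the piecewise rule and tests that permuting doubleword-swapped operands with M' yields the doubleword-swapped result of the original permute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Swap the two doublewords (8-byte halves) of a 16-byte vector.  */
static void
swap_doublewords (const uint8_t in[VEC_BYTES], uint8_t out[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    out[i] = in[(i + 8) % VEC_BYTES];
}

/* Reference model of the general permute: result[i] is byte m[i] of
   the 32-byte concatenation of op1 and op2.  */
static void
permute (const uint8_t op1[VEC_BYTES], const uint8_t op2[VEC_BYTES],
         const uint8_t m[VEC_BYTES], uint8_t result[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    {
      assert (m[i] < 32);
      result[i] = m[i] < 16 ? op1[m[i]] : op2[m[i] - 16];
    }
}

/* Compute M' from M using the piecewise rule from the commentary.  */
static void
swapped_mask (const uint8_t m[VEC_BYTES], uint8_t m_prime[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    {
      /* Selector taken from the other doubleword of the mask.  */
      uint8_t sel = i < 8 ? m[i + 8] : m[i - 8];
      assert (sel < 32);
      /* [0,7] U [16,23] gets +8; [8,15] U [24,31] gets -8.  */
      m_prime[i] = (uint8_t) (sel % 16 < 8 ? sel + 8 : sel - 8);
    }
}

/* Return 1 if permuting the doubleword-swapped operands with M' gives
   the doubleword-swapped result of permuting op1 and op2 with M.  */
static int
mask_rewrite_is_consistent (const uint8_t op1[VEC_BYTES],
                            const uint8_t op2[VEC_BYTES],
                            const uint8_t m[VEC_BYTES])
{
  uint8_t m_prime[VEC_BYTES], op1_s[VEC_BYTES], op2_s[VEC_BYTES];
  uint8_t plain[VEC_BYTES], expect[VEC_BYTES], got[VEC_BYTES];

  swapped_mask (m, m_prime);
  swap_doublewords (op1, op1_s);
  swap_doublewords (op2, op2_s);

  permute (op1, op2, m, plain);
  swap_doublewords (plain, expect);
  permute (op1_s, op2_s, m_prime, got);

  return memcmp (expect, got, VEC_BYTES) == 0;
}
```

Calling `mask_rewrite_is_consistent` with two distinct operand arrays and any mask whose selectors lie in [0,31] should return 1. Feeding `swapped_mask` a merge-style selector list also illustrates the cost concern raised in the commentary: the rewritten mask generally no longer has the regular shape of the original.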