rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them.

2014-10-06  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
	discussion of permutes and why we don't handle them.

From-SVN: r215951
author: Bill Schmidt <wschmidt@linux.vnet.ibm.com> 2014-10-06 15:27:32 +0000
committer: William Schmidt <wschmidt@gcc.gnu.org> 2014-10-06 15:27:32 +0000
commit: cec5d8be5591842084cf656b2ef900ff85089aae (patch)
tree: ffd14bfdb460ec5093d708d3a41bc7b2d65053fd /gcc
parent: 63b9f71bb35359333efbc5a57073abea111eb496 (diff)
download: gcc-cec5d8be5591842084cf656b2ef900ff85089aae.zip
gcc-cec5d8be5591842084cf656b2ef900ff85089aae.tar.gz
gcc-cec5d8be5591842084cf656b2ef900ff85089aae.tar.bz2
2 files changed, 52 insertions, 0 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6ef2664..13ac914 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-10-06  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
+
+	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
+	discussion of permutes and why we don't handle them.
+
 2014-10-06  Eric Botcazou  <ebotcazou@adacore.com>
 
 	* config/sparc/predicates.md (int_register_operand): Delete.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 489c65e..8b35a04 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
    than deleting a swap, we convert the load/store into a permuting
    load/store (which effectively removes the swap).  */
 
+/* Notes on Permutes
+
+   We do not currently handle computations that contain permutes.  There
+   is a general transformation that can be performed correctly, but it
+   may introduce more expensive code than it replaces.  To handle these
+   would require a cost model to determine when to perform the optimization.
+   This commentary records how this could be done if desired.
+
+   The most general permute is something like this (example for V16QI):
+
+   (vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
+                     (parallel [(const_int a0) (const_int a1)
+                                 ...
+                                (const_int a14) (const_int a15)]))
+
+   where a0,...,a15 are in [0,31] and select elements from op1 and op2
+   to produce the result.
+
+   Regardless of mode, we can convert the PARALLEL to a mask of 16
+   byte-element selectors.  Let's call this M, with M[i] representing
+   the ith byte-element selector value.  Then if we swap doublewords
+   throughout the computation, we can get correct behavior by replacing
+   M with M' as follows:
+
+            { M[i+8]+8 : i < 8, M[i+8] in [0,7] U [16,23]
+    M'[i] = { M[i+8]-8 : i < 8, M[i+8] in [8,15] U [24,31]
+            { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
+            { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
+
+   This seems promising at first, since we are just replacing one mask
+   with another.  But certain masks are preferable to others.  If M
+   is a mask that matches a vmrghh pattern, for example, M' certainly
+   will not.  Instead of a single vmrghh, we would generate a load of
+   M' and a vperm.  So we would need to know how many xxswapd's we can
+   remove as a result of this transformation to determine if it's
+   profitable; and preferably the logic would need to be aware of all
+   the special preferable masks.
+
+   Another form of permute is an UNSPEC_VPERM, in which the mask is
+   already in a register.  In some cases, this mask may be a constant
+   that we can discover with ud-chains, in which case the above
+   transformation is ok.  However, the common usage here is for the
+   mask to be produced by an UNSPEC_LVSL, in which case the mask 
+   cannot be known at compile time.  In such a case we would have to
+   generate several instructions to compute M' as above at run time,
+   and a cost model is needed again.  */
+
 /* This is based on the union-find logic in web.c.  web_entry_base is
    defined in df.h.  */
 class swap_web_entry : public web_entry_base
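
The M -> M' rule in the new commentary can be checked outside the compiler. The following is a minimal standalone sketch, not code from rs6000.c: the names swap_selector, swapped_mask, swap_doublewords and permute are hypothetical, and vectors are modeled as plain byte arrays with element 0 first. It assumes the permute semantics described in the comment, where selector values 0-15 pick from op1 and 16-31 pick from op2.

```c
#include <assert.h>
#include <stdint.h>

#define NSEL 16   /* byte-element selectors per mask */

/* Hypothetical helper: adjust one selector so that it addresses the
   same source byte after both operands have had their doublewords
   swapped.  Selectors in [0,7] U [16,23] move up by 8; selectors in
   [8,15] U [24,31] move down by 8.  */
uint8_t
swap_selector (uint8_t sel)
{
  assert (sel < 32);
  return (sel % 16 < 8) ? sel + 8 : sel - 8;
}

/* Compute M' from M as in the commentary: result byte i of the swapped
   computation corresponds to byte i+8 (i < 8) or i-8 (i >= 8) of the
   original result, and its selector is adjusted as above.  */
void
swapped_mask (const uint8_t m[NSEL], uint8_t m_prime[NSEL])
{
  for (int i = 0; i < NSEL; i++)
    {
      int src = (i < 8) ? i + 8 : i - 8;
      m_prime[i] = swap_selector (m[src]);
    }
}

/* Model of a doubleword swap on a 16-byte vector.  */
void
swap_doublewords (const uint8_t in[NSEL], uint8_t out[NSEL])
{
  for (int i = 0; i < NSEL; i++)
    out[i] = in[(i < 8) ? i + 8 : i - 8];
}

/* Model of the general permute: select from the concatenation of
   op1 (selectors 0-15) and op2 (selectors 16-31).  */
void
permute (const uint8_t op1[NSEL], const uint8_t op2[NSEL],
         const uint8_t mask[NSEL], uint8_t result[NSEL])
{
  for (int i = 0; i < NSEL; i++)
    result[i] = (mask[i] < 16) ? op1[mask[i]] : op2[mask[i] - 16];
}
```

Under these assumptions, permuting the swapped operands with M' yields the doubleword swap of the original result. Taking M = {0,1,16,17,2,3,18,19,4,5,20,21,6,7,22,23}, used here as an illustrative halfword-merge mask, swapped_mask produces {12,13,28,29,14,15,30,31,8,9,24,25,10,11,26,27}, which is a different mask from M. This is the situation the commentary describes, where a mask that matched a preferred pattern is replaced by one that does not.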