Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

This patch adds the middle-end representation for saturating add, i.e.
the result of the add is set to the maximum value of the type on
overflow.  It matches patterns similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

Given the example below for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT

}

The tests below pass with this patch:
1. The riscv full regression tests.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
	to the return true switch case(s).
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match(es).
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
	us/ssadd.
	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
	extern func decl generated in match.pd match.
	(match_saturation_arith): New func impl to match the saturation arith.
	(math_opts_dom_walker::after_dom_children): Try match saturation
	arith when IOR expr.

Signed-off-by: Pan Li <pan2.li@intel.com>
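
As a worked illustration of the formula above with TYPE = uint8_t, the following is a minimal sketch and not part of the patch. The helper name sat_add_u8 is hypothetical. The explicit casts are needed because uint8_t operands are promoted to int in C, so the wrap-around has to be forced back into 8 bits.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: the commit message formula
   (x + y) | (-(TYPE)((TYPE)(x + y) < x)) with TYPE = uint8_t.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* x + y is computed in int; the cast wraps it modulo 256.  */
  uint8_t sum = (uint8_t) (x + y);

  /* For unsigned operands the wrapped sum is smaller than x exactly when
     the add overflowed.  Negating the 0/1 comparison result and casting
     gives 0x00 (no overflow) or 0xff (overflow).  */
  uint8_t mask = (uint8_t) -(sum < x);

  /* OR-ing with 0xff forces the result to the maximum value.  */
  return (uint8_t) (sum | mask);
}
```

For SAT_ADD (2, 255) the wrapped sum is 1, which is less than 2, so the mask is 0xff and the result is 255. For SAT_ADD (1, 254) the sum is 255 with no wrap, the mask is 0, and the result stays 255.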
author: Pan Li <pan2.li@intel.com> 2024-05-15 10:14:05 +0800
committer: Pan Li <pan2.li@intel.com> 2024-05-16 20:08:16 +0800
commit: 52b0536710ff3f3ace72ab00ce9ef6c630cd1183 (patch)
tree: 0e33ea915143143205a0c52ebd4c5e89b1726b7f /gcc/match.pd
parent: da73261ce7731be7f2b164f1db796878cdc23365 (diff)
download: gcc-52b0536710ff3f3ace72ab00ce9ef6c630cd1183.zip
gcc-52b0536710ff3f3ace72ab00ce9ef6c630cd1183.tar.gz
gcc-52b0536710ff3f3ace72ab00ce9ef6c630cd1183.tar.bz2
1 files changed, 51 insertions, 0 deletions
diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743a..0f9c34f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_left_part_2 @0 @1)
+ (realpart (IFN_ADD_OVERFLOW:c @0 @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
+   because the sub part of left_part_2 cannot work with right_part_1.
+   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
+   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
+
+/* Unsigned saturation add, case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
+
+/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
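
For reference, the source-level shapes that the new match patterns are aimed at can be sketched in C as below. This is an illustrative sketch only: the function names are hypothetical, sat_add_ref is a branchy reference that is not one of the matched forms, and which GIMPLE shape each function actually reaches (plain PLUS_EXPR or .ADD_OVERFLOW) depends on earlier passes and the GCC version. __builtin_add_overflow is the existing GCC builtin.

```c
#include <stdint.h>

/* Reference semantics with an explicit comparison against the maximum;
   used only to check the branchless forms below.  */
static uint64_t
sat_add_ref (uint64_t x, uint64_t y)
{
  return x > UINT64_MAX - y ? UINT64_MAX : x + y;
}

/* Case 1, first form: (X + Y) | -((X + Y) < X).  */
static uint64_t
sat_add_form_lt (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Case 1, second form: (X + Y) | -(X > (X + Y)).  */
static uint64_t
sat_add_form_gt (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) (x > (uint64_t) (x + y)));
}

/* Case 2 shape: the sum and the overflow flag come from one
   overflow-checking add, mirroring the REALPART/IMAGPART pair in the
   "before" GIMPLE dump.  */
static uint64_t
sat_add_form_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  uint64_t overflowed = __builtin_add_overflow (x, y, &sum);
  return sum | -overflowed;
}
```

All three branchless forms compute the same value as sat_add_ref for every pair of inputs, which is what allows them to be folded into a single .SAT_ADD call.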