From 370c2ebe8fa20e0812cd2d533d4ed38ee2d37c85 Mon Sep 17 00:00:00 2001
From: Richard Sandiford
Date: Tue, 3 Jul 2018 09:59:37 +0000
Subject: [14/n] PR85694: Rework overwidening detection

This patch is the main part of PR85694.  The aim is to recognise at least:

  signed char *a, *b, *c;
  ...
  for (int i = 0; i < 2048; i++)
    c[i] = (a[i] + b[i]) >> 1;

as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints.  However, it ended up being a lot more
general than that.

The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant.  These cases are enough for common pixel-format conversion
and can be detected in a peephole way.

The loop above requires two generalisations of the current code:
support for addition as well as logical ops, and support for
non-constant second operands.  These are harder to detect in the same
peephole way, so the patch tries to take a more global approach.

The idea is to get information about the minimum operation width
in two ways:

(1) by using the range information attached to the SSA_NAMEs
    (effectively a forward walk, since the range info is
    context-independent).

(2) by back-propagating the number of output bits required by
    users of the result.

As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code.  The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that
if:

- no operations later in the chain require more than N bits; or

- all internally-defined inputs are extended from N bits or fewer,
  and at least one of them is single-use.

See the comments for the rationale.

I didn't bother adding STMT_VINFO_* wrappers for the new fields
since the code seemed more readable without.
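To make the intent concrete, here is a sketch of the narrowed form of
the loop above (source-level C for illustration only, with invented
variable names; the vectorizer works on the gimple statements rather
than rewriting the source):

  /* Illustrative sketch, not compiler output.  */
  for (int i = 0; i < 2048; i++)
    {
      short as = a[i];        /* sign extension: 8 -> 16 bits */
      short bs = b[i];        /* sign extension: 8 -> 16 bits */
      short sum = as + bs;    /* value fits in 9 bits, so 16 are safe */
      c[i] = sum >> 1;        /* shifted result fits back in 8 bits */
    }

Range information shows that as + bs is in [-256, 254] and so needs
only 9 bits, making the 16-bit addition overflow-free, while
back-propagation from the store to the signed char c[i] shows that no
user needs more than 8 bits of the shifted result.

2018-06-20  Richard Sandiford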
gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions):
	Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
	widen_mult pattern.
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.

From-SVN: r262333
---
 gcc/ChangeLog                                      |   26 +
 gcc/dumpfile.c                                     |   22 +
 gcc/dumpfile.h                                     |    2 +
 gcc/poly-int.h                                     |   19 +
 gcc/testsuite/ChangeLog                            |   33 +
 gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c    |   66 ++
 gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c    |   65 ++
 .../gcc.dg/vect/vect-over-widen-1-big-array.c      |    6 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c      |    7 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c     |   19 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c     |   63 ++
 gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c     |   19 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c     |   50 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c     |   18 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c     |   52 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c     |   18 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c     |   46 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c     |   50 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c     |   53 +
 .../gcc.dg/vect/vect-over-widen-2-big-array.c      |    9 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c      |    9 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c     |   53 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c     |   51 +
 .../gcc.dg/vect/vect-over-widen-3-big-array.c      |    6 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c      |    5 +-
 .../gcc.dg/vect/vect-over-widen-4-big-array.c      |    6 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c      |    7 +-
 gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c      |   51 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c      |   16 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c      |   53 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c      |   19 +
 gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c      |   58 ++
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c |    2 +-
 gcc/tree-vect-patterns.c                           | 1040 +++++++++++++-------
 gcc/tree-vectorizer.h                              |   15 +
 35 files changed, 1672 insertions(+), 362 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1526dd5..ab97a84 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,31 @@
 2018-07-03  Richard Sandiford
 
+	* poly-int.h (print_hex): New function.
+	* dumpfile.h (dump_dec, dump_hex): Declare.
+	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
+	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
+	min_input_precision, operation_precision and operation_sign.
+	* tree-vect-patterns.c (vect_get_range_info): New function.
+	(vect_same_loop_or_bb_p, vect_single_imm_use)
+	(vect_operation_fits_smaller_type): Delete.
+	(vect_look_through_possible_promotion): Add an optional
+	single_use_p parameter.
+	(vect_recog_over_widening_pattern): Rewrite to use new
+	stmt_vec_info information.  Handle one operation at a time.
+	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
+	(vect_truncatable_operation_p, vect_set_operation_type)
+	(vect_set_min_input_precision): New functions.
+	(vect_determine_min_output_precision_1): Likewise.
+	(vect_determine_min_output_precision): Likewise.
+	(vect_determine_precisions_from_range): Likewise.
+	(vect_determine_precisions_from_users): Likewise.
+	(vect_determine_stmt_precisions, vect_determine_precisions):
+	Likewise.
+	(vect_vect_recog_func_ptrs): Put over_widening first.
+	Add cast_forwprop.
+	(vect_pattern_recog): Call vect_determine_precisions.
+
+2018-07-03  Richard Sandiford
+
 	* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove
 	pattern statements that have been replaced by further pattern
 	statements.
 	(vect_pattern_recog_1): Clear STMT_VINFO_PATTERN_DEF_SEQ on failure.

diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 3296299..7ed1796 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -633,6 +633,28 @@ template void dump_dec (dump_flags_t, const poly_uint64 &);
 template void dump_dec (dump_flags_t, const poly_offset_int &);
 template void dump_dec (dump_flags_t, const poly_widest_int &);
 
+void
+dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
+{
+  if (dump_file && (dump_kind & pflags))
+    print_dec (value, dump_file, sgn);
+
+  if (alt_dump_file && (dump_kind & alt_flags))
+    print_dec (value, alt_dump_file, sgn);
+}
+
+/* Output VALUE in hexadecimal to appropriate dump streams.  */
+
+void
+dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
+{
+  if (dump_file && (dump_kind & pflags))
+    print_hex (value, dump_file);
+
+  if (alt_dump_file && (dump_kind & alt_flags))
+    print_hex (value, alt_dump_file);
+}
+
 /* The current dump scope-nesting depth.  */
 static int dump_scope_depth;

diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index a417241..4a71ef7 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -439,6 +439,8 @@ extern bool enable_rtl_dump_file (void);
 template<unsigned int N, typename C>
 void dump_dec (dump_flags_t, const poly_int<N, C> &);
+extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
+extern void dump_hex (dump_flags_t, const poly_wide_int &);
 
 /* In tree-dump.c  */
 extern void dump_node (const_tree, dump_flags_t, FILE *);

diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index d6e4dee..b3b61e2 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &value, FILE *file)
	     poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
 }
 
+/* Use print_hex to print VALUE to FILE.  */
+
+template<unsigned int N, typename C>
+void
+print_hex (const poly_int_pod<N, C> &value, FILE *file)
+{
+  if (value.is_constant ())
+    print_hex (value.coeffs[0], file);
+  else
+    {
+      fprintf (file, "[");
+      for (unsigned int i = 0; i < N; ++i)
+	{
+	  print_hex (value.coeffs[i], file);
+	  fputc (i == N - 1 ? ']' : ',', file);
+	}
+    }
+}
+
 /* Helper for calculating the distance between two points P1 and P2,
    in cases where known_le (P1, P2).  T1 and T2 are the types of the
    two positions, in either order.  The coefficients of P2 - P1 have

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 38e85e4..0028d4f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,38 @@
 2018-07-03  Richard Sandiford
 
+	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
+	widen_mult pattern.
+	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
+	over-widening messages.
+	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
+	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
+	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
+	* gcc.dg/vect/vect-over-widen-21.c: Likewise.
+
+2018-07-03  Richard Sandiford
+
 	* gcc.dg/vect/vect-mixed-size-cond-1.c: New test.
 
 2018-07-02  Jim Wilson

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
new file mode 100644
index 0000000..60e7b79
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
@@ -0,0 +1,66 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_pack_trunc } */
+/* { dg-require-effective-target vect_unpack } */
+
+#include "tree-vect.h"
+
+/* Deliberate use of signed >>.
*/ +#define DEF_LOOP(SIGNEDNESS) \ + void __attribute__ ((noipa)) \ + f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \ + SIGNEDNESS char *restrict b, \ + SIGNEDNESS char *restrict c) \ + { \ + a[0] = (b[0] + c[0]) >> 1; \ + a[1] = (b[1] + c[1]) >> 1; \ + a[2] = (b[2] + c[2]) >> 1; \ + a[3] = (b[3] + c[3]) >> 1; \ + a[4] = (b[4] + c[4]) >> 1; \ + a[5] = (b[5] + c[5]) >> 1; \ + a[6] = (b[6] + c[6]) >> 1; \ + a[7] = (b[7] + c[7]) >> 1; \ + a[8] = (b[8] + c[8]) >> 1; \ + a[9] = (b[9] + c[9]) >> 1; \ + a[10] = (b[10] + c[10]) >> 1; \ + a[11] = (b[11] + c[11]) >> 1; \ + a[12] = (b[12] + c[12]) >> 1; \ + a[13] = (b[13] + c[13]) >> 1; \ + a[14] = (b[14] + c[14]) >> 1; \ + a[15] = (b[15] + c[15]) >> 1; \ + } + +DEF_LOOP (signed) +DEF_LOOP (unsigned) + +#define N 16 + +#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \ + { \ + SIGNEDNESS char a[N], b[N], c[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = BASE_B + i * 15; \ + c[i] = BASE_C + i * 14; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##SIGNEDNESS (a, b, c); \ + for (int i = 0; i < N; ++i) \ + if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \ + __builtin_abort (); \ + } + +int +main (void) +{ + check_vect (); + + TEST_LOOP (signed, -128, -120); + TEST_LOOP (unsigned, 4, 10); + + return 0; +} + +/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c new file mode 100644 index 0000000..b26317c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c @@ -0,0 +1,65 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +/* Deliberate use of signed >>. */ +#define DEF_LOOP(SIGNEDNESS) \ + void __attribute__ ((noipa)) \ + f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \ + SIGNEDNESS char *restrict b, \ + SIGNEDNESS char c) \ + { \ + a[0] = (b[0] + c) >> 1; \ + a[1] = (b[1] + c) >> 1; \ + a[2] = (b[2] + c) >> 1; \ + a[3] = (b[3] + c) >> 1; \ + a[4] = (b[4] + c) >> 1; \ + a[5] = (b[5] + c) >> 1; \ + a[6] = (b[6] + c) >> 1; \ + a[7] = (b[7] + c) >> 1; \ + a[8] = (b[8] + c) >> 1; \ + a[9] = (b[9] + c) >> 1; \ + a[10] = (b[10] + c) >> 1; \ + a[11] = (b[11] + c) >> 1; \ + a[12] = (b[12] + c) >> 1; \ + a[13] = (b[13] + c) >> 1; \ + a[14] = (b[14] + c) >> 1; \ + a[15] = (b[15] + c) >> 1; \ + } + +DEF_LOOP (signed) +DEF_LOOP (unsigned) + +#define N 16 + +#define TEST_LOOP(SIGNEDNESS, BASE_B, C) \ + { \ + SIGNEDNESS char a[N], b[N], c[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = BASE_B + i * 15; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##SIGNEDNESS (a, b, C); \ + for (int i = 0; i < N; ++i) \ + if (a[i] != (BASE_B + C + i * 15) >> 1) \ + __builtin_abort (); \ + } + +int +main (void) +{ + check_vect (); + + TEST_LOOP (signed, -128, -120); + TEST_LOOP (unsigned, 4, 250); + + return 0; +} + +/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! 
vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c index b701b7b..9e5f464 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c @@ -58,7 +58,9 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c index 3140829..c2d0797 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c @@ -62,8 +62,9 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! 
vect_widen_shift } } } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c new file mode 100644 index 0000000..394a5a1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-10.c @@ -0,0 +1,19 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#ifndef SIGNEDNESS +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 +#endif + +#include "vect-over-widen-9.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c new file mode 100644 index 0000000..97ab57f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c @@ -0,0 +1,63 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -100 +#endif + +#define N 50 + +/* Both range analysis and backward propagation from the truncation show + that these calculations can be done in SIGNEDNESS short, with "res" + being extended for the store to d[i]. */ +void __attribute__ ((noipa)) +f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c, int *restrict d) +{ + for (int i = 0; i < N; ++i) + { + /* Deliberate use of signed >>. 
*/ + int res = b[i] + c[i]; + a[i] = (res + (res >> 1)) >> 2; + d[i] = res; + } +} + +int +main (void) +{ + check_vect (); + + SIGNEDNESS char a[N], b[N], c[N]; + int d[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c, d); + for (int i = 0; i < N; ++i) + { + int res = BASE_B + BASE_C + i * 9; + if (a[i] != ((res + (res >> 1)) >> 2)) + __builtin_abort (); + if (d[i] != res) + __builtin_abort (); + } + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */ +/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c new file mode 100644 index 0000000..0d5473e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-12.c @@ -0,0 +1,19 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#ifndef SIGNEDNESS +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 +#endif + +#include "vect-over-widen-11.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */ +/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c new file mode 100644 index 0000000..b89ed8b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c @@ -0,0 +1,50 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -120 +#endif + +#define N 50 + +/* We rely on range analysis to show that these calculations can be done + in SIGNEDNESS short. 
*/ +void __attribute__ ((noipa)) +f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c) +{ + for (int i = 0; i < N; ++i) + a[i] = (b[i] + c[i]) / 2; +} + +int +main (void) +{ + check_vect (); + + SIGNEDNESS char a[N], b[N], c[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c); + for (int i = 0; i < N; ++i) + if (a[i] != (BASE_B + BASE_C + i * 9) / 2) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c new file mode 100644 index 0000000..7b5ba23 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-14.c @@ -0,0 +1,18 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#ifndef SIGNEDNESS +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 +#endif + +#include "vect-over-widen-13.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c new file mode 100644 index 0000000..e898e87 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c @@ -0,0 +1,52 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -120 +#endif + +#define N 50 + +/* We rely on range analysis to show that these calculations can be done + in SIGNEDNESS short, with the result being extended to int for the + store. 
*/ +void __attribute__ ((noipa)) +f (int *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c) +{ + for (int i = 0; i < N; ++i) + a[i] = (b[i] + c[i]) / 2; +} + +int +main (void) +{ + check_vect (); + + int a[N]; + SIGNEDNESS char b[N], c[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c); + for (int i = 0; i < N; ++i) + if (a[i] != (BASE_B + BASE_C + i * 9) / 2) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c new file mode 100644 index 0000000..0429345 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-16.c @@ -0,0 +1,18 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#ifndef SIGNEDNESS +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 +#endif + +#include "vect-over-widen-15.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c new file mode 100644 index 0000000..0448260 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c @@ -0,0 +1,46 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#define N 1024 + +/* This should not be treated as an over-widening pattern, even though + "(b[i] & 0xef) | 0x80)" could be done in unsigned chars. 
*/ + +void __attribute__ ((noipa)) +f (unsigned short *restrict a, unsigned short *restrict b) +{ + for (__INTPTR_TYPE__ i = 0; i < N; ++i) + { + unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4); + a[i] = foo; + } +} + +int +main (void) +{ + check_vect (); + + unsigned short a[N], b[N]; + for (int i = 0; i < N; ++i) + { + a[i] = i; + b[i] = i * 3; + asm volatile ("" ::: "memory"); + } + f (a, b); + for (int i = 0; i < N; ++i) + if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4))) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c new file mode 100644 index 0000000..ecb74d7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c @@ -0,0 +1,50 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#define N 1024 + +/* This should be treated as an over-widening pattern: we can truncate + b to unsigned char after loading it and do all the computation in + unsigned char. */ + +void __attribute__ ((noipa)) +f (unsigned char *restrict a, unsigned short *restrict b) +{ + for (__INTPTR_TYPE__ i = 0; i < N; ++i) + { + unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4); + a[i] = foo; + } +} + +int +main (void) +{ + check_vect (); + + unsigned char a[N]; + unsigned short b[N]; + for (int i = 0; i < N; ++i) + { + a[i] = i; + b[i] = i * 3; + asm volatile ("" ::: "memory"); + } + f (a, b); + for (int i = 0; i < N; ++i) + if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4))) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* |} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */ +/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c new file mode 100644 index 0000000..11546fe --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c @@ -0,0 +1,53 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#define N 111 + +/* This shouldn't be treated as an over-widening operation: it's better + to reuse the extensions of di and ei for di + ei than to add them + as shorts and introduce a third extension. 
*/ + +void __attribute__ ((noipa)) +f (unsigned int *restrict a, unsigned int *restrict b, + unsigned int *restrict c, unsigned char *restrict d, + unsigned char *restrict e) +{ + for (__INTPTR_TYPE__ i = 0; i < N; ++i) + { + unsigned int di = d[i]; + unsigned int ei = e[i]; + a[i] = di; + b[i] = ei; + c[i] = di + ei; + } +} + +int +main (void) +{ + check_vect (); + + unsigned int a[N], b[N], c[N]; + unsigned char d[N], e[N]; + for (int i = 0; i < N; ++i) + { + d[i] = i * 2 + 3; + e[i] = i + 100; + asm volatile ("" ::: "memory"); + } + f (a, b, c, d, e); + for (int i = 0; i < N; ++i) + if (a[i] != i * 2 + 3 + || b[i] != i + 100 + || c[i] != i * 3 + 103) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c index 651ef7c..82aec9f 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c @@ -57,7 +57,12 @@ int main (void) return 0; } -/* Final value stays in int, so no over-widening is detected at the moment. */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */ +/* This is an over-widening even though the final result is still an int. + It's better to do one vector of ops on chars and then widen than to + widen and then do 4 vectors of ops on ints. */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c index eb9683e..0bcbd4f 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c @@ -57,7 +57,12 @@ int main (void) return 0; } -/* Final value stays in int, so no over-widening is detected at the moment. */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */ +/* This is an over-widening even though the final result is still an int. + It's better to do one vector of ops on chars and then widen than to + widen and then do 4 vectors of ops on ints. 
*/ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c new file mode 100644 index 0000000..47f970d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c @@ -0,0 +1,53 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#define N 111 + +/* This shouldn't be treated as an over-widening operation: it's better + to reuse the extensions of di and ei for di + ei than to add them + as shorts and introduce a third extension. */ + +void __attribute__ ((noipa)) +f (unsigned int *restrict a, unsigned int *restrict b, + unsigned int *restrict c, unsigned char *restrict d, + unsigned char *restrict e) +{ + for (__INTPTR_TYPE__ i = 0; i < N; ++i) + { + int di = d[i]; + int ei = e[i]; + a[i] = di; + b[i] = ei; + c[i] = di + ei; + } +} + +int +main (void) +{ + check_vect (); + + unsigned int a[N], b[N], c[N]; + unsigned char d[N], e[N]; + for (int i = 0; i < N; ++i) + { + d[i] = i * 2 + 3; + e[i] = i + 100; + asm volatile ("" ::: "memory"); + } + f (a, b, c, d, e); + for (int i = 0; i < N; ++i) + if (a[i] != i * 2 + 3 + || b[i] != i + 100 + || c[i] != i * 3 + 103) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c new file mode 100644 index 0000000..6e13f26 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c @@ -0,0 +1,51 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#define N 111 + +/* This shouldn't be treated as an over-widening operation: it's better + to reuse the extensions of di and ei for di + ei than to add them + as shorts and introduce a third extension. 
*/ + +void __attribute__ ((noipa)) +f (unsigned int *restrict a, unsigned int *restrict b, + unsigned int *restrict c, unsigned char *restrict d, + unsigned char *restrict e) +{ + for (__INTPTR_TYPE__ i = 0; i < N; ++i) + { + a[i] = d[i]; + b[i] = e[i]; + c[i] = d[i] + e[i]; + } +} + +int +main (void) +{ + check_vect (); + + unsigned int a[N], b[N], c[N]; + unsigned char d[N], e[N]; + for (int i = 0; i < N; ++i) + { + d[i] = i * 2 + 3; + e[i] = i + 100; + asm volatile ("" ::: "memory"); + } + f (a, b, c, d, e); + for (int i = 0; i < N; ++i) + if (a[i] != i * 2 + 3 + || b[i] != i + 100 + || c[i] != i * 3 + 103) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c index e419f20..37da7c9 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c @@ -59,7 +59,9 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c index 6ca6be7..4138480 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c @@ -57,6 +57,9 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c index 4ce532b..514337c 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c @@ -62,7 +62,9 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! 
vect_widen_shift } } } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c index 9dd1ea5..3d536d5 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c @@ -66,8 +66,9 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */ -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c new file mode 100644 index 0000000..56d2396 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c @@ -0,0 +1,51 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -100 +#endif + +#define N 50 + +/* Both range analysis and backward propagation from the truncation show + that these calculations can be done in SIGNEDNESS short. */ +void __attribute__ ((noipa)) +f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c) +{ + /* Deliberate use of signed >>. 
*/ + for (int i = 0; i < N; ++i) + a[i] = (b[i] + c[i]) >> 1; +} + +int +main (void) +{ + check_vect (); + + SIGNEDNESS char a[N], b[N], c[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c); + for (int i = 0; i < N; ++i) + if (a[i] != (BASE_B + BASE_C + i * 9) >> 1) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c new file mode 100644 index 0000000..9fe0e05 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-6.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 + +#include "vect-over-widen-5.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c new file mode 100644 index 0000000..a8166b3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c @@ -0,0 +1,53 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -100 +#define D -120 +#endif + +#define N 50 + +/* Both range analysis and backward propagation from the truncation show + that these calculations can be done in SIGNEDNESS short. */ +void __attribute__ ((noipa)) +f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c, SIGNEDNESS char d) +{ + int promoted_d = d; + for (int i = 0; i < N; ++i) + /* Deliberate use of signed >>. 
*/ + a[i] = (b[i] + c[i] + promoted_d) >> 2; +} + +int +main (void) +{ + check_vect (); + + SIGNEDNESS char a[N], b[N], c[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c, D); + for (int i = 0; i < N; ++i) + if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c new file mode 100644 index 0000000..238f577 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-8.c @@ -0,0 +1,19 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#ifndef SIGNEDNESS +#define SIGNEDNESS unsigned +#define BASE_B 4 +#define BASE_C 40 +#define D 251 +#endif + +#include "vect-over-widen-7.c" + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c new file mode 100644 index 0000000..a50f819 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c @@ -0,0 +1,58 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_pack_trunc } */ +/* { dg-require-effective-target vect_unpack } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS +#define SIGNEDNESS signed +#define BASE_B -128 +#define BASE_C -100 +#endif + +#define N 50 + +/* Both range analysis and backward propagation from the truncation show + that these calculations can be done in SIGNEDNESS short. */ +void __attribute__ ((noipa)) +f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b, + SIGNEDNESS char *restrict c) +{ + for (int i = 0; i < N; ++i) + { + /* Deliberate use of signed >>. 
*/ + int res = b[i] + c[i]; + a[i] = (res + (res >> 1)) >> 2; + } +} + +int +main (void) +{ + check_vect (); + + SIGNEDNESS char a[N], b[N], c[N]; + for (int i = 0; i < N; ++i) + { + b[i] = BASE_B + i * 5; + c[i] = BASE_C + i * 4; + asm volatile ("" ::: "memory"); + } + f (a, b, c); + for (int i = 0; i < N; ++i) + { + int res = BASE_B + BASE_C + i * 9; + if (a[i] != ((res + (res >> 1)) >> 2)) + __builtin_abort (); + } + + return 0; +} + +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */ +/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */ +/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c index 4d20f16..f38859a 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c @@ -43,5 +43,5 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */ -/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */ +/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */ diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index 4ffec66..91076f4 100644 --- a/gcc/tree-vect-patterns.c +++ b/gcc/tree-vect-patterns.c @@ -47,6 +47,40 @@ along with GCC; see the file COPYING3. If not see #include "omp-simd-clone.h" #include "predict.h" +/* Return true if we have a useful VR_RANGE range for VAR, storing it + in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */ + +static bool +vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value) +{ + value_range_type vr_type = get_range_info (var, min_value, max_value); + wide_int nonzero = get_nonzero_bits (var); + signop sgn = TYPE_SIGN (TREE_TYPE (var)); + if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value, + nonzero, sgn) == VR_RANGE) + { + if (dump_enabled_p ()) + { + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var); + dump_printf (MSG_NOTE, " has range ["); + dump_hex (MSG_NOTE, *min_value); + dump_printf (MSG_NOTE, ", "); + dump_hex (MSG_NOTE, *max_value); + dump_printf (MSG_NOTE, "]\n"); + } + return true; + } + else + { + if (dump_enabled_p ()) + { + dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var); + dump_printf (MSG_NOTE, " has no range info\n"); + } + return false; + } +} + /* Report that we've found an instance of pattern PATTERN in statement STMT. */ @@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree otype, tree_code code, return true; } -/* Check whether STMT2 is in the same loop or basic block as STMT1. - Which of the two applies depends on whether we're currently doing - loop-based or basic-block-based vectorization, as determined by - the vinfo_for_stmt for STMT1 (which must be defined). 
-
-   If this returns true, vinfo_for_stmt for STMT2 is guaranteed
-   to be defined as well.  */
-
-static bool
-vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
-{
-  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
-  return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
-}
-
-/* If the LHS of DEF_STMT has a single use, and that statement is
-   in the same loop or basic block, return it.  */
-
-static gimple *
-vect_single_imm_use (gimple *def_stmt)
-{
-  tree lhs = gimple_assign_lhs (def_stmt);
-  use_operand_p use_p;
-  gimple *use_stmt;
-
-  if (!single_imm_use (lhs, &use_p, &use_stmt))
-    return NULL;
-
-  if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
-    return NULL;
-
-  return use_stmt;
-}
-
 /* Round bit precision PRECISION up to a full element.  */
 
 static unsigned int
@@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
    is possible to convert OP' back to OP using a possible sign change
    followed by a possible promotion P.  Return this OP', or null if OP is
    not a vectorizable SSA name.  If there is a promotion P, describe its
-   input in UNPROM, otherwise describe OP' in UNPROM.
+   input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
+   is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
+   have more than one user.
 
    A successful return means that it is possible to go from OP' to OP
    via UNPROM.  The cast from OP' to UNPROM is at most a sign change,
@@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
 
 static tree
 vect_look_through_possible_promotion (vec_info *vinfo, tree op,
-				      vect_unpromoted_value *unprom)
+				      vect_unpromoted_value *unprom,
+				      bool *single_use_p = NULL)
 {
   tree res = NULL_TREE;
   tree op_type = TREE_TYPE (op);
@@ -420,7 +423,14 @@ vect_look_through_possible_promotion (vec_info *vinfo, tree op,
       if (!def_stmt)
	break;
       if (dt == vect_internal_def)
-	caster = vinfo_for_stmt (def_stmt);
+	{
+	  caster = vinfo_for_stmt (def_stmt);
+	  /* Ignore pattern statements, since we don't link uses for them.  */
+	  if (single_use_p
+	      && !STMT_VINFO_RELATED_STMT (caster)
+	      && !has_single_use (res))
+	    *single_use_p = false;
+	}
       else
	caster = NULL;
       gassign *assign = dyn_cast <gassign *> (def_stmt);
@@ -1371,363 +1381,318 @@ vect_recog_widen_sum_pattern (vec<gimple *> *stmts, tree *type_out)
   return pattern_stmt;
 }
 
+/* Recognize cases in which an operation is performed in one type WTYPE
+   but could be done more efficiently in a narrower type NTYPE.  For example,
+   if we have:
 
-/* Return TRUE if the operation in STMT can be performed on a smaller type.
+     ATYPE a;  // narrower than NTYPE
+     BTYPE b;  // narrower than NTYPE
+     WTYPE aw = (WTYPE) a;
+     WTYPE bw = (WTYPE) b;
+     WTYPE res = aw + bw;  // only uses of aw and bw
 
-   Input:
-   STMT - a statement to check.
-   DEF - we support operations with two operands, one of which is constant.
-         The other operand can be defined by a demotion operation, or by a
-         previous statement in a sequence of over-promoted operations.  In the
-         later case DEF is used to replace that operand.  (It is defined by a
-         pattern statement we created for the previous statement in the
-         sequence).
-
-   Input/output:
-   NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
-         NULL, it's the type of DEF.
-   STMTS - additional pattern statements.  If a pattern statement (type
-         conversion) is created in this function, its original statement is
-         added to STMTS.
+   then it would be more efficient to do:
 
-   Output:
-   OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
-	 operands to use in the new pattern statement for STMT (will be created
-	 in vect_recog_over_widening_pattern ()).
-   NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
-	 statements for STMT: the first one is a type promotion and the second
-	 one is the operation itself.  We return the type promotion statement
-	 in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
-	 the second pattern statement.  */
+     NTYPE an = (NTYPE) a;
+     NTYPE bn = (NTYPE) b;
+     NTYPE resn = an + bn;
+     WTYPE res = (WTYPE) resn;
 
-static bool
-vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
-				  tree *op0, tree *op1, gimple **new_def_stmt,
-				  vec<gimple *> *stmts)
-{
-  enum tree_code code;
-  tree const_oprnd, oprnd;
-  tree interm_type = NULL_TREE, half_type, new_oprnd, type;
-  gimple *def_stmt, *new_stmt;
-  bool first = false;
-  bool promotion;
+   Other situations include things like:
 
-  *op0 = NULL_TREE;
-  *op1 = NULL_TREE;
-  *new_def_stmt = NULL;
+     ATYPE a;  // NTYPE or narrower
+     WTYPE aw = (WTYPE) a;
+     WTYPE res = aw + b;
 
-  if (!is_gimple_assign (stmt))
-    return false;
+   when only "(NTYPE) res" is significant.  In that case it's more efficient
+   to truncate "b" and do the operation on NTYPE instead:
 
-  code = gimple_assign_rhs_code (stmt);
-  if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
-      && code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
-    return false;
+     NTYPE an = (NTYPE) a;
+     NTYPE bn = (NTYPE) b;  // truncation
+     NTYPE resn = an + bn;
+     WTYPE res = (WTYPE) resn;
 
-  oprnd = gimple_assign_rhs1 (stmt);
-  const_oprnd = gimple_assign_rhs2 (stmt);
-  type = gimple_expr_type (stmt);
+   All users of "res" should then use "resn" instead, making the final
+   statement dead (not marked as relevant).  The final statement is still
+   needed to maintain the type correctness of the IR.
 
-  if (TREE_CODE (oprnd) != SSA_NAME
-      || TREE_CODE (const_oprnd) != INTEGER_CST)
-    return false;
+   vect_determine_precisions has already determined the minimum
+   precision of the operation and the minimum precision required
+   by users of the result.  */
 
-  /* If oprnd has other uses besides that in stmt we cannot mark it
-     as being part of a pattern only.  */
-  if (!has_single_use (oprnd))
-    return false;
+static gimple *
+vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+  if (!last_stmt)
+    return NULL;
 
-  /* If we are in the middle of a sequence, we use DEF from a previous
-     statement.  Otherwise, OPRND has to be a result of type promotion.  */
-  if (*new_type)
-    {
-      half_type = *new_type;
-      oprnd = def;
-    }
-  else
-    {
-      first = true;
-      if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
-			      &promotion)
-	  || !promotion
-	  || !vect_same_loop_or_bb_p (stmt, def_stmt))
-	return false;
-    }
+  /* See whether we have found that this operation can be done on a
+     narrower type without changing its semantics.  */
+  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+  unsigned int new_precision = last_stmt_info->operation_precision;
+  if (!new_precision)
+    return NULL;
 
-  /* Can we perform the operation on a smaller type?  */
-  switch (code)
+  vec_info *vinfo = last_stmt_info->vinfo;
+  tree lhs = gimple_assign_lhs (last_stmt);
+  tree type = TREE_TYPE (lhs);
+  tree_code code = gimple_assign_rhs_code (last_stmt);
+
+  /* Keep the first operand of a COND_EXPR as-is: only the other two
+     operands are interesting.  */
+  unsigned int first_op = (code == COND_EXPR ? 2 : 1);
+
+  /* Check the operands.  */
+  unsigned int nops = gimple_num_ops (last_stmt) - first_op;
+  auto_vec <vect_unpromoted_value, 3> unprom (nops);
+  unprom.quick_grow (nops);
+  unsigned int min_precision = 0;
+  bool single_use_p = false;
+  for (unsigned int i = 0; i < nops; ++i)
     {
-    case BIT_IOR_EXPR:
-    case BIT_XOR_EXPR:
-    case BIT_AND_EXPR:
-      if (!int_fits_type_p (const_oprnd, half_type))
-	{
-	  /* HALF_TYPE is not enough.  Try a bigger type if possible.  */
-	  if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
-	    return false;
+      tree op = gimple_op (last_stmt, first_op + i);
+      if (TREE_CODE (op) == INTEGER_CST)
+	unprom[i].set_op (op, vect_constant_def);
+      else if (TREE_CODE (op) == SSA_NAME)
+	{
+	  bool op_single_use_p = true;
+	  if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
+						     &op_single_use_p))
+	    return NULL;
+	  /* If:
 
-	  interm_type = build_nonstandard_integer_type (
+	     (1) N bits of the result are needed;
+	     (2) all inputs are widened from M<N bits; and
+	     (3) one operand OP is a single-use SSA name
+
+	     we can shift the M->N widening from OP to the output
+	     without changing the number or type of extensions involved.
+	     This then reduces the number of copies of STMT_INFO.
 
-    case LSHIFT_EXPR:
-      /* Try intermediate type - HALF_TYPE is not enough for sure.  */
-      if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
-	return false;
+	     If instead of (3) more than one operand is a single-use SSA name,
+	     shifting the extension to the output is even more of a win.
 
-      /* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
-	 (e.g., if the original value was char, the shift amount is at most 8
-	  if we want to use short).  */
-      if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
-	return false;
+	     If instead:
 
-      interm_type = build_nonstandard_integer_type (
-		    TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
+	     (1) N bits of the result are needed;
+	     (2) one operand OP2 is widened from M2<N bits;
+	     (3) another operand OP1 is widened from M1<M2 bits; and
+	     (4) both OP1 and OP2 are single-use
+
+	     the choice is between:
+
+	     (a) truncating OP2 to M1, doing the operation on M1,
+		 and then widening the result to N
+
+	     (b) widening OP1 to M2, doing the operation on M2, and then
+		 widening the result to N
+
+	     Both shift the M2->N widening of the inputs to the output.
+	     (a) additionally shifts the M1->M2 widening to the output;
+	     it requires fewer copies of STMT_INFO but requires an extra
+	     M2->M1 truncation.
 
-      interm_type = build_nonstandard_integer_type (
-		    TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
+	     Which is better will depend on the complexity and cost of
+	     STMT_INFO, which is hard to predict at this stage.  However,
+	     a clear tie-breaker in favor of (b) is the fact that the
+	     truncation in (a) increases the length of the operation chain.
 
-      if (!vect_supportable_shift (code, interm_type))
-	return false;
+	     If instead of (4) only one of OP1 or OP2 is single-use,
+	     (b) is still a win over doing the operation in N bits:
+	     it still shifts the M2->N widening on the single-use operand
+	     to the output and reduces the number of STMT_INFO copies.
 
-      break;
+	     If neither operand is single-use then operating on fewer than
+	     N bits might lead to more extensions overall.  Whether it does
+	     or not depends on global information about the vectorization
+	     region, and whether that's a good trade-off would again
+	     depend on the complexity and cost of the statements involved,
+	     as well as things like register pressure that are not normally
+	     modelled at this stage.  We therefore ignore these cases
+	     and just optimize the clear single-use wins above.
 
-    default:
-      gcc_unreachable ();
+	     Thus we take the maximum precision of the unpromoted operands
+	     and record whether any operand is single-use.  */
+	  if (unprom[i].dt == vect_internal_def)
+	    {
+	      min_precision = MAX (min_precision,
+				   TYPE_PRECISION (unprom[i].type));
+	      single_use_p |= op_single_use_p;
+	    }
+	}
     }
 
-  /* There are four possible cases:
-     1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
-	the first statement in the sequence)
-	a. The original, HALF_TYPE, is not enough - we replace the promotion
-	   from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
-	b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
-	   promotion.
-     2. OPRND is defined by a pattern statement we created.
-	a. Its type is not sufficient for the operation, we create a new stmt:
-	   a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
-	   this statement in NEW_DEF_STMT, and it is later put in
-	   STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
-	b. OPRND is good to use in the new statement.  */
-  if (first)
-    {
-      if (interm_type)
-	{
-	  /* Replace the original type conversion HALF_TYPE->TYPE with
-	     HALF_TYPE->INTERM_TYPE.  */
-	  if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
-	    {
-	      new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
-	      /* Check if the already created pattern stmt is what we need.  */
-	      if (!is_gimple_assign (new_stmt)
-		  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
-		  || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
-		return false;
-
-	      stmts->safe_push (def_stmt);
-	      oprnd = gimple_assign_lhs (new_stmt);
-	    }
-	  else
-	    {
-	      /* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
-	      oprnd = gimple_assign_rhs1 (def_stmt);
-	      new_oprnd = make_ssa_name (interm_type);
-	      new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
-	      STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
-	      stmts->safe_push (def_stmt);
-	      oprnd = new_oprnd;
-	    }
-	}
-      else
-	{
-	  /* Retrieve the operand before the type promotion.  */
-	  oprnd = gimple_assign_rhs1 (def_stmt);
-	}
-    }
+  /* Although the operation could be done in operation_precision, we have
+     to balance that against introducing extra truncations or extensions.
+     Calculate the minimum precision that can be handled efficiently.
+
+     The loop above determined that the operation could be handled
+     efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
+     extension from the inputs to the output without introducing more
+     instructions, and would reduce the number of instructions required
+     for STMT_INFO itself.
+
+     vect_determine_precisions has also determined that the result only
+     needs min_output_precision bits.  Truncating by a factor of N
+     requires a tree of N - 1 instructions, so if TYPE is N times wider
+     than min_output_precision, doing the operation in TYPE and truncating
+     the result requires N + (N - 1) = 2N - 1 instructions per output vector.
+     In contrast:
+
+     - truncating the input to a unary operation and doing the operation
+       in the new type requires at most N - 1 + 1 = N instructions per
+       output vector
+
+     - doing the same for a binary operation requires at most
+       (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
+
+     Both unary and binary operations require fewer instructions than
+     this if the operands were extended from a suitable truncated form.
+     Thus there is usually nothing to lose by doing operations in
+     min_output_precision bits, but there can be something to gain.  */
+  if (!single_use_p)
+    min_precision = last_stmt_info->min_output_precision;
   else
-    {
-      if (interm_type)
-	{
-	  /* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
-	  new_oprnd = make_ssa_name (interm_type);
-	  new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
-	  oprnd = new_oprnd;
-	  *new_def_stmt = new_stmt;
-	}
+    min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
+
+  /* Apply the minimum efficient precision we just calculated.  */
+  if (new_precision < min_precision)
+    new_precision = min_precision;
+  if (new_precision >= TYPE_PRECISION (type))
+    return NULL;
 
-      /* Otherwise, OPRND is already set.  */
+  vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
+
+  *type_out = get_vectype_for_scalar_type (type);
+  if (!*type_out)
+    return NULL;
+
+  /* We've found a viable pattern.  Get the new type of the operation.  */
+  bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
+  tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
+
+  /* We specifically don't check here whether the target supports the
+     new operation, since it might be something that a later pattern
+     wants to rewrite anyway.  If targets have a minimum element size
+     for some optabs, we should pattern-match smaller ops to larger ops
+     where beneficial.  */
+  tree new_vectype = get_vectype_for_scalar_type (new_type);
+  if (!new_vectype)
+    return NULL;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
+      dump_printf (MSG_NOTE, " to ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
+      dump_printf (MSG_NOTE, "\n");
     }
 
-  if (interm_type)
-    *new_type = interm_type;
-  else
-    *new_type = half_type;
+  /* Calculate the rhs operands for an operation on NEW_TYPE.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
+  tree ops[3] = {};
+  for (unsigned int i = 1; i < first_op; ++i)
+    ops[i - 1] = gimple_op (last_stmt, i);
+  vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
+		       new_type, &unprom[0], new_vectype);
+
+  /* Use the operation to produce a result of type NEW_TYPE.  */
+  tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (new_var, code,
+					      ops[0], ops[1], ops[2]);
+  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "created pattern stmt: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
+    }
 
-  *op0 = oprnd;
-  *op1 = fold_convert (*new_type, const_oprnd);
+  pattern_stmt = vect_convert_output (last_stmt_info, type,
+				      pattern_stmt, new_vectype);
 
-  return true;
+  stmts->safe_push (last_stmt);
+  return pattern_stmt;
 }
 
+/* Recognize cases in which the input to a cast is wider than its
+   output, and the input is fed by a widening operation.  Fold this
+   by removing the unnecessary intermediate widening.  E.g.:
+
 
-/* Try to find a statement or a sequence of statements that can be performed
-   on a smaller type:
-
-     type x_t;
-     TYPE x_T, res0_T, res1_T;
-   loop:
-     S1  x_t = *p;
-     S2  x_T = (TYPE) x_t;
-     S3  res0_T = op (x_T, C0);
-     S4  res1_T = op (res0_T, C1);
-     S5  ... = () res1_T;  - type demotion
+     unsigned char a;
+     unsigned int b = (unsigned int) a;
+     unsigned short c = (unsigned short) b;
 
-   where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
-   constants.
-   Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
-   be 'type' or some intermediate type.  For now, we expect S5 to be a type
-   demotion operation.  We also check that S3 and S4 have only one use.  */
+     -->
 
-static gimple *
-vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
-{
-  gimple *stmt = stmts->pop ();
-  gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
-	 *use_stmt = NULL;
-  tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
-  tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
-  bool first;
-  tree type = NULL;
+     unsigned short c = (unsigned short) a;
 
-  first = true;
-  while (1)
-    {
-      if (!vinfo_for_stmt (stmt)
-	  || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
-	return NULL;
-
-      new_def_stmt = NULL;
-      if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
-					     &op0, &op1, &new_def_stmt,
-					     stmts))
-	{
-	  if (first)
-	    return NULL;
-	  else
-	    break;
-	}
-
-      /* STMT can be performed on a smaller type.  Check its uses.  */
-      use_stmt = vect_single_imm_use (stmt);
-      if (!use_stmt || !is_gimple_assign (use_stmt))
-	return NULL;
-
-      /* Create pattern statement for STMT.  */
-      vectype = get_vectype_for_scalar_type (new_type);
-      if (!vectype)
-	return NULL;
-
-      /* We want to collect all the statements for which we create pattern
-	 statetments, except for the case when the last statement in the
-	 sequence doesn't have a corresponding pattern statement.  In such
-	 case we associate the last pattern statement with the last statement
-	 in the sequence.  Therefore, we only add the original statement to
-	 the list if we know that it is not the last.  */
-      if (prev_stmt)
-	stmts->safe_push (prev_stmt);
-
-      var = vect_recog_temp_ssa_var (new_type, NULL);
-      pattern_stmt
-	= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
-      STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
-      new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
+   Although this is rare in input IR, it is an expected side-effect
+   of the over-widening pattern above.
 
-      if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "created pattern stmt: ");
-	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
-	}
+   This is beneficial also for integer-to-float conversions, if the
+   widened integer has more bits than the float, and if the unwidened
+   input doesn't.  */
 
-      type = gimple_expr_type (stmt);
-      prev_stmt = stmt;
-      stmt = use_stmt;
+static gimple *
+vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
+{
+  /* Check for a cast, including an integer-to-float conversion.  */
+  gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
+  if (!last_stmt)
+    return NULL;
+  tree_code code = gimple_assign_rhs_code (last_stmt);
+  if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
+    return NULL;
 
-      first = false;
-    }
 
+  /* Make sure that the rhs is a scalar with a natural bitsize.  */
+  tree lhs = gimple_assign_lhs (last_stmt);
+  if (!lhs)
+    return NULL;
+  tree lhs_type = TREE_TYPE (lhs);
+  scalar_mode lhs_mode;
+  if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
+      || !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
+    return NULL;
 
-  /* We got a sequence.  We expect it to end with a type demotion operation.
-     Otherwise, we quit (for now).  There are three possible cases: the
-     conversion is to NEW_TYPE (we don't do anything), the conversion is to
-     a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
-     NEW_TYPE differs (we create a new conversion statement).  */
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
-    {
-      use_lhs = gimple_assign_lhs (use_stmt);
-      use_type = TREE_TYPE (use_lhs);
-      /* Support only type demotion or signedess change.  */
-      if (!INTEGRAL_TYPE_P (use_type)
-	  || TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
	return NULL;
-
-      /* Check that NEW_TYPE is not bigger than the conversion result.  */
-      if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
-	return NULL;
+  /* Check for a narrowing operation (from a vector point of view).  */
+  tree rhs = gimple_assign_rhs1 (last_stmt);
+  tree rhs_type = TREE_TYPE (rhs);
+  if (!INTEGRAL_TYPE_P (rhs_type)
+      || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
+      || TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
+    return NULL;
 
-      if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
-	  || TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
-	{
-	  *type_out = get_vectype_for_scalar_type (use_type);
-	  if (!*type_out)
-	    return NULL;
+  /* Try to find an unpromoted input.  */
+  stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
+  vec_info *vinfo = last_stmt_info->vinfo;
+  vect_unpromoted_value unprom;
+  if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
+      || TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
+    return NULL;
 
-	  /* Create NEW_TYPE->USE_TYPE conversion.  */
-	  new_oprnd = make_ssa_name (use_type);
-	  pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
-	  STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
-
-	  /* We created a pattern statement for the last statement in the
-	     sequence, so we don't need to associate it with the pattern
-	     statement created for PREV_STMT.  Therefore, we add PREV_STMT
-	     to the list in order to mark it later in vect_pattern_recog_1.  */
-	  if (prev_stmt)
-	    stmts->safe_push (prev_stmt);
-	}
-      else
-	{
-	  if (prev_stmt)
-	    STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
-	      = STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
+  /* If the bits above RHS_TYPE matter, make sure that they're the
+     same when extending from UNPROM as they are when extending from RHS.  */
+  if (!INTEGRAL_TYPE_P (lhs_type)
+      && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
+    return NULL;
 
-	  *type_out = vectype;
-	}
+  /* We can get the same result by casting UNPROM directly, to avoid
+     the unnecessary widening and narrowing.  */
+  vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
 
-      stmts->safe_push (use_stmt);
-    }
-  else
-    /* TODO: support general case, create a conversion to the correct type.  */
+  *type_out = get_vectype_for_scalar_type (lhs_type);
+  if (!*type_out)
     return NULL;
 
-  /* Pattern detected.  */
-  vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
+  tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
+  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
 
+  stmts->safe_push (last_stmt);
   return pattern_stmt;
 }
 
@@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<gimple *> *stmts, tree *type_out)
   return pattern_stmt;
 }
 
+/* Return true if TYPE is a non-boolean integer type.  These are the types
+   that we want to consider for narrowing.  */
+
+static bool
+vect_narrowable_type_p (tree type)
+{
+  return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
+}
+
+/* Return true if the operation given by CODE can be truncated to N bits
+   when only N bits of the output are needed.  This is only true if bit N+1
+   of the inputs has no effect on the low N bits of the result.  */
+
+static bool
+vect_truncatable_operation_p (tree_code code)
+{
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+    case MULT_EXPR:
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+    case COND_EXPR:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+/* Record that STMT_INFO could be changed from operating on TYPE to
+   operating on a type with the precision and sign given by PRECISION
+   and SIGN respectively.  PRECISION is an arbitrary bit precision;
+   it might not be a whole number of bytes.  */
+
+static void
+vect_set_operation_type (stmt_vec_info stmt_info, tree type,
+			 unsigned int precision, signop sign)
+{
+  /* Round the precision up to a whole number of bytes.  */
+  precision = vect_element_precision (precision);
+  if (precision < TYPE_PRECISION (type)
+      && (!stmt_info->operation_precision
+	  || stmt_info->operation_precision > precision))
+    {
+      stmt_info->operation_precision = precision;
+      stmt_info->operation_sign = sign;
+    }
+}
+
+/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
+   non-boolean inputs, all of which have type TYPE.  MIN_INPUT_PRECISION
+   is an arbitrary bit precision; it might not be a whole number of bytes.  */
+
+static void
+vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
+			      unsigned int min_input_precision)
+{
+  /* This operation in isolation only requires the inputs to have
+     MIN_INPUT_PRECISION of precision.  However, that doesn't mean
+     that MIN_INPUT_PRECISION is a natural precision for the chain
+     as a whole.  E.g. consider something like:
+
+       unsigned short *x, *y;
+       *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+     The right shift can be done on unsigned chars, and only requires the
+     result of "*x & 0xf0" to be done on unsigned chars.  But taking that
+     approach would mean turning a natural chain of single-vector unsigned
+     short operations into one that truncates "*x" and then extends
+     "(*x & 0xf0) >> 4", with two vectors for each unsigned short
+     operation and one vector for each unsigned char operation.
+     This would be a significant pessimization.
+
+     Instead only propagate the maximum of this precision and the precision
+     required by the users of the result.  This means that we don't pessimize
+     the case above but continue to optimize things like:
+
+       unsigned char *y;
+       unsigned short *x;
+       *y = ((*x & 0xf0) >> 4) | (*y << 4);
+
+     Here we would truncate two vectors of *x to a single vector of
+     unsigned chars and use single-vector unsigned char operations for
+     everything else, rather than doing two unsigned short copies of
+     "(*x & 0xf0) >> 4" and then truncating the result.  */
+  min_input_precision = MAX (min_input_precision,
+			      stmt_info->min_output_precision);
+
+  if (min_input_precision < TYPE_PRECISION (type)
+      && (!stmt_info->min_input_precision
+	  || stmt_info->min_input_precision > min_input_precision))
+    stmt_info->min_input_precision = min_input_precision;
+}
+
+/* Subroutine of vect_determine_min_output_precision.  Return true if
+   we can calculate a reduced number of output bits for STMT_INFO,
+   whose result is LHS.  */
+
+static bool
+vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
+{
+  /* Take the maximum precision required by users of the result.  */
+  unsigned int precision = 0;
+  imm_use_iterator iter;
+  use_operand_p use;
+  FOR_EACH_IMM_USE_FAST (use, iter, lhs)
+    {
+      gimple *use_stmt = USE_STMT (use);
+      if (is_gimple_debug (use_stmt))
+	continue;
+      if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
+	return false;
+      stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
+      if (!use_stmt_info->min_input_precision)
+	return false;
+      precision = MAX (precision, use_stmt_info->min_input_precision);
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
+		       precision);
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
+      dump_printf (MSG_NOTE, " are significant\n");
+    }
+  stmt_info->min_output_precision = precision;
+  return true;
+}
+
+/* Calculate min_output_precision for STMT_INFO.  */
+
+static void
+vect_determine_min_output_precision (stmt_vec_info stmt_info)
+{
+  /* We're only interested in statements with a narrowable result.  */
+  tree lhs = gimple_get_lhs (stmt_info->stmt);
+  if (!lhs
+      || TREE_CODE (lhs) != SSA_NAME
+      || !vect_narrowable_type_p (TREE_TYPE (lhs)))
+    return;
+
+  if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
+    stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
+}
+
+/* Use range information to decide whether STMT (described by STMT_INFO)
+   could be done in a narrower type.  This is effectively a forward
+   propagation, since it uses context-independent information that applies
+   to all users of an SSA name.  */
+
+static void
+vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+    return;
+
+  tree type = TREE_TYPE (lhs);
+  if (!vect_narrowable_type_p (type))
+    return;
+
+  /* First see whether we have any useful range information for the result.  */
+  unsigned int precision = TYPE_PRECISION (type);
+  signop sign = TYPE_SIGN (type);
+  wide_int min_value, max_value;
+  if (!vect_get_range_info (lhs, &min_value, &max_value))
+    return;
+
+  tree_code code = gimple_assign_rhs_code (stmt);
+  unsigned int nops = gimple_num_ops (stmt);
+
+  if (!vect_truncatable_operation_p (code))
+    /* Check that all relevant input operands are compatible, and update
+       [MIN_VALUE, MAX_VALUE] to include their ranges.  */
+    for (unsigned int i = 1; i < nops; ++i)
+      {
+	tree op = gimple_op (stmt, i);
+	if (TREE_CODE (op) == INTEGER_CST)
+	  {
+	    /* Don't require the integer to have RHS_TYPE (which it might
+	       not for things like shift amounts, etc.), but do require it
+	       to fit the type.  */
+	    if (!int_fits_type_p (op, type))
+	      return;
+
+	    min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
+	    max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
+	  }
+	else if (TREE_CODE (op) == SSA_NAME)
+	  {
+	    /* Ignore codes that don't take uniform arguments.  */
+	    if (!types_compatible_p (TREE_TYPE (op), type))
+	      return;
+
+	    wide_int op_min_value, op_max_value;
+	    if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+	      return;
+
+	    min_value = wi::min (min_value, op_min_value, sign);
+	    max_value = wi::max (max_value, op_max_value, sign);
+	  }
+	else
+	  return;
+      }
+
+  /* Try to switch signed types for unsigned types if we can.
+     This is better for two reasons.  First, unsigned ops tend
+     to be cheaper than signed ops.  Second, it means that we can
+     handle things like:
+
+	signed char c;
+	int res = (int) c & 0xff00; // range [0x0000, 0xff00]
+
+     as:
+
+	signed char c;
+	unsigned short res_1 = (unsigned short) c & 0xff00;
+	int res = (int) res_1;
+
+     where the intermediate result res_1 has unsigned rather than
+     signed type.  */
+  if (sign == SIGNED && !wi::neg_p (min_value))
+    sign = UNSIGNED;
+
+  /* See what precision is required for MIN_VALUE and MAX_VALUE.  */
+  unsigned int precision1 = wi::min_precision (min_value, sign);
+  unsigned int precision2 = wi::min_precision (max_value, sign);
+  unsigned int value_precision = MAX (precision1, precision2);
+  if (value_precision >= precision)
+    return;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+		       " without loss of precision: ",
+		       sign == SIGNED ? "signed" : "unsigned",
+		       value_precision);
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+    }
+
+  vect_set_operation_type (stmt_info, type, value_precision, sign);
+  vect_set_min_input_precision (stmt_info, type, value_precision);
+}
+
+/* Use information about the users of STMT's result to decide whether
+   STMT (described by STMT_INFO) could be done in a narrower type.
+   This is effectively a backward propagation.  */
+
+static void
+vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
+{
+  tree_code code = gimple_assign_rhs_code (stmt);
+  unsigned int opno = (code == COND_EXPR ? 2 : 1);
+  tree type = TREE_TYPE (gimple_op (stmt, opno));
+  if (!vect_narrowable_type_p (type))
+    return;
+
+  unsigned int precision = TYPE_PRECISION (type);
+  unsigned int operation_precision, min_input_precision;
+  switch (code)
+    {
+    CASE_CONVERT:
+      /* Only the bits that contribute to the output matter.  Don't change
+	 the precision of the operation itself.  */
+      operation_precision = precision;
+      min_input_precision = stmt_info->min_output_precision;
+      break;
+
+    case LSHIFT_EXPR:
+    case RSHIFT_EXPR:
+      {
+	tree shift = gimple_assign_rhs2 (stmt);
+	if (TREE_CODE (shift) != INTEGER_CST
+	    || !wi::ltu_p (wi::to_widest (shift), precision))
+	  return;
+	unsigned int const_shift = TREE_INT_CST_LOW (shift);
+	if (code == LSHIFT_EXPR)
+	  {
+	    /* We need CONST_SHIFT fewer bits of the input.  */
+	    operation_precision = stmt_info->min_output_precision;
+	    min_input_precision = (MAX (operation_precision, const_shift)
+				   - const_shift);
+	  }
+	else
+	  {
+	    /* We need CONST_SHIFT extra bits to do the operation.  */
+	    operation_precision = (stmt_info->min_output_precision
+				   + const_shift);
+	    min_input_precision = operation_precision;
+	  }
+	break;
+      }
+
+    default:
+      if (vect_truncatable_operation_p (code))
+	{
+	  /* Input bit N has no effect on output bits N-1 and lower.  */
+	  operation_precision = stmt_info->min_output_precision;
+	  min_input_precision = operation_precision;
+	  break;
+	}
+      return;
+    }
+
+  if (operation_precision < precision)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
+			   " without affecting users: ",
+			   TYPE_UNSIGNED (type) ? "unsigned" : "signed",
+			   operation_precision);
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	}
+      vect_set_operation_type (stmt_info, type, operation_precision,
+			       TYPE_SIGN (type));
+    }
+  vect_set_min_input_precision (stmt_info, type, min_input_precision);
+}
+
+/* Handle vect_determine_precisions for STMT_INFO, given that we
+   have already done so for the users of its result.  */
+
+void
+vect_determine_stmt_precisions (stmt_vec_info stmt_info)
+{
+  vect_determine_min_output_precision (stmt_info);
+  if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
+    {
+      vect_determine_precisions_from_range (stmt_info, stmt);
+      vect_determine_precisions_from_users (stmt_info, stmt);
+    }
+}
+
+/* Walk backwards through the vectorizable region to determine the
+   values of these fields:
+
+   - min_output_precision
+   - min_input_precision
+   - operation_precision
+   - operation_sign.  */
+
+void
+vect_determine_precisions (vec_info *vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_determine_precisions");
+
+  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+    {
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+      unsigned int nbbs = loop->num_nodes;
+
+      for (unsigned int i = 0; i < nbbs; i++)
+	{
+	  basic_block bb = bbs[nbbs - i - 1];
+	  for (gimple_stmt_iterator si = gsi_last_bb (bb);
+	       !gsi_end_p (si); gsi_prev (&si))
+	    vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
	}
    }
+  else
+    {
+      bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+      gimple_stmt_iterator si = bb_vinfo->region_end;
+      gimple *stmt;
+      do
+	{
+	  if (!gsi_stmt (si))
+	    si = gsi_last_bb (bb_vinfo->bb);
+	  else
+	    gsi_prev (&si);
+	  stmt = gsi_stmt (si);
+	  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+	  if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+	    vect_determine_stmt_precisions (stmt_info);
+	}
+      while (stmt != gsi_stmt (bb_vinfo->region_begin));
+    }
+}
+
 typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
 
 struct vect_recog_func
@@ -4217,13 +4566,14 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_over_widening_pattern, "over_widening" },
+  { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
   { vect_recog_widen_mult_pattern, "widen_mult" },
   { vect_recog_dot_prod_pattern, "dot_prod" },
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
   { vect_recog_pow_pattern, "pow" },
   { vect_recog_widen_shift_pattern, "widen_shift" },
-  { vect_recog_over_widening_pattern, "over_widening" },
   { vect_recog_rotate_pattern, "rotate" },
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
@@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
   unsigned int i, j;
   auto_vec<gimple *, 1> stmts_to_replace;
 
+  vect_determine_precisions (vinfo);
+
   DUMP_VECT_SCOPE ("vect_pattern_recog");
 
   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2dac54e..28be41f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
 
   /* The number of scalar stmt references from active SLP instances.  */
   unsigned int num_slp_uses;
+
+  /* If nonzero, the lhs of the statement could be truncated to this
+     many bits without affecting any users of the result.  */
+  unsigned int min_output_precision;
+
+  /* If nonzero, all non-boolean input operands have the same precision,
+     and they could each be truncated to this many bits without changing
+     the result.  */
+  unsigned int min_input_precision;
+
+  /* If OPERATION_PRECISION is nonzero, the statement could be performed on
+     an integer with the sign and number of bits given by OPERATION_SIGN
+     and OPERATION_PRECISION without changing the result.  */
+  unsigned int operation_precision;
+  signop operation_sign;
 } *stmt_vec_info;
 
 /* Information about a gather/scatter call.  */
-- 
cgit v1.1
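
The following self-contained C program is an editorial illustration, not part of
the patch: it renders the ATYPE/NTYPE/WTYPE rewrite described in the
vect_recog_over_widening_pattern comment as scalar code, with ATYPE = unsigned
char, NTYPE = unsigned short and WTYPE = int, and checks exhaustively that the
narrow form computes the same values.  All function names here are invented for
the sketch.

  /* Illustration only -- not part of the patch.  Scalar rendering of the
     over-widening rewrite: the addition fits in 9 bits, so doing it in
     NTYPE (unsigned short) and extending the result matches doing it in
     WTYPE (int).  */

  #include <assert.h>

  static int
  res_wide (unsigned char a, unsigned char b)
  {
    int aw = a, bw = b;             /* WTYPE aw = (WTYPE) a; etc.  */
    return aw + bw;                 /* operation done in WTYPE  */
  }

  static int
  res_narrow (unsigned char a, unsigned char b)
  {
    unsigned short an = a, bn = b;  /* NTYPE an = (NTYPE) a; etc.  */
    unsigned short resn = an + bn;  /* operation done in NTYPE  */
    return resn;                    /* WTYPE res = (WTYPE) resn;  */
  }

  int
  main (void)
  {
    for (unsigned int a = 0; a < 256; ++a)
      for (unsigned int b = 0; b < 256; ++b)
	assert (res_wide (a, b) == res_narrow (a, b));
    return 0;
  }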
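A second illustrative program (again not part of the patch; all names are local
to the example) checks the two arithmetic facts the precision propagation relies
on: for the codes accepted by vect_truncatable_operation_p, the low 8 bits of
the result depend only on the low 8 bits of the inputs, while a right shift by
CONST_SHIFT needs CONST_SHIFT extra input bits, which is why
vect_determine_precisions_from_users adds CONST_SHIFT to the required precision.

  /* Illustration only -- not part of the patch.  */

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    for (uint32_t a = 0; a < 256; ++a)
      for (uint32_t b = 0; b < 256; ++b)
	{
	  /* Garbage above bit 7 must not change the low 8 bits of a
	     truncatable operation.  */
	  uint32_t aw = a | 0xabcd00u;
	  uint32_t bw = b | 0x123400u;
	  assert ((uint8_t) (aw + bw) == (uint8_t) (a + b));
	  assert ((uint8_t) (aw - bw) == (uint8_t) (a - b));
	  assert ((uint8_t) (aw * bw) == (uint8_t) (a * b));
	  assert ((uint8_t) (aw & bw) == (uint8_t) (a & b));
	  assert ((uint8_t) (aw | bw) == (uint8_t) (a | b));
	  assert ((uint8_t) (aw ^ bw) == (uint8_t) (a ^ b));

	  /* x >> 2 with 8 significant output bits reads input bits 2..9,
	     so 8 + 2 = 10 input bits suffice despite garbage above.  */
	  uint32_t x = (a << 2) | (b & 3) | 0xf5400u;  /* garbage at bit 10+ */
	  assert ((uint8_t) (x >> 2) == (uint8_t) ((x & 0x3ff) >> 2));
	}

    /* ...but 8 input bits are not enough for the shift.  */
    assert ((uint8_t) (0x100 >> 2) != (uint8_t) ((0x100 & 0xff) >> 2));
    return 0;
  }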