diff options
author | Sergei Lewis <slewis@rivosinc.com> | 2024-05-13 17:32:24 -0600 |
---|---|---|
committer | Jeff Law <jlaw@ventanamicro.com> | 2024-05-13 17:32:24 -0600 |
commit | df15eb15b5f820321c81efc75f0af13ff8c0dd5b (patch) | |
tree | 48069898f9a19b1854ab5134c9a82116c61fe0fa | |
parent | 67476ba8adb432033993f429b1aa4ee5689fa046 (diff) | |
download | gcc-df15eb15b5f820321c81efc75f0af13ff8c0dd5b.zip gcc-df15eb15b5f820321c81efc75f0af13ff8c0dd5b.tar.gz gcc-df15eb15b5f820321c81efc75f0af13ff8c0dd5b.tar.bz2 |
[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension
This patchset permits generation of inlined vectorised code for movmem,
setmem and cmpmem, if and only if the operation size is
at least one and at most eight vector registers' worth of data.
Further vectorisation rapidly becomes debatable due to code size concerns;
however, for these simple cases we do have an unambiguous performance win
without sacrificing too much code size compared to a libc call.
Changes in v2:
* run clang-format over the code in addition to the
contrib/check_GNU_style.sh that was used for v1
* remove string.h include and refer to __builtin_* memory functions
in multilib tests
* respect stringop_strategy (don't vectorise if it doesn't include VECTOR)
* use an integer constraint for movmem length parameter
* use TARGET_MAX_LMUL unless riscv-autovec-lmul=dynamic
to ensure we respect the user's wishes if they request specific lmul
* add new unit tests to check that riscv-autovec-lmul is respected
* PR target/112109 added to changelog for patch 1/3 as requested
Sergei Lewis (3):
RISC-V: movmem for RISCV with V extension
RISC-V: setmem for RISCV with V extension
RISC-V: cmpmem for RISCV with V extension
gcc/ChangeLog:
* config/riscv/riscv.md (movmem<mode>): Use riscv_vector::expand_block_move,
if and only if we know the entire operation can be performed using one vector
load followed by one vector store.
gcc/testsuite/ChangeLog:
PR target/112109
* gcc.target/riscv/rvv/base/movmem-1.c: New test.
-rw-r--r-- | gcc/config/riscv/riscv.md | 23 | ||||
-rw-r--r-- | gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c | 59 |
2 files changed, 82 insertions, 0 deletions
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 4d6de99..696d911 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2608,6 +2608,29 @@ FAIL; }) +;; Inlining general memmove is a pessimisation as we can't avoid having to +;; decide which direction to go at runtime, which can be costly. Until we +;; can benchmark implementations on real V hardware implement a conservative +;; approach of inlining cases which can be performed with a single vector +;; load + store. For tiny moves, fallback to scalar. +(define_expand "movmem<mode>" + [(parallel [(set (match_operand:BLK 0 "general_operand") + (match_operand:BLK 1 "general_operand")) + (use (match_operand:P 2 "const_int_operand")) + (use (match_operand:SI 3 "const_int_operand"))])] + "TARGET_VECTOR" +{ + if (CONST_INT_P (operands[2]) + && INTVAL (operands[2]) >= TARGET_MIN_VLEN / 8 + && INTVAL (operands[2]) <= TARGET_MIN_VLEN + && riscv_vector::expand_block_move (operands[0], + operands[1], + operands[2])) + DONE; + else + FAIL; +}) + ;; Expand in-line code to clear the instruction cache between operand[0] and ;; operand[1]. 
(define_expand "clear_cache" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c new file mode 100644 index 0000000..b930241 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c @@ -0,0 +1,59 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include <string.h> + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8) + +/* tiny memmoves should not be vectorised +** f1: +** li\s+a2,15 +** tail\s+memmove +*/ +char * f1 (char *a, char const *b) +{ + return memmove (a, b, 15); +} + +/* vectorise+inline minimum vector register width with LMUL=1 +** f2: +** ( +** vsetivli\s+zero,16,e8,m1,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** ) +** vle8\.v\s+v\d+,0\(a1\) +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +char * f2 (char *a, char const *b) +{ + return memmove (a, b, MIN_VECTOR_BYTES); +} + +/* vectorise+inline up to LMUL=8 +** f3: +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vle8\.v\s+v\d+,0\(a1\) +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +char * f3 (char *a, char const *b) +{ + return memmove (a, b, MIN_VECTOR_BYTES*8); +} + +/* don't vectorise if the move is too large for one operation +** f4: +** li\s+a2,\d+ +** tail\s+memmove +*/ +char * f4 (char *a, char const *b) +{ + return memmove (a, b, MIN_VECTOR_BYTES*8+1); +} + |