author | Alexandre Oliva <oliva@adacore.com> | 2023-11-29 04:00:24 -0300 |
---|---|---|
committer | Alexandre Oliva <oliva@gnu.org> | 2023-11-29 04:00:24 -0300 |
commit | 1ff6d9f7428b0668cd8ab0b3e3ab94f1d733124d (patch) | |
tree | 0c3efcba4d588c5014b1b25365e9021c97c4ee4b /gcc/expr.h | |
parent | 25a51e98fdd504826a40775a5e5b9ffb336b5aa1 (diff) | |
Introduce -finline-stringops
try_store_by_multiple_pieces was added not long ago. It enables a
variable-sized memset to be expanded inline whenever a constant-sized
memset of the worst-case in-range length would be, using conditional
blocks of power-of-two sizes to cover all possible lengths and
alignments.
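The conditional-block scheme described above can be pictured with a small
sketch. It is only an illustration, not the GCC implementation: the names
store_block and set_by_pieces_sketch are hypothetical, the length range
[0, 15] is an assumption, and the real expansion works on RTL and also
takes alignment into account.

```cpp
#include <cstddef>

// Store exactly N bytes. N is a compile-time constant, so the loop has a
// fixed trip count and can be unrolled or turned into a wider store.
template <std::size_t N>
inline unsigned char *store_block(unsigned char *p, unsigned char v)
{
    for (std::size_t i = 0; i < N; ++i)
        p[i] = v;
    return p + N;
}

// Hypothetical helper, not GCC code: set len bytes, where the caller
// guarantees len is in [0, 15]. One conditional block per power of two,
// selected by the corresponding bit of len.
inline void set_by_pieces_sketch(unsigned char *dst, unsigned char v,
                                 std::size_t len)
{
    if (len & 8) dst = store_block<8>(dst, v);
    if (len & 4) dst = store_block<4>(dst, v);
    if (len & 2) dst = store_block<2>(dst, v);
    if (len & 1) dst = store_block<1>(dst, v);
}
```

Each bit of the length selects one fixed-size block, so a larger maximum
length needs one more conditional block per additional bit.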
This patch introduces -finline-stringops[=fn] to request expansions to
start with a loop, so as to still take advantage of known alignment
even with long lengths, but without necessarily adding store blocks
for every power of two.
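A loop-first expansion might look roughly like the sketch below. It is an
assumption-laden illustration rather than what the patch emits: the
function names are hypothetical and the 8-byte block size is chosen
arbitrarily for the example.

```cpp
#include <cstddef>

// Store exactly N bytes with a compile-time-constant trip count.
template <std::size_t N>
inline unsigned char *fill_block(unsigned char *p, unsigned char v)
{
    for (std::size_t i = 0; i < N; ++i)
        p[i] = v;
    return p + N;
}

// Hypothetical helper, not GCC code: set len bytes for any len.
// The loop handles arbitrarily long lengths in 8-byte blocks, so only
// the blocks smaller than 8 still need a conditional each.
inline void set_loop_then_pieces_sketch(unsigned char *dst, unsigned char v,
                                        std::size_t len)
{
    while (len >= 8) {
        dst = fill_block<8>(dst, v);
        len -= 8;
    }
    if (len & 4) dst = fill_block<4>(dst, v);
    if (len & 2) dst = fill_block<2>(dst, v);
    if (len & 1) dst = fill_block<1>(dst, v);
}
```

Compared with a pure by-pieces expansion, code size no longer grows with
the maximum length, since the loop absorbs everything above the block size.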
This makes it possible for the supported stringops (memset, memcpy,
memmove, memcmp) to be expanded, even if storing a single byte per
iteration. Surely efficient implementations can run faster, with a
pre-loop to increase alignment, but that would likely be excessive for
inline expansions.
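The role of the "ctz of len" hint can be sketched as follows. This is a
hypothetical illustration, not the code in expr.cc: copy_via_loop_sketch
and its parameters are made-up names, and it assumes non-overlapping
buffers and a length that is a multiple of 1 << ctz_len.

```cpp
#include <cstddef>

// Hypothetical helper, not GCC code. ctz_len is the number of trailing
// zero bits the length is known to have, so len is a multiple of
// (1 << ctz_len) and each iteration can move a block of that size.
// With ctz_len == 0 this degrades to one byte per iteration.
inline void copy_via_loop_sketch(unsigned char *dst, const unsigned char *src,
                                 std::size_t len, unsigned ctz_len = 0)
{
    const std::size_t incr = std::size_t{1} << ctz_len;
    // Precondition (assumed, not checked): len % incr == 0.
    for (std::size_t done = 0; done + incr <= len; done += incr)
        for (std::size_t j = 0; j < incr; ++j)  // fixed-size inner block
            dst[done + j] = src[done + j];
}
```

When nothing is known about the length, the loop still works, just with
the smallest possible increment.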
Still, in some cases, such as in freestanding environments, users
prefer to inline such stringops, especially those that the compiler
may introduce itself, even if the expansion is not as performant as a
highly optimized C library implementation could be, to avoid
depending on a C runtime library.
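As a usage sketch for the freestanding case, consider a variable-length
clear like the one below. The function name and the command line in the
comment are illustrative assumptions. Whether the call is really expanded
inline depends on the GCC version, target, and optimization level, so
checking the generated assembly for a memset call is the reliable test.

```cpp
#include <cstddef>

// Example invocation (file name is hypothetical):
//   g++ -O2 -ffreestanding -finline-stringops -c clear.cc
//
// Without inline expansion, a variable-length memset like this one is
// normally emitted as a call to the library's memset.
void clear_buffer_sketch(unsigned char *p, std::size_t n)
{
    __builtin_memset(p, 0, n);
}
```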
for gcc/ChangeLog
* expr.cc (emit_block_move_hints): Take ctz of len. Obey
-finline-stringops. Use oriented or sized loop.
(emit_block_move): Take ctz of len, and pass it on.
(emit_block_move_via_sized_loop): New.
(emit_block_move_via_oriented_loop): New.
(emit_block_move_via_loop): Take incr. Move an incr-sized
block per iteration.
(emit_block_cmp_via_cmpmem): Take ctz of len. Obey
-finline-stringops.
(emit_block_cmp_via_loop): New.
* expr.h (emit_block_move): Add ctz of len defaulting to zero.
(emit_block_move_hints): Likewise.
(emit_block_cmp_hints): Likewise.
* builtins.cc (expand_builtin_memory_copy_args): Pass ctz of
len to emit_block_move_hints.
(try_store_by_multiple_pieces): Support starting with a loop.
(expand_builtin_memcmp): Pass ctz of len to
emit_block_cmp_hints.
(expand_builtin): Allow inline expansion of memset, memcpy,
memmove and memcmp if requested.
* common.opt (finline-stringops): New.
(ilsop_fn): New enum.
* flag-types.h (enum ilsop_fn): New.
* doc/invoke.texi (-finline-stringops): Add.
for gcc/testsuite/ChangeLog
* gcc.dg/torture/inline-mem-cmp-1.c: New.
* gcc.dg/torture/inline-mem-cpy-1.c: New.
* gcc.dg/torture/inline-mem-cpy-cmp-1.c: New.
* gcc.dg/torture/inline-mem-move-1.c: New.
* gcc.dg/torture/inline-mem-set-1.c: New.
Diffstat (limited to 'gcc/expr.h')
-rw-r--r-- | gcc/expr.h | 9 |
1 files changed, 6 insertions, 3 deletions
@@ -126,7 +126,8 @@ struct by_pieces_prev
   fixed_size_mode mode;
 };
 
-extern rtx emit_block_move (rtx, rtx, rtx, enum block_op_methods);
+extern rtx emit_block_move (rtx, rtx, rtx, enum block_op_methods,
+                            unsigned ctz_size = 0);
 extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
                                   unsigned int, HOST_WIDE_INT,
                                   unsigned HOST_WIDE_INT,
@@ -134,9 +135,11 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
                                   unsigned HOST_WIDE_INT,
                                   bool bail_out_libcall = false,
                                   bool *is_move_done = NULL,
-                                  bool might_overlap = false);
+                                  bool might_overlap = false,
+                                  unsigned ctz_size = 0);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
-                                 by_pieces_constfn, void *);
+                                 by_pieces_constfn, void *,
+                                 unsigned ctz_len = 0);
 extern bool emit_storent_insn (rtx to, rtx from);
 
 /* Copy all or part of a value X into registers starting at REGNO.
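The new trailing parameters all default to zero, which keeps existing
callers source-compatible. A minimal sketch of that pattern follows. All
names here are hypothetical stand-ins, not the real expr.h declarations,
and the return value only illustrates how a callee could turn the hint
into a block size.

```cpp
#include <cstddef>

// Hypothetical stand-in for a declaration that gains a trailing hint.
// Before the change it would have been:
//   std::size_t move_increment_sketch(std::size_t len);
inline std::size_t move_increment_sketch(std::size_t len,
                                         unsigned ctz_size = 0)
{
    (void) len;  // unused in this sketch
    // A hint of zero means "nothing known": fall back to 1-byte blocks.
    return std::size_t{1} << ctz_size;
}

// An existing caller compiles unchanged and gets the conservative default.
std::size_t old_caller_sketch(std::size_t len)
{
    return move_increment_sketch(len);
}

// A caller that knows the length is a multiple of 8 passes ctz_size = 3.
std::size_t new_caller_sketch(std::size_t n)
{
    return move_increment_sketch(n * 8, 3);
}
```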