aboutsummaryrefslogtreecommitdiff
path: root/tcg/optimize.c
AgeCommit message (Collapse)AuthorFilesLines
2024-09-22tcg/optimize: Optimize bitsel_vecRichard Henderson1-0/+58
Fold matching true/false operands. Fold true/false operands with 0/-1 to simpler logicals. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-22tcg/optimize: Optimize cmp_vec and cmpsel_vecRichard Henderson1-0/+36
Place immediate values second in the comparison. Place destination matches first in the true/false values. All of this mirrors what we do for integer setcond and movcond. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-22tcg/optimize: Fold movcond with true and false values identicalRichard Henderson1-0/+5
Fold "x = cond ? y : y" to "x = y". Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-03tcg/optimize: Fix TCG_COND_TST* simplification of setcond2Richard Henderson1-1/+1
Argument ordering for setcond2 is: output, a_low, a_high, b_low, b_high, cond The test is supposed to be against b_low, not a_high. Cc: qemu-stable@nongnu.org Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240701024623.1265028-1-richard.henderson@linaro.org>
2024-05-06tcg/optimize: Optimize setcond with zmaskRichard Henderson1-0/+110
If we can show that high bits of an input are zero, then we may optimize away some comparisons. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09tcg/optimize: Do not attempt to constant fold neg_vecRichard Henderson1-9/+8
Split out the tail of fold_neg to fold_neg_no_const so that we can avoid attempting to constant fold vector negate. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2150 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-03-29tcg/optimize: Fix sign_mask for logical right-shiftRichard Henderson1-1/+1
The 'sign' computation is attempting to locate the sign bit that has been repeated, so that we can test if that bit is known zero. That computation can be zero if there are no known sign repetitions. Cc: qemu-stable@nongnu.org Fixes: 93a967fbb57 ("tcg/optimize: Propagate sign info for shifting") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2248 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-29tcg/optimize: fix uninitialized variablePaolo Bonzini1-1/+2
The variables uext_opc and sext_opc are used without initialization if TCG_TARGET_extract_i{32,64}_valid returns false. The result, depending on the compiler, might be the generation of extract and sextract opcodes with invalid offset and count, or just random data in the TCG opcode stream. Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}", 2024-02-03) Cc: Richard Henderson <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20240228110641.287205-1-pbonzini@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupportedRichard Henderson1-8/+52
After having performed other simplifications, lower any remaining test comparisons with AND. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03tcg/optimize: Handle TCG_COND_TST{EQ,NE}Richard Henderson1-22/+218
Fold constant comparisons. Canonicalize "tst x,x" to equality vs zero. Canonicalize "tst x,sign" to sign test vs zero. Fold double-word comparisons with zero parts. Fold setcond of "tst x,pow2" to a bit extract. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03tcg/optimize: Do swap_commutative2 in do_constant_folding_cond2Richard Henderson1-50/+57
Mirror the new do_constant_folding_cond1 by doing all argument and condition adjustment within one helper. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03tcg/optimize: Split out do_constant_folding_cond1Richard Henderson1-30/+27
Handle modifications to the arguments and condition in a single place. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03tcg/optimize: Split out arg_is_const_valRichard Henderson1-15/+23
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg/optimize: Canonicalize sub2 with constants to add2Richard Henderson1-2/+19
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20231026013945.1152174-4-richard.henderson@linaro.org>
2023-11-06tcg/optimize: Canonicalize subi to addi during optimizationRichard Henderson1-1/+13
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20231026013945.1152174-3-richard.henderson@linaro.org>
2023-11-06tcg/optimize: Split out arg_new_constantRichard Henderson1-11/+18
Fixes a bug wherein raw uses of tcg_constant_internal do not have their TempOptInfo initialized. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg: Eliminate duplicate env store operationsRichard Henderson1-0/+13
Notice when a constant is stored to the same location twice. Reviewed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg/optimize: Optimize env memory operationsRichard Henderson1-21/+243
Propagate stores to loads, loads to loads. Reviewed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg/optimize: Split out cmp_better_copyRichard Henderson1-18/+11
Compare two temps for "better", split out from finding the best from a whole list. Use TCGKind, which already gives the proper priority. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg/optimize: Pipe OptContext into reset_tsRichard Henderson1-7/+7
Will be needed in the next patch. Reviewed-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-11-06tcg: Remove TCG_TARGET_HAS_neg_{i32,i64}Richard Henderson1-2/+2
The movcond opcode is now mandatory for backends to implement. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231026041404.1229328-7-richard.henderson@linaro.org>
2023-10-22tcg: Optimize past conditional branchesRichard Henderson1-3/+5
We already register allocate through extended basic blocks, optimize through extended basic blocks as well. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-24tcg: Introduce negsetcond opcodesRichard Henderson1-1/+40
Introduce a new opcode for negative setcond. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-24tcg: Fold deposit with zero to andRichard Henderson1-0/+37
Inserting a zero into a value, or inserting a value into zero at offset 0 may be implemented with AND. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05tcg: Split tcg/tcg-op-common.h from tcg/tcg-op.hRichard Henderson1-1/+1
Create tcg/tcg-op-common.h, moving everything that does not concern TARGET_LONG_BITS or TCGv. Adjust tcg/*.c to use the new header instead of tcg-op.h, in preparation for compiling tcg/ only once. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-16tcg: Split INDEX_op_qemu_{ld,st}* for guest address sizeRichard Henderson1-7/+14
For 32-bit hosts, we cannot simply rely on TCGContext.addr_bits, as we need one or two host registers to represent the guest address. Create the new opcodes and update all users. Since we have not yet eliminated TARGET_LONG_BITS, only one of the two opcodes will ever be used, so we can get away with treating them the same in the backends. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-16tcg: Add INDEX_op_qemu_{ld,st}_i128Richard Henderson1-0/+2
Add opcodes for backend support for 128-bit memory operations. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-04-23tcg: Replace tcg_abort with g_assert_not_reachedRichard Henderson1-6/+4
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01tcg: Rename TEMP_LOCAL to TEMP_TBRichard Henderson1-1/+1
Use TEMP_TB as that is more explicit about the default lifetime of the data. While "global" and "local" used to be contrasting, we have more lifetimes than that now. Do not yet rename tcg_temp_local_new_*, just the enum. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-01-05tcg: Reorg function callsRichard Henderson1-4/+2
Pre-compute the function call layout for each helper at startup. Drop TCG_CALL_DUMMY_ARG, as we no longer need to leave gaps in the op->args[] array. This allows several places to stop checking for NULL TCGTemp, to which TCG_CALL_DUMMY_ARG mapped. For tcg_gen_callN, loop over the arguments once. Allocate the TCGOp for the call early but delay emitting it, collecting arguments first. This allows the argument processing loop to emit code for extensions and have them sequenced before the call. For tcg_reg_alloc_call, loop over the arguments in reverse order, which allows stack slots to be filled first naturally. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-01-05tcg: Pass number of arguments to tcg_emit_op() / tcg_op_insert_*()Philippe Mathieu-Daudé1-2/+2
In order to have variable size allocated TCGOp, pass the number of arguments we use (and would allocate) up to tcg_op_alloc(). This alters tcg_emit_op(), tcg_op_insert_before() and tcg_op_insert_after() prototypes. In tcg_op_alloc() ensure the number of arguments is in range. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> [PMD: Extracted from bigger patch] Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221218211832.73312-2-philmd@linaro.org>
2022-03-04tcg: Add opcodes for vector nand, nor, eqvRichard Henderson1-6/+6
We've had placeholders for these opcodes for a while, and should have support on ppc, s390x and avx512 hosts. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-03-04tcg/optimize: only read val after const checkAlex Bennée1-4/+4
valgrind pointed out that arg_info()->val can be undefined which will be the case if the arguments are not constant. The ordering of the checks will have ensured we never relied on an undefined value but for the sake of completeness re-order the code to be clear. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220209112142.3367525-1-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-01-04tcg/optimize: Fix folding of vector opsRichard Henderson1-11/+38
Bitwise operations are easy to fold, because the operation is identical regardless of element size. But add and sub need extra element size info that is not currently propagated. Fixes: 2f9f08ba43d Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/799 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-11tcg/optimize: Add an extra cast to fold_extract2Richard Henderson1-1/+1
There is no bug, but silence a warning about computation in int32_t being assigned to a uint64_t. Reported-by: Coverity CID 1465220 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Propagate sign info for shiftingRichard Henderson1-3/+47
For constant shifts, we can simply shift the s_mask. For variable shifts, we know that sar does not reduce the s_mask, which helps for sequences like ext32s_i64 t, in sar_i64 t, t, v ext32s_i64 out, t allowing the final extend to be eliminated. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Propagate sign info for bit countingRichard Henderson1-1/+2
The results are generally 6 bit unsigned values, though the count leading and trailing bits may produce any value for a zero input. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Propagate sign info for setcondRichard Henderson1-0/+2
The result is either 0 or 1, which means that we have a 2 bit signed result, and thus 62 bits of sign. For clarity, use the smask_from_zmask function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Propagate sign info for logical operationsRichard Henderson1-0/+29
Sign repetitions are perforce all identical, whether they are 1 or 0. Bitwise operations preserve the relative quantity of the repetitions. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Optimize sign extensionsRichard Henderson1-21/+102
Certain targets, like riscv, produce signed 32-bit results. This can lead to lots of redundant extensions as values are manipulated. Begin by tracking only the obvious sign-extensions, and converting them to simple copies when possible. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Use fold_xx_to_i for remRichard Henderson1-1/+5
Recognize the constant function for remainder. Suggested-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Use fold_xi_to_x for divRichard Henderson1-1/+5
Recognize the identity function for division. Suggested-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Use fold_xi_to_x for mulRichard Henderson1-1/+2
Recognize the identity function for low-part multiply. Suggested-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Use fold_xx_to_i for orcRichard Henderson1-0/+1
Recognize the constant function for or-complement. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-28tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit valuesRichard Henderson1-19/+16
This "garbage" setting pre-dates the addition of the type changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64, and INDEX_op_extr{l,h}_i64_i32. So now we have a definitive points at which to adjust z_mask to eliminate such bits from the 32-bit operands. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Sink commutative operand swapping into fold functionsRichard Henderson1-72/+70
Most of these are handled by creating a fold_const2_commutative to handle all of the binary operators. The rest were already handled on a case-by-case basis in the switch, and have their own fold function in which to place the call. We now have only one major switch on TCGOpcode. Introduce NO_DEST and a block comment for swap_commutative in order to make the handling of brcond and movcond opcodes cleaner. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Expand fold_addsub2_i32 to 64-bit opsRichard Henderson1-21/+44
Rename to fold_addsub2. Use Int128 to implement the wider operation. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multipliesRichard Henderson1-9/+35
Rename to fold_multiply2, and handle muls2_i32, mulu2_i64, and muls2_i64. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out fold_masksRichard Henderson1-251/+294
Move all of the known-zero optimizations into the per-opcode functions. Use fold_masks when there is a possibility of the result being determined, and simply set ctx->z_mask otherwise. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out fold_ix_to_iRichard Henderson1-18/+10
Pull the "op r, 0, b => movi r, 0" optimization into a function, and use it in fold_shift. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>