aboutsummaryrefslogtreecommitdiff
path: root/tcg/optimize.c
AgeCommit message (Collapse)AuthorFilesLines
2016-09-16tcg: Optimize fence instructionsPranith Kumar1-0/+39
This commit optimizes fence instructions. Two optimizations are currently implemented: (1) unnecessary duplicate fence instructions, and (2) merging weaker fences into a stronger fence. [rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only loop over the opcode stream once. Merge "unrelated" weaker barriers into one stronger barrier.] Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05tcg: Lower indirect registers in a separate passRichard Henderson1-29/+2
Rather than rely on recursion during the middle of register allocation, lower indirect registers to loads and stores off the indirect base into plain temps. For an x86_64 host, with sufficient registers, this results in identical code, modulo the actual register assignments. For an i686 host, with insufficient registers, this means that temps can be (temporarily) spilled to the stack in order to satisfy an allocation. This as opposed to the possibility of not being able to spill, to allocate a register for the indirect base, in order to perform a spill. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05tcg: Reorg TCGOp chainingRichard Henderson1-6/+2
Instead of using -1 as end of chain, use 0, and link through the 0 entry as a fully circular double-linked list. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-19exec: extract exec/tb-context.hPaolo Bonzini1-1/+1
TCG backends do not need most of exec-all.h; extract what they actually need to a separate file or move it directly to tcg.h. The next patch will stop including exec-all.h from everywhere. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19qemu-common: push cpu.h inclusion out of qemu-common.hPaolo Bonzini1-2/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-21tcg: use tcg_debug_assert instead of assert (fix performance regression)Aurelien Jarno1-2/+2
The TCG code is quite performance sensitive, but at the same time can also be quite tricky. That is why asserts that can be enabled with the --enable-debug-tcg configure option. This used to work the following way: | #include "config.h" | | ... | | #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG) | /* define it to suppress various consistency checks (faster) */ | #define NDEBUG | #endif | | ... | | #include <assert.h> Since commit 757e725b (tcg: Clean up includes) "config.h" as been replaced by "qemu/osdep.h" which itself includes <assert.h>. As a consequence the assertions are always enabled, even when using --disable-debug-tcg, causing a performance regression, especially on targets with many registers. For instance on qemu-system-ppc the speed difference is about 15%. tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already uses in some places. This patch replaces all the calls to assert into calss to tcg_debug_assert. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-01-29tcg: Clean up includesPeter Maydell1-3/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org
2015-08-24tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32Richard Henderson1-11/+11
Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 opsAurelien Jarno1-0/+13
They behave the same as ext32s_i64 and ext32u_i64 from the constant folding and zero propagation point of view, except that they can't be replaced by a mov, so we don't compute the affected value. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: rename trunc_shr_i32 into trunc_shr_i64_i32Aurelien Jarno1-3/+3
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: allow constant to have copiesAurelien Jarno1-8/+2
Now that copies and constants are tracked separately, we can allow constant to have copies, deferring the choice to use a register or a constant to the register allocation pass. This prevent this kind of regular constant reloading: -OUT: [size=338] +OUT: [size=298] mov -0x4(%r14),%ebp test %ebp,%ebp jne 0x7ffbe9cb0ed6 mov $0x40002219f8,%rbp mov %rbp,(%r14) - mov $0x40002219f8,%rbp mov $0x4000221a20,%rbx mov %rbp,(%rbx) mov $0x4000000000,%rbp mov %rbp,(%r14) - mov $0x4000000000,%rbp mov $0x4000221d38,%rbx mov %rbp,(%rbx) mov $0x40002221a8,%rbp mov %rbp,(%r14) - mov $0x40002221a8,%rbp mov $0x4000221d40,%rbx mov %rbp,(%rbx) mov $0x4000019170,%rbp mov %rbp,(%r14) - mov $0x4000019170,%rbp mov $0x4000221d48,%rbx mov %rbp,(%rbx) mov $0x40000049ee,%rbp mov %rbp,0x80(%r14) mov %r14,%rdi callq 0x7ffbe99924d0 mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp shl $0x20,%rbp mov (%r14),%rbx mov %ebx,%ebx mov %rbx,(%r14) or %rbx,%rbp mov %rbp,0x10(%r14) mov %rbp,0x90(%r14) mov 0x60(%r14),%rbx mov %rbx,0x38(%r14) mov 0x28(%r14),%rbx mov $0x4000220e60,%r12 mov %rbx,(%r12) mov $0x40002219c8,%rbx mov %rbp,(%rbx) mov 0x20(%r14),%rbp sub $0x8,%rbp mov $0x4000004a16,%rbx mov %rbx,0x0(%rbp) mov %rbp,0x20(%r14) mov $0x19,%ebp mov %ebp,0xa8(%r14) mov $0x4000015110,%rbp mov %rbp,0x80(%r14) xor %eax,%eax jmpq 0x7ffbebcae426 lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3 jmpq 0x7ffbebcae426 Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: track const/copy status separatelyAurelien Jarno1-28/+14
Instead of using an enum which could be either a copy or a const, track them separately. This will be used in the next patch. Constants are tracked through a bool. Copies are tracked by initializing temp's next_copy and prev_copy to itself, allowing to simplify the code a bit. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: add temp_is_const and temp_is_copy functionsAurelien Jarno1-71/+60
Add two accessor functions temp_is_const and temp_is_copy, to make the code more readable and make code change easier. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: optimize temps trackingAurelien Jarno1-11/+32
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate vector registers on most guests, it's not uncommon to have more than 100 used temps. This means we have initialize more than 2kB at least twice per TB, often more when there is a few goto_tb. Instead used a TCGTempSet bit array to track which temps are in used in the current basic block. This means there are only around 16 bytes to initialize. This improves the boot time of a MIPS guest on an x86-64 host by around 7% and moves out tcg_optimize from the the top of the profiler list. [rth: Handle TCG_CALL_DUMMY_ARG] Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: fix constant signednessAurelien Jarno1-5/+5
By convention, on a 64-bit host TCG internally stores 32-bit constants as sign-extended. This is not the case in the optimizer when a 32-bit constant is folded. This doesn't seem to have more consequences than suboptimal code generation. For instance the x86 backend assumes sign-extended constants, and in some rare cases uses a 32-bit unsigned immediate 0xffffffff instead of a 8-bit signed immediate 0xff for the constant -1. This is with a ppc guest: before ------ ---- 0x9f29cc movi_i32 tmp1,$0xffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7fd8c7dfe90c: xor %ebp,%ebp 0x7fd8c7dfe90e: mov %ebp,%r11d 0x7fd8c7dfe911: mov 0x18(%r14),%r9d 0x7fd8c7dfe915: add %r9d,%r10d 0x7fd8c7dfe918: adc %ebp,%r11d 0x7fd8c7dfe91b: add $0xffffffff,%r10d 0x7fd8c7dfe922: adc %ebp,%r11d 0x7fd8c7dfe925: mov %r11d,0x134(%r14) 0x7fd8c7dfe92c: mov %r10d,0x28(%r14) after ----- ---- 0x9f29cc movi_i32 tmp1,$0xffffffffffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7f37010d490c: xor %ebp,%ebp 0x7f37010d490e: mov %ebp,%r11d 0x7f37010d4911: mov 0x18(%r14),%r9d 0x7f37010d4915: add %r9d,%r10d 0x7f37010d4918: adc %ebp,%r11d 0x7f37010d491b: add $0xffffffffffffffff,%r10d 0x7f37010d491f: adc %ebp,%r11d 0x7f37010d4922: mov %r11d,0x134(%r14) 0x7f37010d4929: mov %r10d,0x28(%r14) Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/optimize: fix tcg_opt_gen_moviAurelien Jarno1-1/+1
Due to a copy&paste, the new op value is tested against mov_i32 instead of movi_i32. The test is therefore always false. Fix that. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: rename tcg_constant_foldingAurelien Jarno1-6/+1
The tcg_constant_folding folding ends up doing all the optimizations (which is a good thing to avoid looping on all ops multiple time), so make it clear and just rename it tcg_optimize. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: fold constant test in tcg_opt_gen_movAurelien Jarno1-53/+36
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if the source temp is a constant. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: fold temp copies test in tcg_opt_gen_movAurelien Jarno1-18/+9
Each call to tcg_opt_gen_mov is preceeded by a test to check if the source and destination temps are copies. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: remove opc argument from tcg_opt_gen_movAurelien Jarno1-7/+7
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: remove opc argument from tcg_opt_gen_moviAurelien Jarno1-20/+20
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14tcg: Merge memop and mmu_idx parameters to qemu_ld/stRichard Henderson1-1/+2
At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-03-16tcg/optimize: Handle or r,a,a with constant aRichard Henderson1-1/+4
As seen with ubuntu-5.10-live-powerpc.iso. Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12tcg: Implement insert_op_beforeRichard Henderson1-22/+35
Rather reserving space in the op stream for optimization, let the optimizer add ops as necessary. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12tcg: Remove opcodes instead of noping them outRichard Henderson1-7/+7
With the linked list scheme we need not leave nops in the stream that we need to process later. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12tcg: Put opcodes in a linked listRichard Henderson1-170/+116
The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-18tcg/optimize: Don't special case TCG_OPF_CALL_CLOBBERRichard Henderson1-5/+4
With the "old" ldst ops we didn't know the real width of the result of the load, but with the "new" ldst ops we do. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-04tcg: Remove TCG_TARGET_HAS_new_ldstRichard Henderson1-5/+0
Since all backends have been converted, remove the compatibility code. Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28tcg/optimize: Remember garbage high bits for 32-bit opsRichard Henderson1-7/+26
For a 64-bit host, the high bits of a register after a 32-bit operation are undefined. Adjust the temps mask for all 32-bit ops to reflect that. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov*Richard Henderson1-61/+56
No functional change, just reduce a bit of redundancy. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28tcg: Optimize brcond2 and setcond2 ne/eqRichard Henderson1-0/+94
If either the high or low pair can be resolved, we can simplify to either a constant or to a 32-bit comparison. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-12tcg: Make call address a constant parameterRichard Henderson1-42/+33
Avoid allocating a tcg temporary to hold the constant address, and instead place it directly into the op_call arguments. At the same time, convert to the newly introduced tcg_out_call backend function, rather than invoking tcg_out_op for the call. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-28tcg: Add INDEX_op_trunc_shr_i32Richard Henderson1-0/+16
Let the backend do something special for truncation. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18tcg: Fix out of range shift in deposit optimizationsRichard Henderson1-6/+4
By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64) and everything else falls apart. But we can reuse the existing deposit logic to get this right. Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18tcg: Mask shift quantities while foldingRichard Henderson1-15/+20
The TCG result would be undefined, but we can at least produce one plausible result and avoid triggering the wrath of analysis tools. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: Add more identity simplificationsRichard Henderson1-15/+24
Recognize 0 operand to andc, and -1 operands to and, orc, eqv. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0Richard Henderson1-0/+1
Like we already do for SUB and XOR. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: Simply some logical ops to NOTRichard Henderson1-0/+57
Given, of course, an appropriate constant. These could be generated from the "canonical" operation for inversion on the guest, or via other optimizations. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: Handle known-zeros masks for ANDCRichard Henderson1-0/+11
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: add known-zero bits compute for load opsAurelien Jarno1-1/+25
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: improve known-zero bits for 32-bit opsAurelien Jarno1-0/+6
The shl_i32 op might set some bits of the unused 32 high bits of the mask. Fix that by clearing the unused 32 high bits for all 32-bit ops except load/store which operate on tl values. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: fix known-zero bits optimizationAurelien Jarno1-1/+7
Known-zero bits optimization is a great idea that helps to generate more optimized code. However the current implementation only works in very few cases as the computed mask is not saved. Fix this to make it really working. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17tcg/optimize: fix known-zero bits for right shift opsAurelien Jarno1-5/+14
32-bit versions of sar and shr ops should not propagate known-zero bits from the unused 32 high bits. For sar it could even lead to wrong code being generated. Cc: qemu-stable@nongnu.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25misc: Use new rotate functionsStefan Weil1-8/+4
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-09-02tcg: Constant fold div, remRichard Henderson1-0/+23
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02tcg: Add muluh and mulsh opcodesRichard Henderson1-0/+20
Use them in places where mulu2 and muls2 are used. Optimize mulx2 with dead low part to mulxh. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-05-09tcg/optimize: fix setcond2 optimizationAurelien Jarno1-0/+1
When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-03-23tcg-optimize: Fold sub r,0,x to neg r,xRichard Henderson1-1/+33
Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23tcg: Add signed multiword multiplication operationsRichard Henderson1-0/+1
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23tcg: Add 64-bit multiword arithmetic operationsRichard Henderson1-2/+2
Matching the 32-bit multiword arithmetic that we already have. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>