aboutsummaryrefslogtreecommitdiff
path: root/tcg
AgeCommit message (Collapse)AuthorFilesLines
2015-08-28s390: fix softmmu compilationLaurent Vivier1-2/+2
guest_base must be used only in linux-user mode. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-id: 1440757421-9674-1-git-send-email-laurent@vivier.eu Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-08-24linux-user: remove useless macros GUEST_BASE and RESERVED_VALaurent Vivier8-61/+49
As we have removed CONFIG_USE_GUEST_BASE, we always use a guest base and the macros GUEST_BASE and RESERVED_VA become useless: replace them by their values. Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1440420834-8388-1-git-send-email-laurent@vivier.eu> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24linux-user: remove --enable-guest-base/--disable-guest-baseLaurent Vivier5-18/+8
All tcg host architectures now support the guest base and as there is no real performance lost, it can be always enabled. Anyway, guest base use can be disabled lively by setting guest base to 0. CONFIG_USE_GUEST_BASE is defined as (USE_GUEST_BASE && USER_ONLY), it should have to be replaced by CONFIG_USER_ONLY in non CONFIG_USER_ONLY parts, but as some other parts are using !CONFIG_SOFTMMU I have chosen to use !CONFIG_SOFTMMU instead. Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1440373328-9788-2-git-send-email-laurent@vivier.eu> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/aarch64: Use softmmu fast path for unaligned accessesRichard Henderson1-13/+24
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/s390: Use softmmu fast path for unaligned accessesRichard Henderson1-5/+21
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/ppc: Improve unaligned load/store handling on 64-bit backendBenjamin Herrenschmidt1-10/+31
Currently, we get to the slow path for any unaligned access in the backend, because we effectively preserve the bottom address bits below the alignment requirement when comparing with the TLB entry, so any non-0 bit there will cause the compare to fail. For the same number of instructions, we can instead add the access size - 1 to the address and stick to clearing all the bottom bits. That means that normal unaligned accesses will not fallback (the HW will handle them fine). Only when crossing a page boundary well we end up having a mismatch because we'll end up pointing to the next page which cannot possibly be in that same TLB entry. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Message-Id: <1437455978.5809.2.camel@kernel.crashing.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/i386: use softmmu fast path for unaligned accessesAurelien Jarno1-9/+13
Softmmu unaligned load/stores currently goes through through the slow path for two reasons: - to support unaligned access on host with strict alignement - to correctly handle accesses crossing pages x86 is only concerned by the second reason. Unaligned accesses are avoided by compilers, but are not uncommon. We therefore would like to see them going through the fast path, if they don't cross pages. For that we can use the fact that two adjacent TLB entries can't contain the same page. Therefore accessing the TLB entry corresponding to the first byte, but comparing its content to page address of the last byte ensures that we don't cross pages. We can do this check without adding more instructions in the TLB code (but increasing its length by one byte) by using the LEA instruction to combine the existing move with the size addition. On an x86-64 host, this gives a 3% boot time improvement for a powerpc guest and 4% for an x86-64 guest. [rth: Tidied calculation of the offset mask] Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436467197-2183-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: Remove tcg_gen_trunc_i64_i32Richard Henderson1-7/+2
Replacing it with tcg_gen_extrl_i64_i32. Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32Richard Henderson14-53/+71
Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: update README about size changing opsAurelien Jarno1-3/+15
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 opsAurelien Jarno1-0/+13
They behave the same as ext32s_i64 and ext32u_i64 from the constant folding and zero propagation point of view, except that they can't be replaced by a mov, so we don't compute the affected value. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: implement real ext_i32_i64 and extu_i32_i64 opsAurelien Jarno9-8/+41
Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a 32-bit value is always converted to a 64-bit value and not propagated through the register allocator or the optimizer. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Alexander Graf <agraf@suse.de> Cc: Blue Swirl <blauwirbel@gmail.com> Cc: Stefan Weil <sw@weilnetz.de> Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32Aurelien Jarno1-2/+2
The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and returns a 32-bit value. Directly call tcg_gen_op3 with the correct types instead of calling tcg_gen_op3i_i32 and abusing the TCG types. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg: rename trunc_shr_i32 into trunc_shr_i64_i32Aurelien Jarno13-18/+18
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: allow constant to have copiesAurelien Jarno1-8/+2
Now that copies and constants are tracked separately, we can allow constant to have copies, deferring the choice to use a register or a constant to the register allocation pass. This prevent this kind of regular constant reloading: -OUT: [size=338] +OUT: [size=298] mov -0x4(%r14),%ebp test %ebp,%ebp jne 0x7ffbe9cb0ed6 mov $0x40002219f8,%rbp mov %rbp,(%r14) - mov $0x40002219f8,%rbp mov $0x4000221a20,%rbx mov %rbp,(%rbx) mov $0x4000000000,%rbp mov %rbp,(%r14) - mov $0x4000000000,%rbp mov $0x4000221d38,%rbx mov %rbp,(%rbx) mov $0x40002221a8,%rbp mov %rbp,(%r14) - mov $0x40002221a8,%rbp mov $0x4000221d40,%rbx mov %rbp,(%rbx) mov $0x4000019170,%rbp mov %rbp,(%r14) - mov $0x4000019170,%rbp mov $0x4000221d48,%rbx mov %rbp,(%rbx) mov $0x40000049ee,%rbp mov %rbp,0x80(%r14) mov %r14,%rdi callq 0x7ffbe99924d0 mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp shl $0x20,%rbp mov (%r14),%rbx mov %ebx,%ebx mov %rbx,(%r14) or %rbx,%rbp mov %rbp,0x10(%r14) mov %rbp,0x90(%r14) mov 0x60(%r14),%rbx mov %rbx,0x38(%r14) mov 0x28(%r14),%rbx mov $0x4000220e60,%r12 mov %rbx,(%r12) mov $0x40002219c8,%rbx mov %rbp,(%rbx) mov 0x20(%r14),%rbp sub $0x8,%rbp mov $0x4000004a16,%rbx mov %rbx,0x0(%rbp) mov %rbp,0x20(%r14) mov $0x19,%ebp mov %ebp,0xa8(%r14) mov $0x4000015110,%rbp mov %rbp,0x80(%r14) xor %eax,%eax jmpq 0x7ffbebcae426 lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3 jmpq 0x7ffbebcae426 Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: track const/copy status separatelyAurelien Jarno1-28/+14
Instead of using an enum which could be either a copy or a const, track them separately. This will be used in the next patch. Constants are tracked through a bool. Copies are tracked by initializing temp's next_copy and prev_copy to itself, allowing to simplify the code a bit. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: add temp_is_const and temp_is_copy functionsAurelien Jarno1-71/+60
Add two accessor functions temp_is_const and temp_is_copy, to make the code more readable and make code change easier. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: optimize temps trackingAurelien Jarno1-11/+32
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate vector registers on most guests, it's not uncommon to have more than 100 used temps. This means we have initialize more than 2kB at least twice per TB, often more when there is a few goto_tb. Instead used a TCGTempSet bit array to track which temps are in used in the current basic block. This means there are only around 16 bytes to initialize. This improves the boot time of a MIPS guest on an x86-64 host by around 7% and moves out tcg_optimize from the the top of the profiler list. [rth: Handle TCG_CALL_DUMMY_ARG] Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24tcg/optimize: fix constant signednessAurelien Jarno1-5/+5
By convention, on a 64-bit host TCG internally stores 32-bit constants as sign-extended. This is not the case in the optimizer when a 32-bit constant is folded. This doesn't seem to have more consequences than suboptimal code generation. For instance the x86 backend assumes sign-extended constants, and in some rare cases uses a 32-bit unsigned immediate 0xffffffff instead of a 8-bit signed immediate 0xff for the constant -1. This is with a ppc guest: before ------ ---- 0x9f29cc movi_i32 tmp1,$0xffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7fd8c7dfe90c: xor %ebp,%ebp 0x7fd8c7dfe90e: mov %ebp,%r11d 0x7fd8c7dfe911: mov 0x18(%r14),%r9d 0x7fd8c7dfe915: add %r9d,%r10d 0x7fd8c7dfe918: adc %ebp,%r11d 0x7fd8c7dfe91b: add $0xffffffff,%r10d 0x7fd8c7dfe922: adc %ebp,%r11d 0x7fd8c7dfe925: mov %r11d,0x134(%r14) 0x7fd8c7dfe92c: mov %r10d,0x28(%r14) after ----- ---- 0x9f29cc movi_i32 tmp1,$0xffffffffffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7f37010d490c: xor %ebp,%ebp 0x7f37010d490e: mov %ebp,%r11d 0x7f37010d4911: mov 0x18(%r14),%r9d 0x7f37010d4915: add %r9d,%r10d 0x7f37010d4918: adc %ebp,%r11d 0x7f37010d491b: add $0xffffffffffffffff,%r10d 0x7f37010d491f: adc %ebp,%r11d 0x7f37010d4922: mov %r11d,0x134(%r14) 0x7f37010d4929: mov %r10d,0x28(%r14) Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-01tcg/mips: fix add2Aurelien Jarno1-0/+3
The add2 code in the tcg_out_addsub2 function doesn't take into account the case where rl == al == bl. In that case we can't compute the carry after the addition. As it corresponds to a multiplication by 2, the carry bit is the bit 31. While this is a corner case, this prevents x86-64 guests to boot on a MIPS host. Cc: qemu-stable@nongnu.org Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01tcg/s390x: Mask TCGMemOp appropriately for indexingAurelien Jarno1-2/+2
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition, but two cases were forgotten in the TCG S390 backend. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01tcg/mips: Mask TCGMemOp appropriately for indexingAurelien Jarno1-2/+2
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition, but two cases were forgotten in the TCG MIPS backend. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01tcg/mips: fix TLB loading for BE host with 32-bit guestsAurelien Jarno1-1/+3
For 32-bit guest, we load a 32-bit address from the TLB, so there is no need to compensate for the low or high part. This fixes 32-bit guests on big-endian hosts. Cc: qemu-stable@nongnu.org Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-07-27tcg: mark temps as mem_coherent = 0 for mov with a constantAurelien Jarno1-0/+1
When a constant has to be loaded in a mov op, we fail to set mem_coherent = 0. This patch fixes that. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1437994568-7825-3-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-27tcg: correctly mark dead inputs for mov with a constantAurelien Jarno1-0/+3
When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark a temp as dead if the liveness analysis hints so. This fixes the following assert when configure with --enable-debug-tcg: qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed. Cc: Richard Henderson <rth@twiddle.net> Reported-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1437994568-7825-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/optimize: fix tcg_opt_gen_moviAurelien Jarno1-1/+1
Due to a copy&paste, the new op value is tested against mov_i32 instead of movi_i32. The test is therefore always false. Fix that. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/aarch64: use 32-bit offset for 32-bit softmmu emulationRichard Henderson1-6/+6
Similar to the same fix for user-mode, except this instance occurs on the softmmu path. Again, the tlb addend must be the base register, while the guest address is the index. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/aarch64: use 32-bit offset for 32-bit user-mode emulationPaolo Bonzini1-10/+16
Thanks to the previous patch, it is now easy for tcg_out_qemu_ld and tcg_out_qemu_st to use a 32-bit zero extended offset. However, the guest base register x28 must be the base and addr_reg must be the index. Reported-by: Leon Alrae <leon.alrae@imgtec.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1436974021-28978-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/aarch64: add ext argument to tcg_out_insn_3310Paolo Bonzini1-19/+22
The new argument lets you pick uxtw or uxtx mode for the offset register. For now, all callers pass TCG_TYPE_I64 so that uxtx is generated. The bits for uxtx are removed from I3312_TO_I3310. Reported-by: Leon Alrae <leon.alrae@imgtec.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1436974021-28978-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23tcg/i386: Extend addresses for 32-bit guestsRichard Henderson1-42/+72
Removing the ??? comment explaining why it (mostly) worked. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1437081950-7206-2-git-send-email-rth@twiddle.net>
2015-07-13tci: Fix regression with INDEX_op_qemu_st_i32, INDEX_op_qemu_st_i64Stefan Weil1-6/+0
Commit 59227d5d45bb3c31dc2118011691c35b3c00879c did not update the code in tcg/tci/tcg-target.c for those two cases. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 1436556159-3002-1-git-send-email-sw@weilnetz.de Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-09tcg/mips: Fix build error from merged memop+mmu_idx parameterJames Hogan1-2/+2
Commit 3972ef6f830d ("tcg: Push merged memop+mmu_idx parameter to softmmu routines") caused the following build errors when building TCG for MIPS: In file included from tcg/tcg.c:258:0: tcg/mips/tcg-target.c In function ‘tcg_out_qemu_ld_slow_path’: tcg/mips/tcg-target.c:1015:22: error: ‘lb’ undeclared (first use in this function) tcg/mips/tcg-target.c In function ‘tcg_out_qemu_st_slow_path’: tcg/mips/tcg-target.c:1058:22: error: ‘lb’ undeclared (first use in this function) It looks like lb was meant to refer to the TCGLabelQemuLdst *l parameter, so fix both references to lb to refer to just l. Fixes: 3972ef6f830d ("tcg: Push merged memop+mmu_idx parameter to softmmu routines") Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-by: Leon Alrae <leon.alrae@imgtec.com> Message-id: 1436433435-24898-2-git-send-email-james.hogan@imgtec.com Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Leon Alrae <leon.alrae@imgtec.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-07tcg/s390: fix branch target change during code retranslationAurelien Jarno1-4/+8
Make sure to not modify the branch target. This ensure that the branch target is not corrupted during partial retranslation. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Alexander Graf <agraf@suse.de> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-26cpu-defs: Move CPU_TEMP_BUF_NLONGS to tcgPeter Crosthwaite1-0/+2
The usages of this define are pure TCG and there is no architecture specific variation of the value. Localise it to the TCG engine to remove another architecture agnostic piece from cpu-defs.h. This follows on from a28177820a868eafda8fab007561cc19f41941f4 where temp_buf was moved out of the CPU_COMMON obsoleting the need for the super early definition. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <498e8e5325c1a1aff79e5bcfc28cb760ef6b214e.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-09tcg/optimize: rename tcg_constant_foldingAurelien Jarno1-6/+1
The tcg_constant_folding folding ends up doing all the optimizations (which is a good thing to avoid looping on all ops multiple time), so make it clear and just rename it tcg_optimize. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: fold constant test in tcg_opt_gen_movAurelien Jarno1-53/+36
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if the source temp is a constant. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: fold temp copies test in tcg_opt_gen_movAurelien Jarno1-18/+9
Each call to tcg_opt_gen_mov is preceeded by a test to check if the source and destination temps are copies. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: remove opc argument from tcg_opt_gen_movAurelien Jarno1-7/+7
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg/optimize: remove opc argument from tcg_opt_gen_moviAurelien Jarno1-20/+20
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg: fix dead computation for repeated input argumentsAurelien Jarno1-3/+11
When the same temp is used twice or more as an input argument to a TCG instruction, the dead computation code doesn't recognize the second use as a dead temp. This is because the temp is marked as live in the same loop where dead inputs are checked. The fix is to split the loop in two parts. This avoid emitting a move and using a register for the movcond instruction when used as "move if true" on x86-64. This might bring more improvements on RISC TCG targets which don't have outputs aliased to inputs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447228-29425-3-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg: fix register allocation with two aliased dead inputsAurelien Jarno1-0/+10
For TCG ops with two outputs registers (add2, sub2, div2, div2u), when the same input temp is used for the two inputs aliased to the two outputs, and when these inputs are both dead, the register allocation code wrongly assigned the same register to the same output. This happens for example with sub2 t1, t2, t3, t3, t4, t5, when t3 is not used anymore after the TCG op. In that case the same register is used for t1, t2 and t3. The fix is to look for already allocated aliased input when allocating a dead aliased input and check that the register is not already used. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447228-29425-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg: Handle MO_AMASK in tcg_dump_opsRichard Henderson1-3/+12
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com> Tested-by: Yongbok Kim <yongbok.kim@imgtec.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09tcg: Mask TCGMemOp appropriately for indexingRichard Henderson7-30/+30
The addition of MO_AMASK means that places that used inverted masks need to be changed to use positive masks, and places that failed to mask the intended bits need updating. Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com> Tested-by: Yongbok Kim <yongbok.kim@imgtec.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-03tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITSPaolo Bonzini9-0/+10
This will be used to size the TLB when more than 8 MMU modes are used by the target. Limitations come from the limited size of the immediate fields (which sometimes, as in the case of Aarch64, extend to instructions that shift the immediate). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1424436345-37924-2-git-send-email-pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03tci: do not use CPUArchState in tcg-target.hPaolo Bonzini2-3/+4
tcg-target.h does not use any QEMU-specific symbols, save for tci's usage of CPUArchState. Pull that up to tcg/tcg.h. This will make it possible to include tcg-target.h in cpu-defs.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-05-14tcg: Add MO_ALIGN, MO_UNALNRichard Henderson1-0/+13
These modifiers control, on a per-memory-op basis, whether unaligned memory accesses are allowed. The default setting reflects the target's definition of ALIGNED_ONLY. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14tcg: Push merged memop+mmu_idx parameter to softmmu routinesRichard Henderson10-113/+112
The extra information is not yet used but it is now available. This requires minor changes through all of the tcg backends. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14tcg: Merge memop and mmu_idx parameters to qemu_ld/stRichard Henderson14-61/+130
At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-05tcg: optimise memory layout of TCGTempEmilio G. Cota1-12/+14
This brings down the size of the struct from 56 to 32 bytes on 64-bit, and to 20 bytes on 32-bit. This leads to memory savings: Before: $ find . -name 'tcg.o' | xargs size text data bss dec hex filename 41131 29800 88 71019 1156b ./aarch64-softmmu/tcg/tcg.o 37969 29416 96 67481 10799 ./x86_64-linux-user/tcg/tcg.o 39354 28816 96 68266 10aaa ./arm-linux-user/tcg/tcg.o 40802 29096 88 69986 11162 ./arm-softmmu/tcg/tcg.o 39417 29672 88 69177 10e39 ./x86_64-softmmu/tcg/tcg.o After: $ find . -name 'tcg.o' | xargs size text data bss dec hex filename 40883 29800 88 70771 11473 ./aarch64-softmmu/tcg/tcg.o 37473 29416 96 66985 105a9 ./x86_64-linux-user/tcg/tcg.o 38858 28816 96 67770 108ba ./arm-linux-user/tcg/tcg.o 40554 29096 88 69738 1106a ./arm-softmmu/tcg/tcg.o 39169 29672 88 68929 10d41 ./x86_64-softmmu/tcg/tcg.o Note that using an entire byte for some enums that need less than that wastes a few bits (noticeable in 32 bits, where we use 20 bytes instead of 16) but avoids extraction code, which overall is a win--I've tested several variations of the patch, and the appended is the best performer for OpenSSL's bntest by a very small margin: Before: $ taskset -c 0 perf stat -r 15 -- x86_64-linux-user/qemu-x86_64 img/bntest-x86_64 >/dev/null [...] Performance counter stats for 'x86_64-linux-user/qemu-x86_64 img/bntest-x86_64' (15 runs): 10538.479833 task-clock (msec) # 0.999 CPUs utilized ( +- 0.38% ) 772 context-switches # 0.073 K/sec ( +- 2.03% ) 0 cpu-migrations # 0.000 K/sec ( +-100.00% ) 2,207 page-faults # 0.209 K/sec ( +- 0.08% ) 10.552871687 seconds time elapsed ( +- 0.39% ) After: $ taskset -c 0 perf stat -r 15 -- x86_64-linux-user/qemu-x86_64 img/bntest-x86_64 >/dev/null Performance counter stats for 'x86_64-linux-user/qemu-x86_64 img/bntest-x86_64' (15 runs): 10459.968847 task-clock (msec) # 0.999 CPUs utilized ( +- 0.30% ) 739 context-switches # 0.071 K/sec ( +- 1.71% ) 0 cpu-migrations # 0.000 K/sec ( +- 68.14% ) 2,204 page-faults # 0.211 K/sec ( +- 0.10% ) 10.473900411 seconds time elapsed ( +- 0.30% ) Suggested-by: Stefan Weil <sw@weilnetz.de> Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-04-30tcg: Delete unused cpu_pc_from_tb()Peter Crosthwaite1-2/+4
No code uses the cpu_pc_from_tb() function. Delete from tricore and arm which each provide an unused implementation. Update the comment in tcg.h to reflect that this is obsoleted by synchronize_from_tb. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>