aboutsummaryrefslogtreecommitdiff
path: root/cpu-exec.c
AgeCommit message (Collapse)AuthorFilesLines
2016-06-11tb hash: track translated blocks with qhtEmilio G. Cota1-43/+43
Having a fixed-size hash table for keeping track of all translation blocks is suboptimal: some workloads are just too big or too small to get maximum performance from the hash table. The MRU promotion policy helps improve performance when the hash table is a little undersized, but it cannot make up for severely undersized hash tables. Furthermore, frequent MRU promotions result in writes that are a scalability bottleneck. For scalability, lookups should only perform reads, not writes. This is not a big deal for now, but it will become one once MTTCG matures. The appended fixes these issues by using qht as the implementation of the TB hash table. This solution is superior to other alternatives considered, namely: - master: implementation in QEMU before this patchset - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU. - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU. MRU is implemented here by adding an intermediate struct that contains the u32 hash and a pointer to the TB; this allows us, on an MRU promotion, to copy said struct (that is not at the head), and put this new copy at the head. After a grace period, the original non-head struct can be eliminated, and after another grace period, freed. - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize + no MRU for lookups; MRU for inserts. The appended solution is the following: - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize + no MRU for lookups; MRU for inserts. The plots below compare the considered solutions. The Y axis shows the boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis sweeps the number of buckets (or initial number of buckets for qht-autoresize). The plots in PNG format (and with errorbars) can be seen here: http://imgur.com/a/Awgnq Each test runs 5 times, and the entire QEMU process is pinned to a single core for repeatability of results. Host: Intel Xeon E5-2690 28 ++------------+-------------+-------------+-------------+------------++ A***** + + + master **A*** + 27 ++ * xxhash ##B###++ | A******A****** xxhash-rcu $$C$$$ | 26 C$$ A******A****** qht-fixed-nomru*%%D%%%++ D%%$$ A******A******A*qht-dyn-mru A*E****A 25 ++ %%$$ qht-dyn-nomru &&F&&&++ B#####% | 24 ++ #C$$$$$ ++ | B### $ | | ## C$$$$$$ | 23 ++ # C$$$$$$ ++ | B###### C$$$$$$ %%%D 22 ++ %B###### C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C | D%%%%%%B###### @E@@@@@@ %%%D%%%@@@E@@@@@@E 21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B + E@@@ F&&& + E@ + F&&& + + 20 ++------------+-------------+-------------+-------------+------------++ 14 16 18 20 22 24 log2 number of buckets Host: Intel i7-4790K 14.5 ++------------+------------+-------------+------------+------------++ A** + + + master **A*** + 14 ++ ** xxhash ##B###++ 13.5 ++ ** xxhash-rcu $$C$$$++ | qht-fixed-nomru %%D%%% | 13 ++ A****** qht-dyn-mru @@E@@@++ | A*****A******A****** qht-dyn-nomru &&F&&& | 12.5 C$$ A******A******A*****A****** ***A 12 ++ $$ A*** ++ D%%% $$ | 11.5 ++ %% ++ B### %C$$$$$$ | 11 ++ ## D%%%%% C$$$$$ ++ | # % C$$$$$$ | 10.5 F&&&&&&B######D%%%%% C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$ $$$C 10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B + F&& D%%%%%%B######B######B#####B###@@@D%%% + 9.5 ++------------+------------+-------------+------------+------------++ 14 16 18 20 22 24 log2 number of buckets Note that the original point before this patch series is X=15 for "master"; the little sensitivity to the increased number of buckets is due to the poor hashing function in master. xxhash-rcu has significant overhead due to the constant churn of allocating and deallocating intermediate structs for implementing MRU. An alternative would be do consider failed lookups as "maybe not there", and then acquire the external lock (tb_lock in this case) to really confirm that there was indeed a failed lookup. This, however, would not be enough to implement dynamic resizing--this is more complex: see "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming" by Triplett, McKenney and Walpole. This solution was discarded due to the very coarse RCU read critical sections that we have in MTTCG; resizing requires waiting for readers after every pointer update, and resizes require many pointer updates, so this would quickly become prohibitive. qht-fixed-nomru shows that MRU promotion is advisable for undersized hash tables. However, qht-dyn-mru shows that MRU promotion is not important if the hash table is properly sized: there is virtually no difference in performance between qht-dyn-nomru and qht-dyn-mru. Before this patch, we're at X=15 on "xxhash"; after this patch, we're at X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we can achieve with optimum sizing of the hash table, while keeping the hash table scalable for readers. The improvement we get before and after this patch for booting debian jessie with arm-softmmu is: - Intel Xeon E5-2690: 10.5% less time - Intel i7-4790K: 5.2% less time We could get this same improvement _for this particular workload_ by statically increasing the size of the hash table. But this would hurt workloads that do not need a large hash table. The dynamic (upward) resizing allows us to start small and enlarge the hash table as needed. A quick note on downsizing: the table is resized back to 2**15 buckets on every tb_flush; this makes sense because it is not guaranteed that the table will reach the same number of TBs later on (e.g. most bootup code is thrown away after boot); it makes sense to grow the hash table as more code blocks are translated. This also avoids the complication of having to build downsizing hysteresis logic into qht. Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11tb hash: hash phys_pc, pc, and flags with xxhashEmilio G. Cota1-2/+2
For some workloads such as arm bootup, tb_phys_hash is performance-critical. The is due to the high frequency of accesses to the hash table, originated by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's. More info: https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html To dig further into this I modified an arm image booting debian jessie to immediately shut down after boot. Analysis revealed that quite a bit of time is unnecessarily spent in tb_phys_hash: the cause is poor hashing that results in very uneven loading of chains in the hash table's buckets; the longest observed chain had ~550 elements. The appended addresses this with two changes: 1) Use xxhash as the hash table's hash function. xxhash is a fast, high-quality hashing function. 2) Feed the hashing function with not just tb_phys, but also pc and flags. This improves performance over using just tb_phys for hashing, since that resulted in some hash buckets having many TB's, while others getting very few; with these changes, the longest observed chain on a single hash bucket is brought down from ~550 to ~40. Tests show that the other element checked for in tb_find_physical, cs_base, is always a match when tb_phys+pc+flags are a match, so hashing cs_base is wasteful. It could be that this is an ARM-only thing, though. UPDATE: On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote: > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB > consisting of only a delay slot). > It may well still turn out to be reasonable to ignore cs_base for hashing. BTW, after this change the hash table should not be called "tb_hash_phys" anymore; this is addressed later in this series. This change gives consistent bootup time improvements. I tested two host machines: - Intel Xeon E5-2690: 11.6% less time - Intel i7-4790K: 19.2% less time Increasing the number of hash buckets yields further improvements. However, using a larger, fixed number of buckets can degrade performance for other workloads that do not translate as many blocks (600K+ for debian-jessie arm bootup). This is dealt with later in this series. Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-26cpu-exec: Fix direct jump to TB spanning pageSergey Fedorov1-0/+9
It is not safe to make a direct jump to a TB spanning two pages in system emulation because the mapping for the second page can get changed but we don't take care of direct jumps in this case. However in user mode emulation, this is not the case because there's only static address translation and TBs are always invalidated properly. Fixes: 5b053a4a2827 ("tcg: Clean up direct block chaining safety checks") Reported-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Message-id: 1463404380-29302-1-git-send-email-sergey.fedorov@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-19cpu: move exec-all.h inclusion out of cpu.hPaolo Bonzini1-0/+1
exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-12cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()Sergey Fedorov1-3/+4
Suggested-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1463071937-26607-1-git-send-email-sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()Sergey Fedorov1-12/+0
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1462962111-32237-6-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Move TB execution stuff out of cpu_exec()Sergey Fedorov1-55/+64
Simplify cpu_exec() by extracting TB execution code outside of cpu_exec() into a new static inline function cpu_loop_exec_tb(). Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1462962111-32237-5-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Move interrupt handling out of cpu_exec()Sergey Fedorov1-62/+70
Simplify cpu_exec() by extracting interrupt handling code outside of cpu_exec() into a new static inline function cpu_handle_interrupt(). Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1462962111-32237-4-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Move exception handling out of cpu_exec()Sergey Fedorov1-41/+52
Simplify cpu_exec() by extracting exception handling code out of cpu_exec() into a new static inline function cpu_handle_exception(). Also make cpu_handle_debug_exception() inline as it is used only once. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1462962111-32237-3-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Move halt handling out of cpu_exec()Sergey Fedorov1-14/+24
Simplify cpu_exec() by extracting CPU halt state handling code out of cpu_exec() into a new static inline function cpu_handle_halt(). Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1462962111-32237-2-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Remove relic orphaned commentSergey Fedorov1-2/+0
This comment should have been deleted by commit 0ac087f1f3ae ("removed unused code") but somehow it is still here. There's no point to keep it. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1462286050-21778-1-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12tcg: Remove needless CPUState::current_tbSergey Fedorov1-4/+0
This field was used for telling cpu_interrupt() to unlink a chain of TBs being executed when it worked that way. Now, cpu_interrupt() don't do this anymore. So we don't need this field anymore. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1462273462-14036-1-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: Move TB chaining into tb_find_fast()Sergey Fedorov1-16/+19
Move tb_add_jump() call and surrounding code from cpu_exec() into tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct chaining optimization details into tb_find_fast(). It also allows to move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer to tb_find_slow() which also manipulates the lock. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net> [rth: Fixed rebase typo in nochain test.]
2016-05-12tcg: Rework tb_invalidated_flagSergey Fedorov1-10/+11
'tb_invalidated_flag' was meant to catch two events: * some TB has been invalidated by tb_phys_invalidate(); * the whole translation buffer has been flushed by tb_flush(). Then it was checked: * in cpu_exec() to ensure that the last executed TB can be safely linked to directly call the next one; * in cpu_exec_nocache() to decide if the original TB should be provided for further possible invalidation along with the temporarily generated TB. It is always safe to patch an invalidated TB since it is not going to be used anyway. It is also safe to call tb_phys_invalidate() for an already invalidated TB. Thus, setting this flag in tb_phys_invalidate() is simply unnecessary. Moreover, it can prevent from pretty proper linking of TBs, if any arbitrary TB has been invalidated. So just don't touch it in tb_phys_invalidate(). If this flag is only used to catch whether tb_flush() has been called then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using only 'true' and 'false' to set its value. Also, instead of setting it in tb_gen_code(), just after tb_flush() has been called, do it right inside of tb_flush(). In cpu_exec(), this flag is used to track if tb_flush() has been called and have made 'next_tb' (a reference to the last executed TB) invalid for linking it to directly call the next TB. tb_flush() can be called during the CPU execution loop from tb_gen_code(), during TB execution or by another thread while 'tb_lock' is released. Catch for translation buffer flush reliably by resetting this flag once before first TB lookup and each time we find it set before trying to add a direct jump. Don't touch in in tb_find_physical(). Each vCPU has its own execution loop in multithreaded mode and thus should have its own copy of the flag to be able to reset it with its own 'next_tb' and don't affect any other vCPU execution thread. So make this flag per-vCPU and move it to CPUState. In cpu_exec_nocache(), we only need to check if tb_flush() has been called from tb_gen_code() called by cpu_exec_nocache() itself. To do this reliably, preserve the old value of the flag, reset it before calling tb_gen_code(), check afterwards, and combine the saved value back to the flag. This patch is based on the patch "tcg: move tb_invalidated_flag to CPUState" from Paolo Bonzini <pbonzini@redhat.com>. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12tcg: Clean up from 'next_tb'Sergey Fedorov1-27/+32
The value returned from tcg_qemu_tb_exec() is the value passed to the corresponding tcg_gen_exit_tb() at translation time of the last TB attempted to execute. It is a little confusing to store it in a variable named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer and additional information in its two least significant bits. Break it down right away into two variables named 'last_tb' and 'tb_exit' which are a pointer to the last TB attempted to execute and the TB exit reason, correspondingly. This simplifies the code and improves its readability. Correct a misleading documentation comment for tcg_qemu_tb_exec() and fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in another couple of places. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12cpu-exec: elide more icount code if CONFIG_USER_ONLYPaolo Bonzini1-0/+8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Alex Bennée: #ifndef replay code to match elided functions] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12tcg: reorganize tb_find_physical loopAlex Bennée1-20/+24
Put some comments and improve code structure. This should help reading the code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [Sergey Fedorov: provide commit message; bring back resetting of tb_invalidated_flag] Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12tcg: Clean up direct block chaining safety checksSergey Fedorov1-5/+2
We don't take care of direct jumps when address mapping changes. Thus we must be sure to generate direct jumps so that they always keep valid even if address mapping changes. Luckily, we can only allow to execute a TB if it was generated from the pages which match with current mapping. Document tcg_gen_goto_tb() declaration and note the reason for destination PC limitations. Some targets with variable length instructions allow TB to straddle a page boundary. However, we make sure that both of TB pages match the current address mapping when looking up TBs. So it is safe to do direct jumps into the both pages. Correct the checks for some of those targets. Given that, we can safely patch a TB which spans two pages. Remove the unnecessary check in cpu_exec() and allow such TBs to be patched. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12tb: consistently use uint32_t for tb->flagsEmilio G. Cota1-3/+3
We are inconsistent with the type of tb->flags: usage varies loosely between int and uint64_t. Settle to uint32_t everywhere, which is superior to both: at least one target (aarch64) uses the most significant bit in the u32, and uint64_t is wasteful. Compile-tested for all targets. Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Suggested-by: Richard Henderson <rth@twiddle.net> Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1460049562-23517-1-git-send-email-cota@braap.org>
2016-03-22qemu-log: dfilter-ise exec, out_asm, op and opt_opAlex Bennée1-6/+7
This ensures the code generation debug code will honour -dfilter if set. For the "exec" tracing I've added a new inline macro for efficiency's sake. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Aurelien Jarno <aurelien@aureL32.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1458052224-9316-8-git-send-email-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22qemu-log: Improve the "exec" TB execution loggingPeter Maydell1-9/+11
Improve the TB execution logging so that it is easier to identify what is happening from trace logs: * move the "Trace" logging of executed TBs into cpu_tb_exec() so that it is emitted if and only if we actually execute a TB, and for consistency for the CPU state logging * log when we link two TBs together via tb_add_jump() * log when cpu_tb_exec() returns early from a chain of TBs The new style logging looks like this: Trace 0x7fb7cc822ca0 [ffffffc0000dce00] Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823110 [ffffffc0000dce10] Trace 0x7fb7cc823420 [ffffffc000302688] Trace 0x7fb7cc8234a0 [ffffffc000302698] Trace 0x7fb7cc823520 [ffffffc0003026a4] Trace 0x7fb7cc823560 [ffffffc0000dce44] Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc8235d0 [ffffffc0000dce70] Trace 0x7fb7cc822fd0 [ffffffc0000dd52c] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [AJB: reword patch title, Abandoned->Stopped] Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1458052224-9316-6-git-send-email-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-03log: do not unnecessarily include qom/cpu.hPaolo Bonzini1-0/+1
Split the bits that require it to exec/log.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-id: 1452174932-28657-8-git-send-email-den@openvz.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-01-29exec: Clean up includesPeter Maydell1-1/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org
2015-11-06Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-replay' into ↵Peter Maydell1-13/+42
staging So here it is, let's see what happens. # gpg: Signature made Fri 06 Nov 2015 09:30:34 GMT using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream-replay: replay: recording of the user input replay: command line options replay: replay blockers for devices replay: initialization and deinitialization replay: ptimer bottom halves: introduce bh call function replay: checkpoints icount: improve counting for record/replay replay: shutdown event replay: recording and replaying clock ticks replay: asynchronous events infrastructure replay: interrupts and exceptions cpu: replay instructions sequence cpu-exec: allow temporary disabling icount replay: introduce icount event replay: introduce mutex to protect the replay log replay: internal functions for replay log replay: global variables and function stubs Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06replay: interrupts and exceptionsPavel Dovgalyuk1-10/+38
This patch includes modifications of common cpu files. All interrupts and exceptions occured during recording are written into the replay log. These events allow correct replaying the execution by kicking cpu thread when one of these events is found in the log. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150917162416.8676.57647.stgit@PASHA-ISP.def.inno> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05cpu-exec: allow temporary disabling icountPavel Dovgalyuk1-3/+4
This patch is required for deterministic replay to generate an exception by trying executing an instruction without changing icount. It adds new flag to TB for disabling icount while translating it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150917162359.8676.77011.stgit@PASHA-ISP.def.inno> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04cpu-exec: Fix compiler warning (-Werror=clobbered)Stefan Weil1-3/+15
Reloading of local variables after sigsetjmp is only needed for some buggy compilers. The code which should reload these variables causes compiler warnings with gcc 4.7 when compiler optimizations are enabled: cpu-exec.c:204:15: error: variable ‘cpu’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] cpu-exec.c:207:15: error: variable ‘cc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] cpu-exec.c:202:28: error: argument ‘env’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Now this code is only used for compilers which need it (and gcc 4.5.x, x > 0 which does not need it but won't give warnings). There were bug reports for clang and gcc 4.5.0, while gcc 4.5.1 was reported to work fine without the reload code. For clang it is not clear which versions are affected, so simply keep the status quo for all clang compilations. This can be improved later. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-Id: <1443266606-21400-1-git-send-email-sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-19cpu-exec: Add "nochain" debug flagRichard Henderson1-1/+2
Respect it to avoid linking TBs together. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-09-25i386: partial revert of interrupt poll fixPavel Dovgalyuk1-0/+9
Processing CPU_INTERRUPT_POLL requests in cpu_has_work functions break the determinism of cpu_exec. This patch is required to make interrupts processing deterministic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150917162331.8676.15286.stgit@PASHA-ISP.def.inno> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16cpu-exec: Migrate some generic fns to cpu-exec-commonPeter Crosthwaite1-59/+0
The goal is to split the functions such that cpu-exec is CPU specific content, while cpus-exec-common.c is generic code only. The function interface to cpu-exec needs to be virtualised to prepare support for multi-arch and moving these definitions out saves bloating the QOM interface. So move these definitions out of cpu-exec to a new module, cpu-exec-common. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <3cefeb3fbbb33031670951a0e74de2778529da3f.1441614289.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-14Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell1-49/+70
* Support for jemalloc * qemu_mutex_lock_iothread "No such process" fix * cutils: qemu_strto* wrappers * iohandler.c simplification * Many other fixes and misc patches. And some MTTCG work (with Emilio's fixes squashed): * Signal-free TCG kick * Removing spinlock in favor of QemuMutex * User-mode emulation multi-threading fixes/docs # gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (44 commits) cutils: work around platform differences in strto{l,ul,ll,ull} cpu-exec: fix lock hierarchy for user-mode emulation exec: make mmap_lock/mmap_unlock globally available tcg: comment on which functions have to be called with mmap_lock held tcg: add memory barriers in page_find_alloc accesses remove unused spinlock. replace spinlock by QemuMutex. cpus: remove tcg_halt_cond and tcg_cpu_thread globals cpus: protect work list with work_mutex scripts/dump-guest-memory.py: fix after RAMBlock change configure: Add support for jemalloc add macro file for coccinelle configure: factor out adding disas configure vhost-scsi: fix wrong vhost-scsi firmware path checkpatch: remove tests that are not relevant outside the kernel checkpatch: adapt some tests to QEMU CODING_STYLE: update mixed declaration rules qmp: Add example usage of strto*l() qemu wrapper cutils: Add qemu_strtoull() wrapper cutils: Add qemu_strtoll() wrapper ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-09-11cpu-exec: introduce loop exit with restore functionPavel Dovgalyuk1-0/+9
This patch introduces loop exit function, which also restores guest CPU state according to the value of host program counter. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150710095702.13280.97477.stgit@PASHA-ISP> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-09-09cpu-exec: fix lock hierarchy for user-mode emulationPaolo Bonzini1-18/+53
tb_lock has to be taken inside the mmap_lock (example: tb_invalidate_phys_range is called by target_mmap), but tb_link_page is taking the mmap_lock and it is called with the tb_lock held. To fix this, take the mmap_lock in tb_find_slow, not in tb_link_page. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09replace spinlock by QemuMutex.KONRAD Frederic1-11/+3
spinlock is only used in two cases: * cpu-exec.c: to protect TranslationBlock * mem_helper.c: for lock helper in target-i386 (which seems broken). It's a pthread_mutex_t in user-mode, so we can use QemuMutex directly, with an #ifdef. The #ifdef will be removed when multithreaded TCG will need the mutex as well. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Message-Id: <1439220437-23957-5-git-send-email-fred.konrad@greensocs.com> Signed-off-by: Emilio G. Cota <cota@braap.org> [Merge Emilio G. Cota's patch to remove volatile. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09tcg: signal-free qemu_cpu_kickPaolo Bonzini1-1/+1
Signals are slow and do not exist on Win32. The previous patches have done most of the legwork to introduce memory barriers (some of them were even there already for the sake of Windows!) and we can now set the flags directly in the iothread. qemu_cpu_kick_thread is not used anymore on TCG, since the TCG thread is never outside usermode while the CPU is running (not halted). Instead run the content of the signal handler (now in qemu_cpu_kick_no_halt) directly. qemu_cpu_kick_no_halt is also used in qemu_mutex_lock_iothread to avoid the overhead of qemu_cond_broadcast. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09tcg: synchronize exit_request and tcg_current_cpu accessesPaolo Bonzini1-1/+1
Synchronize the remaining pair of accesses in cpu_signal. These should be necessary on Windows as well, at least in theory. Probably SuspendProcess and ResumeProcess introduce some implicit memory barrier. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09tcg: synchronize cpu->exit_request and cpu->tcg_exit_req accessesPaolo Bonzini1-1/+5
Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09tcg: assign cpu->current_tb in a simpler placePaolo Bonzini1-8/+2
TCG has not been reading cpu->current_tb from signal handlers for years. The code that synchronized cpu_exec with the signal handler is not needed anymore. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09tcg: introduce tcg_current_cpuPaolo Bonzini1-9/+5
This is already useful on Windows in order to remove tls.h, because accesses to current_cpu are done from a different thread on that platform. It will be used on POSIX platforms as soon TCG stops using signals to interrupt the execution of translated code. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-14exec: drop cpu_can_do_io, just read cpu->can_do_ioPaolo Bonzini1-1/+1
After commit 626cf8f (icount: set can_do_io outside TB execution, 2014-12-08), can_do_io is set to 1 if not executing code. It is no longer necessary to make this assumption in cpu_can_do_io. It is also possible to remove the use_icount test, simply by never setting cpu->can_do_io to 0 unless use_icount is true. With these changes cpu_can_do_io boils down to a read of cpu->can_do_io. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-06cpu-exec: Do not invalidate original TB in cpu_exec_nocache()Sergey Fedorov1-6/+2
Instead of invalidating an original TB in cpu_exec_nocache() prematurely, just save a link to it in the temporary generated TB. If cpu_io_recompile() is raised subsequently from the temporary TB, invalidate the original one as well. That allows reusing the original TB each time cpu_exec_nocache() is called to handle expired instruction counter in icount mode. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Message-Id: <1435656909-29116-1-git-send-email-serge.fdrv@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-09cpu-exec: Purge all uses of ENV_GET_CPU()Peter Crosthwaite1-15/+13
Remove un-needed usages of ENV_GET_CPU() by converting the APIs to use CPUState pointers and retrieving the env_ptr as minimally needed. Scripted conversion for target-* change: for I in target-*/cpu.h; do sed -i \ 's/\(^int cpu_[^_]*_exec(\)[^ ][^ ]* \*s);$/\1CPUState *cpu);/' \ $I; done Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-06-26include/exec: Move tb hash functions outPeter Crosthwaite1-0/+1
This is one of very few things in exec-all with a genuine CPU architecture dependency. Move these hashing helpers to a new header to trim exec-all.h down to a near architecture-agnostic header. The defs are only used by cpu-exec and translate-all which are both arch-obj's so the new tb-hash.h has no core code usage. Reviewed-by: Richard Henderson <rth@redhat.com> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <9d048b96f7cfa64a4d9c0b88e0dd2877fac51d41.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-24Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell1-0/+33
- vhost-scsi: add bootindex property - RCU: fix MemoryRegion lifetime issues in PCI; document the rules; convert of AddressSpaceDispatch and RAMList - KVM: add kvm_exit reasons for aarch64 # gpg: Signature made Mon Feb 16 16:32:32 2015 GMT using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (21 commits) Convert ram_list to RCU exec: convert ram_list to QLIST cosmetic changes preparing for the following patches exec: protect mru_block with RCU rcu: add g_free_rcu rcu: introduce RCU-enabled QLIST exec: RCUify AddressSpaceDispatch exec: make iotlb RCU-friendly exec: introduce cpu_reload_memory_map docs: clarify memory region lifecycle pci: split shpc_cleanup and shpc_free pcie: remove mmconfig memory leak and wrap mmconfig update with transaction memory: keep the owner of the AddressSpace alive until do_address_space_destroy rcu: run RCU callbacks under the BQL rcu: do not let RCU callbacks pile up indefinitely vhost-scsi: set the bootable value of channel/target/lun vhost-scsi: add a property for booting vhost-scsi: expose the TYPE_FW_PATH_PROVIDER interface vhost-scsi: add bootindex property qdev: support to get a device firmware path directly ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-16exec: RCUify AddressSpaceDispatchPaolo Bonzini1-1/+24
Note that even after this patch, most callers of address_space_* functions must still be under the big QEMU lock, otherwise the memory region returned by address_space_translate can disappear as soon as address_space_translate returns. This will be fixed in the next part of this series. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16exec: make iotlb RCU-friendlyPaolo Bonzini1-1/+5
After the previous patch, TLBs will be flushed on every change to the memory mapping. This patch augments that with synchronization of the MemoryRegionSections referred to in the iotlb array. With this change, it is guaranteed that iotlb_to_region will access the correct memory map, even once the TLB will be accessed outside the BQL. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16exec: introduce cpu_reload_memory_mapPaolo Bonzini1-0/+6
This for now is a simple TLB flush. This can change later for two reasons: 1) an AddressSpaceDispatch will be cached in the CPUState object 2) it will not be possible to do tlb_flush once the TCG-generated code runs outside the BQL. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-10cpu-exec: simplify icount codePaolo Bonzini1-8/+3
Use MIN instead of an "if" statement. Move "tb" assignment where the value is actually used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-02-10cpu-exec: drop dead assignmentPaolo Bonzini1-1/+0
All uses of TB inside cpu_exec are dominated by "tb = tb_find_fast(env)", and there are no uses after the switch statement. So the assignment is dead, as reported by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-02-02cpu-exec: simplify init_delay_paramsPaolo Bonzini1-4/+2
With the introduction of QEMU_CLOCK_VIRTUAL_RT, the computation of sc->diff_clk can be simplified nicely: qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + cpu_get_clock_offset() = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cpu_get_clock_offset()) = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timers_state.cpu_clock_offset) = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) Cc: Sebastian Tanase <sebastian.tanase@openwide.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>