author | Alex Bennée <alex.bennee@linaro.org> | 2020-07-09 15:13:16 +0100 |
---|---|---|
committer | Alex Bennée <alex.bennee@linaro.org> | 2020-07-11 15:53:00 +0100 |
commit | 4d7fe02be39e855cdd11376b4d17a85e343fd5c9 (patch) | |
tree | 78283a7abca20df673578bf694de157a3245ec8c /docs | |
parent | c8c06e520d389dcde5963cc5a73d5ecbaf6b8e55 (diff) | |
docs/devel: add some notes on tcg-icount for developers
This attempts to bring together my understanding of the requirements
for icount behaviour into one reference document for our developer
notes.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200709141327.14631-3-alex.bennee@linaro.org>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/devel/index.rst | 1 |
-rw-r--r-- | docs/devel/tcg-icount.rst | 97 |
2 files changed, 98 insertions, 0 deletions
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 4ecaea3..ae6eac7 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -23,6 +23,7 @@ Contents:
    decodetree
    secure-coding-practices
    tcg
+   tcg-icount
    multi-thread-tcg
    tcg-plugins
    bitops
diff --git a/docs/devel/tcg-icount.rst b/docs/devel/tcg-icount.rst
new file mode 100644
index 0000000..8d67b6c
--- /dev/null
+++ b/docs/devel/tcg-icount.rst
@@ -0,0 +1,97 @@
+..
+  Copyright (c) 2020, Linaro Limited
+  Written by Alex Bennée
+
+
+========================
+TCG Instruction Counting
+========================
+
+TCG has long supported a feature known as icount which allows for
+instruction counting during execution. This should not be confused
+with cycle accurate emulation - QEMU does not attempt to emulate how
+long an instruction would take on real hardware. That is a job for
+other more detailed (and slower) tools that simulate the rest of a
+micro-architecture.
+
+This feature is only available for system emulation and is
+incompatible with multi-threaded TCG. It can be used to better align
+execution time with wall-clock time so a "slow" device doesn't run too
+fast on modern hardware. It can also provides for a degree of
+deterministic execution and is an essential part of the record/replay
+support in QEMU.
+
+Core Concepts
+=============
+
+At its heart icount is simply a count of executed instructions which
+is stored in the TimersState of QEMU's timer sub-system. The number of
+executed instructions can then be used to calculate QEMU_CLOCK_VIRTUAL
+which represents the amount of elapsed time in the system since
+execution started. Depending on the icount mode this may either be a
+fixed number of ns per instruction or adjusted as execution continues
+to keep wall clock time and virtual time in sync.
+
+To be able to calculate the number of executed instructions the
+translator starts by allocating a budget of instructions to be
+executed. The budget of instructions is limited by how long it will be
+until the next timer will expire. We store this budget as part of a
+vCPU icount_decr field which shared with the machinery for handling
+cpu_exit(). The whole field is checked at the start of every
+translated block and will cause a return to the outer loop to deal
+with whatever caused the exit.
+
+In the case of icount, before the flag is checked we subtract the
+number of instructions the translation block would execute. If this
+would cause the instruction budget to go negative we exit the main
+loop and regenerate a new translation block with exactly the right
+number of instructions to take the budget to 0 meaning whatever timer
+was due to expire will expire exactly when we exit the main run loop.
+
+Dealing with MMIO
+-----------------
+
+While we can adjust the instruction budget for known events like timer
+expiry we cannot do the same for MMIO. Every load/store we execute
+might potentially trigger an I/O event, at which point we will need an
+up to date and accurate reading of the icount number.
+
+To deal with this case, when an I/O access is made we:
+
+  - restore un-executed instructions to the icount budget
+  - re-compile a single [1]_ instruction block for the current PC
+  - exit the cpu loop and execute the re-compiled block
+
+The new block is created with the CF_LAST_IO compile flag which
+ensures the final instruction translation starts with a call to
+gen_io_start() so we don't enter a perpetual loop constantly
+recompiling a single instruction block. For translators using the
+common translator_loop this is done automatically.
+
+.. [1] sometimes two instructions if dealing with delay slots
+
+Other I/O operations
+--------------------
+
+MMIO isn't the only type of operation for which we might need a
+correct and accurate clock. IO port instructions and accesses to
+system registers are the common examples here. These instructions have
+to be handled by the individual translators which have the knowledge
+of which operations are I/O operations.
+
+When the translator is handling an instruction of this kind:
+
+* it must call gen_io_start() if icount is enabled, at some
+  point before the generation of the code which actually does
+  the I/O, using a code fragment similar to:
+
+.. code:: c
+
+    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+        gen_io_start();
+    }
+
+* it must end the TB immediately after this instruction
+
+Note that some older front-ends call a "gen_io_end()" function:
+this is obsolete and should not be used.