diff options
author | Emilio G. Cota <cota@braap.org> | 2018-10-09 13:45:56 -0400 |
---|---|---|
committer | Richard Henderson <richard.henderson@linaro.org> | 2018-10-18 18:58:10 -0700 |
commit | 71aec3541d87d611f6efad71d45b310e515372cc (patch) | |
tree | 1168445e8f1278b6986fa13e46db6f513eb3cf7f /include/exec | |
parent | ea9025cb49027d9b3c4f48c56602351b9cf65ff1 (diff) | |
download | qemu-71aec3541d87d611f6efad71d45b310e515372cc.zip qemu-71aec3541d87d611f6efad71d45b310e515372cc.tar.gz qemu-71aec3541d87d611f6efad71d45b310e515372cc.tar.bz2 |
cputlb: serialize tlb updates with env->tlb_lock
Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.
Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.
Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-6-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Diffstat (limited to 'include/exec')
-rw-r--r-- | include/exec/cpu-defs.h | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h index a171ffc..4ff62f3 100644 --- a/include/exec/cpu-defs.h +++ b/include/exec/cpu-defs.h @@ -24,6 +24,7 @@ #endif #include "qemu/host-utils.h" +#include "qemu/thread.h" #include "qemu/queue.h" #ifdef CONFIG_TCG #include "tcg-target.h" @@ -142,6 +143,8 @@ typedef struct CPUIOTLBEntry { #define CPU_COMMON_TLB \ /* The meaning of the MMU modes is defined in the target code. */ \ + /* tlb_lock serializes updates to tlb_table and tlb_v_table */ \ + QemuSpin tlb_lock; \ CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; \ CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE]; \ CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; \ |