| author | Emilio G. Cota <cota@braap.org> | 2017-07-26 16:58:05 -0400 |
| --- | --- | --- |
| committer | Richard Henderson <richard.henderson@linaro.org> | 2018-06-15 07:42:55 -1000 |
| commit | be2cdc5e352eb28b4ff631f053a261d91e6af78e (patch) | |
| tree | 8c1a51b0f20bbff2ee2f26badf8e8fe4bec46196 /include | |
| parent | 32359d529f30bea8124ed671b2e6a22f22540488 (diff) | |
tcg: track TBs with per-region BST's
This paves the way for enabling scalable parallel generation of TCG code.
Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.
The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.
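
Roughly, the layout looks like this. This is a sketch rather than the literal code: the comparator tb_tc_cmp, the region descriptor and the exact padding arithmetic are illustrative, while QemuMutex, qemu_memalign, ROUND_UP and qemu_dcache_linesize are QEMU's existing helpers:

    struct tcg_region_tree {
        QemuMutex lock;
        GTree *tree;
        /* padded out to qemu_dcache_linesize to avoid false sharing */
    };

    /* one padded slot per TCG region */
    static void *region_trees;
    static size_t tree_size;

    static void tcg_region_trees_init(void)
    {
        size_t i;

        tree_size = ROUND_UP(sizeof(struct tcg_region_tree), qemu_dcache_linesize);
        region_trees = qemu_memalign(qemu_dcache_linesize, region.n * tree_size);
        for (i = 0; i < region.n; i++) {
            struct tcg_region_tree *rt = region_trees + i * tree_size;

            qemu_mutex_init(&rt->lock);
            rt->tree = g_tree_new(tb_tc_cmp);
        }
    }
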
Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.
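
In code, the mapping looks roughly like this (again a sketch; start_aligned, stride, size and n are the region descriptor fields referred to above):

    static struct tcg_region_tree *tc_ptr_to_region_tree(void *p)
    {
        size_t region_idx;

        if (p < region.start_aligned) {
            region_idx = 0;                     /* first region may be smaller */
        } else {
            ptrdiff_t offset = p - region.start_aligned;

            if (offset > region.stride * (region.n - 1)) {
                region_idx = region.n - 1;      /* so may the last one */
            } else {
                region_idx = offset / region.stride;  /* the division mentioned above */
            }
        }
        return region_trees + region_idx * tree_size;
    }
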
Note that with this design we can also provide consistent snapshots
of all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().
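
The snapshot pattern is simply "lock every tree, walk every tree, unlock every tree", along these lines (sketch):

    static void tcg_region_tree_lock_all(void)
    {
        size_t i;

        for (i = 0; i < region.n; i++) {
            struct tcg_region_tree *rt = region_trees + i * tree_size;

            qemu_mutex_lock(&rt->lock);
        }
    }

    static void tcg_region_tree_unlock_all(void)
    {
        size_t i;

        for (i = 0; i < region.n; i++) {
            struct tcg_region_tree *rt = region_trees + i * tree_size;

            qemu_mutex_unlock(&rt->lock);
        }
    }

    void tcg_tb_foreach(GTraverseFunc func, gpointer user_data)
    {
        size_t i;

        tcg_region_tree_lock_all();
        for (i = 0; i < region.n; i++) {
            struct tcg_region_tree *rt = region_trees + i * tree_size;

            g_tree_foreach(rt->tree, func, user_data);
        }
        tcg_region_tree_unlock_all();
    }
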
As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Diffstat (limited to 'include')
-rw-r--r--   include/exec/exec-all.h    | 1
-rw-r--r--   include/exec/tb-context.h  | 1
2 files changed, 0 insertions, 2 deletions
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8bbea78..8d4306a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -405,7 +405,6 @@ static inline uint32_t curr_cflags(void)
          | (use_icount ? CF_USE_ICOUNT : 0);
 }
 
-void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1d41202..d8472c8 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,7 +31,6 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-    GTree *tb_tree;
     struct qht htable;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;