author: Emilio G. Cota <cota@braap.org> 2017-07-26 16:58:05 -0400
committer: Richard Henderson <richard.henderson@linaro.org> 2018-06-15 07:42:55 -1000
commit: be2cdc5e352eb28b4ff631f053a261d91e6af78e
tree: 8c1a51b0f20bbff2ee2f26badf8e8fe4bec46196 /tcg/tcg.h
parent: 32359d529f30bea8124ed671b2e6a22f22540488
tcg: track TBs with per-region BST's
This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
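
For readers following the tc_ptr lookup described in the commit message, here is a minimal sketch of the idea. It is not the code from this patch: the struct region_layout type, its field names, and both helper functions are hypothetical, chosen only to illustrate special-casing the first and last regions, dividing by the stride otherwise, and padding each tree to a cache-line multiple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the region layout (not QEMU's struct). */
struct region_layout {
    uintptr_t start_aligned; /* address where stride-sized slots begin */
    size_t stride;           /* distance between region starts; need not be a power of two */
    size_t n;                /* number of regions, at least 1 */
};

/* Map a translated-code pointer to the index of the region that owns it. */
static size_t region_index(const struct region_layout *r, uintptr_t tc_ptr)
{
    assert(r->n > 0 && r->stride > 0);

    /* First region: it may begin before the aligned start address. */
    if (tc_ptr < r->start_aligned) {
        return 0;
    }

    size_t offset = tc_ptr - r->start_aligned;

    /* Last region: it may extend beyond one full stride. */
    if (offset >= r->stride * (r->n - 1)) {
        return r->n - 1;
    }

    /* Common case: a plain division, since the stride is arbitrary. */
    return offset / r->stride;
}

/*
 * Round a per-tree size up to a multiple of the data cache line size,
 * so that adjacent array elements never share a cache line.
 */
static size_t padded_tree_size(size_t tree_size, size_t dcache_linesize)
{
    return (tree_size + dcache_linesize - 1) / dcache_linesize * dcache_linesize;
}
```

Under these assumptions, the tree for a given tc_ptr would sit at the base of the array plus region_index() times padded_tree_size(), and the caller would take that tree's lock before touching its GTree.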
Diffstat (limited to 'tcg/tcg.h')
-rw-r--r-- tcg/tcg.h | 6
1 file changed, 6 insertions, 0 deletions
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 509f4d6..1e6df19 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -866,6 +866,12 @@ void tcg_region_reset_all(void);
size_t tcg_code_size(void);
size_t tcg_code_capacity(void);
+void tcg_tb_insert(TranslationBlock *tb);
+void tcg_tb_remove(TranslationBlock *tb);
+TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
+void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
+size_t tcg_nb_tbs(void);
+
/* user-mode: Called with tb_lock held. */
static inline void *tcg_malloc(int size)
{