From 49df298f1fa4f80404f5dc1b496335abb924953e Mon Sep 17 00:00:00 2001 From: Andrew Cagney Date: Fri, 7 Nov 2003 22:04:39 +0000 Subject: 2003-11-07 Andrew Cagney * bcache.h: Update copyright. Add comments on bcache VS hashtab. * bcache.c (struct bstring): Make "length" an unsigned short, add "half_hash". (struct bcache): Add "half_hash_error_count". (bcache): Compute and save the "half_hash". Compare the "half_hash" before comparing the length. Update half_hash_error_count. --- gdb/bcache.h | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 88 insertions(+), 13 deletions(-) (limited to 'gdb/bcache.h') diff --git a/gdb/bcache.h b/gdb/bcache.h index 61fbbe6..6c3a63d 100644 --- a/gdb/bcache.h +++ b/gdb/bcache.h @@ -2,7 +2,7 @@ Written by Fred Fish Rewritten by Jim Blandy - Copyright 1999, 2000, 2002 Free Software Foundation, Inc. + Copyright 1999, 2000, 2002, 2003 Free Software Foundation, Inc. This file is part of GDB. @@ -48,20 +48,95 @@ You shouldn't modify the strings you get from a bcache, because: - You don't necessarily know who you're sharing space with. If I - stick eight bytes of text in a bcache, and then stick an - eight-byte structure in the same bcache, there's no guarantee - those two objects don't actually comprise the same sequence of - bytes. If they happen to, the bcache will use a single byte - string for both of them. Then, modifying the structure will - change the string. In bizarre ways. + stick eight bytes of text in a bcache, and then stick an eight-byte + structure in the same bcache, there's no guarantee those two + objects don't actually comprise the same sequence of bytes. If + they happen to, the bcache will use a single byte string for both + of them. Then, modifying the structure will change the string. In + bizarre ways. - Even if you know for some other reason that all that's okay, - there's another problem. A bcache stores all its strings in a - hash table. If you modify a string's contents, you will probably - change its hash value. This means that the modified string is - now in the wrong place in the hash table, and future bcache - probes will never find it. So by mutating a string, you give up - any chance of sharing its space with future duplicates. */ + there's another problem. A bcache stores all its strings in a hash + table. If you modify a string's contents, you will probably change + its hash value. This means that the modified string is now in the + wrong place in the hash table, and future bcache probes will never + find it. So by mutating a string, you give up any chance of + sharing its space with future duplicates. + + + Size of bcache VS hashtab: + + For bcache, the most critical cost is size (or more exactly the + overhead added by the bcache). It turns out that the bcache is + remarkably efficient. + + Assuming a 32-bit system (the hash table slots are 4 bytes), + ignoring alignment, and limit strings to 255 bytes (1 byte length) + we get ... + + bcache: This uses a separate linked list to track the hash chain. + The numbers show roughly 100% occupancy of the hash table and an + average chain length of 4. Spreading the slot cost over the 4 + chain elements: + + 4 (slot) / 4 (chain length) + 1 (length) + 4 (chain) = 6 bytes + + hashtab: This uses a more traditional re-hash algorithm where the + chain is maintained within the hash table. The table occupancy is + kept below 75% but we'll assume its perfect: + + 4 (slot) x 4/3 (occupancy) + 1 (length) = 6 1/3 bytes + + So a perfect hashtab has just slightly larger than an average + bcache. + + It turns out that an average hashtab is far worse. Two things + hurt: + + - Hashtab's occupancy is more like 50% (it ranges between 38% and + 75%) giving a per slot cost of 4x2 vs 4x4/3. + + - the string structure needs to be aligned to 8 bytes which for + hashtab wastes 7 bytes, while for bcache wastes only 3. + + This gives: + + hashtab: 4 x 2 + 1 + 7 = 16 bytes + + bcache 4 / 4 + 1 + 4 + 3 = 9 bytes + + The numbers of GDB debugging GDB support this. ~40% vs ~70% overhead. + + + Speed of bcache VS hashtab (the half hash hack): + + While hashtab has a typical chain length of 1, bcache has a chain + length of round 4. This means that the bcache will require + something like double the number of compares after that initial + hash. In both cases the comparison takes the form: + + a.length == b.length && memcmp (a.data, b.data, a.length) == 0 + + That is lengths are checked before doing the memcmp. + + For GDB debugging GDB, it turned out that all lengths were 24 bytes + (no C++ so only psymbols were cached) and hence, all compares + required a call to memcmp. As a hack, two bytes of padding + (mentioned above) are used to store the upper 16 bits of the + string's hash value and then that is used in the comparison vis: + + a.half_hash == b.half_hash && a.length == b.length && memcmp + (a.data, b.data, a.length) + + The numbers from GDB debugging GDB show this to be a remarkable + 100% effective (only necessary length and memcmp tests being + performed). + + Mind you, looking at the wall clock, the same GDB debugging GDB + showed only marginal speed up (0.780 vs 0.773s). Seems GDB is too + busy doing something else :-( + +*/ struct bcache; -- cgit v1.1