diff options
author | Dhruv Matani <dhruvbird@HotPOP.com> | 2004-03-12 03:28:12 +0000 |
---|---|---|
committer | Loren J. Rittle <ljrittle@gcc.gnu.org> | 2004-03-12 03:28:12 +0000 |
commit | 009368dba64b7288dc9d9a92618de62d2e36dc3a (patch) | |
tree | ca58c4306f28de2d044b770e99596d8b8e76cb18 /libstdc++-v3/docs | |
parent | a8dad789a5d98744bdff783b7cc66085eb51ca87 (diff) | |
download | gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.zip gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.gz gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.bz2 |
ballocator_doc.txt: New file.
2004-03-11 Dhruv Matani <dhruvbird@HotPOP.com>
* docs/html/ext/ballocator_doc.txt: New file.
* include/Makefile.am (ext_headers): Add
${ext_srcdir}/bitmap_allocator.h .
* include/Makefile.in: Regenerate (by hand, since I didn't have
automake de jure on hand).
* include/ext/bitmap_allocator.h: New file.
* testsuite/performance/20_util/allocator/list_sort_search.cc: New test.
* testsuite/performance/20_util/allocator/map_mt_find.cc: Likewise.
* testsuite/performance/20_util/allocator/producer_consumer.cc: Add
test for the bitmap_allocator<>.
* testsuite/performance/20_util/allocator/insert.cc: Likewise.
* testsuite/performance/20_util/allocator/insert_insert.cc: Likewise.
* testsuite/performance/20_util/allocator/map_thread.cc: Likewise.
From-SVN: r79366
Diffstat (limited to 'libstdc++-v3/docs')
-rw-r--r-- | libstdc++-v3/docs/html/ext/ballocator_doc.txt | 374 |
1 files changed, 374 insertions, 0 deletions
diff --git a/libstdc++-v3/docs/html/ext/ballocator_doc.txt b/libstdc++-v3/docs/html/ext/ballocator_doc.txt new file mode 100644 index 0000000..2173b61 --- /dev/null +++ b/libstdc++-v3/docs/html/ext/ballocator_doc.txt @@ -0,0 +1,374 @@ + BITMAPPED ALLOCATOR + =================== + +2004-03-11 Dhruv Matani <dhruvbird@HotPOP.com> + +--------------------------------------------------------------------- + +As this name suggests, this allocator uses a bit-map to keep track of +the used and unused memory locations for it's book-keeping purposes. + +This allocator will make use of 1 single bit to keep track of whether +it has been allocated or not. A bit 1 indicates free, while 0 +indicates allocated. This has been done so that you can easily check a +collection of bits for a free block. This kind of Bitmapped strategy +works best for single object allocations, and with the STL type +parameterized allocators, we do not need to choose any size for the +block which will be represented by a single bit. This will be the size +of the parameter around which the allocator has been +parameterized. Thus, close to optimal performance will result. Hence, +this should be used for node based containers which call the allocate +function with an argument of 1. + +The bitmapped allocator's internal pool is exponentially +growing. Meaning that internally, the blocks acquired from the Free +List Store will double every time the bitmapped allocator runs out of +memory. + +-------------------------------------------------------------------- + +The macro __GTHREADS decides whether to use Mutex Protection around +every allocation/deallocation. The state of the macro is picked up +automatically from the gthr abstration layer. + +---------------------------------------------------------------------- + +What is the Free List Store? +---------------------------- + +The Free List Store (referred to as FLS for the remaining part of this +document) is the Global memory pool that is shared by all instances of +the bitmapped allocator instantiated for any type. This maintains a +sorted order of all free memory blocks given back to it by the +bitmapped allocator, and is also responsible for giving memory to the +bitmapped allocator when it asks for more. + +Internally, there is a Free List threshold which indicates the Maximum +number of free lists that the FLS can hold internally +(cache). Currently, this value is set at 64. So, if there are more +than 64 free lists coming in, then some of them will be given back to +the OS using operator delete so that at any given time the Free List's +size does not exceed 64 entries. This is done because a Binary Search +is used to locate an entry in a free list when a request for memory +comes along. Thus, the run-time complexity of the search would go up +given an increasing size, for 64 entries however, lg(64) == 6 +comparisons are enough to locate the correct free list if it exists. + +Suppose the free list size has reached it's threshold, then the +largest block from among those in the list and the new block will be +selected and given back to the OS. This is done because it reduces +external fragmentation, and allows the OS to use the larger blocks +later in an orderly fashion, possibly merging them later. Also, on +some systems, large blocks are obtained via calls to mmap, so giving +them back to free system resources becomes most important. + +The function _S_should_i_give decides the policy that determines +whether the current block of memory should be given to the allocator +for the request that it has made. That's because we may not always +have exact fits for the memory size that the allocator requests. We do +this mainly to prevent external fragmentation at the cost of a little +internal fragmentation. Now, the value of this internal fragmentation +has to be decided by this function. I can see 3 possibilities right +now. Please add more as and when you find better strategies. + +1. Equal size check. Return true only when the 2 blocks are of equal + size. + +2. Difference Threshold: Return true only when the _block_size is + greater than or equal to the _required_size, and if the _BS is > + _RS by a difference of less than some THRESHOLD value, then return + true, else return false. + +3. Percentage Threshold. Return true only when the _block_size is + greater than or equal to the _required_size, and if the _BS is > + _RS by a percentage of less than some THRESHOLD value, then return + true, else return false. + +Currently, (3) is being used with a value of 36% Maximum wastage per +Super Block. + +-------------------------------------------------------------------- + +1) What is a super block? Why is it needed? + + A super block is the block of memory acquired from the FLS from + which the bitmap allocator carves out memory for single objects and + satisfies the user's requests. These super blocks come in sizes that + are powers of 2 and multiples of 32 (_Bits_Per_Block). Yes both at + the same time! That's because the next super block acquired will be + 2 times the previous one, and also all super blocks have to be + multiples of the _Bits_Per_Block value. + +2) How does it interact with the free list store? + + The super block is contained in the FLS, and the FLS is responsible + for getting / returning Super Bocks to and from the OS using + operator new as defined by the C++ standard. + +--------------------------------------------------------------------- + +How does the allocate function Work? +------------------------------------ + +The allocate function is specialized for single object allocation +ONLY. Thus, ONLY if n == 1, will the bitmap_allocator's specialized +algorithm be used. Otherwise, the request is satisfied directly by +calling operator new. + +Suppose n == 1, then the allocator does the following: + +1. Checks to see whether the a free block exists somewhere in a region + of memory close to the last satisfied request. If so, then that + block is marked as allocated in the bit map and given to the + user. If not, then (2) is executed. + +2. Is there a free block anywhere after the current block right upto + the end of the memory that we have? If so, that block is found, and + the same procedure is applied as above, and returned to the + user. If not, then (3) is executed. + +3. Is there any block in whatever region of memory that we own free? + This is done by checking (a) The use count for each super block, + and if that fails then (b) The individual bit-maps for each super + block. Note: Here we are never touching any of the memory that the + user will be given, and we are confining all memory accesses to a + small region of memory! This helps reduce cache misses. If this + succeeds then we apply the same procedure on that bit-map as (1), + and return that block of memory to the user. However, if this + process fails, then we resort to (4). + +4. This process involves Refilling the internal exponentially growing + memory pool. The said effect is achieved by calling _S_refill_pool + which does the following: + (a). Gets more memory from the Global Free List of the + Required size. + (b). Adjusts the size for the next call to itself. + (c). Writes the appropriate headers in the bit-maps. + (d). Sets the use count for that super-block just allocated + to 0 (zero). + (e). All of the above accounts to maintaining the basic + invariant for the allocator. If the invariant is + maintained, we are sure that all is well. + Now, the same process is applied on the newly acquired free blocks, + which are dispatched accordingly. + +Thus, you can clearly see that the allocate function is nothing but a +combination of the next-fit and first-fit algorithm optimized ONLY for +single object allocations. + + +------------------------------------------------------------------------- + +How does the deallocate function work? +-------------------------------------- + +The deallocate function again is specialized for single objects ONLY. +For all n belonging to > 1, the operator delete is called without +further ado, and the deallocate function returns. + +However for n == 1, a series of steps are performed: + +1. We first need to locate that super-block which holds the memory + location given to us by the user. For that purpose, we maintain a + static variable _S_last_dealloc_index, which holds the index into + the vector of block pairs which indicates the index of the last + super-block from which memory was freed. We use this strategy in + the hope that the user will deallocate memory in a region close to + what he/she deallocated the last time around. If the check for + belongs_to succeeds, then we determine the bit-map for the given + pointer, and locate the index into that bit-map, and mark that bit + as free by setting it. + +2. If the _S_last_dealloc_index does not point to the memory block + that we're looking for, then we do a linear search on the block + stored in the vector of Block Pairs. This vector in code is called + _S_mem_blocks. When the corresponding super-block is found, we + apply the same procedure as we did for (1) to mark the block as + free in the bit-map. + +Now, whenever a block is freed, the use count of that particular super +block goes down by 1. When this use count hits 0, we remove that super +block from the list of all valid super blocks stored in the +vector. While doing this, we also make sure that the basic invariant +is maintained by making sure that _S_last_request and +_S_last_dealloc_index point to valid locations within the vector. + +-------------------------------------------------------------------- + + +Data Layout for a Super Block: +============================== + +Each Super Block will be of some size that is a multiple of the number +of Bits Per Block. Typically, this value is chosen as Bits_Per_Byte X +sizeof(unsigned int). On an X86 system, this gives the figure +8 X 4 = 32. Thus, each Super Block will be of size 32 X Some_Value. +This Some_Value is sizeof(value_type). For now, let it be called 'K'. +Thus, finally, Super Block size is 32 X K bytes. + +This value of 32 has been chosen because each unsigned int has 32-bits +and Maximum use of these can be made with such a figure. + +Consider a block of size 32 ints. +In memory, it would look like this: + +--------------------------------------------------------------------- +| 136 | 0 | 4294967295 | Data-> Space for 32-ints | +--------------------------------------------------------------------- + +The first Columns represents the size of the Block in bytes as seen by +the Bitmap Allocator. Internally, a global free list is used to keep +track of the free blocks used and given back by the bitmap +allocator. It is this Free List Store that is responsible for writing +and managing this information. Actually the number of bytes allocated +in this case would be: 4 + 4 + 4 + 32*4 = 140 bytes, but the first 4 +bytes are an addition by the Free List Store, so the Bitmap Allocator +sees only 136 bytes. These first 4 bytes about which the bitmapped +allocator is not aware hold the value 136. + +What do the remaining values represent? +--------------------------------------- + +The 2nd 4 in the expression is the sizeof(unsigned int) because the +Bitmapped Allocator maintains a used count for each Super Block, which +is initially set to 0 (as indicated in the diagram). This is +incremented every time a block is removed from this super block +(allocated), and decremented whenever it is given back. So, when the +used count falls to 0, the whole super block will be given back to the +Free List Store. + +The value 4294967295 represents the integer corresponding to the +bit representation of all bits set: 11111111111111111111111111111111. + +The 3rd 4 is size of the bitmap itself, which is the size of 32-bits, +which is 4-bytes, or 1 X sizeof(unsigned int). + + +-------------------------------------------------------------------- + +Another issue would be whether to keep the all bitmaps in a separate +area in memory, or to keep them near the actual blocks that will be +given out or allocated for the client. After some testing, I've +decided to keep these bitmaps close to the actual blocks. this will +help in 2 ways. + +1. Constant time access for the bitmap themselves, since no kind of + look up will be needed to find the correct bitmap list or it's + equivalent. + +2. And also this would preserve the cache as far as possible. + +So in effect, this kind of an allocator might prove beneficial from a +purely cache point of view. But this allocator has been made to try +and roll out the defects of the node_allocator, wherein the nodes get +skewed about in memory, if they are not returned in the exact reverse +order or in the same order in which they were allocated. Also, the +new_allocator's book keeping overhead is too much for small objects +and single object allocations, though it preserves the locality of +blocks very well when they are returned back to the allocator. + +------------------------------------------------------------------- + +Expected overhead per block would be 1 bit in memory. Also, once +the address of the free list has been found, the cost for +allocation/deallocation would be negligible, and is supposed to be +constant time. For these very reasons, it is very important to +minimize the linear time costs, which include finding a free list +with a free block while allocating, and finding the corresponding +free list for a block while deallocating. Therefore, I have decided +that the growth of the internal pool for this allocator will be +exponential as compared to linear for node_allocator. There, linear +time works well, because we are mainly concerned with speed of +allocation/deallocation and memory consumption, whereas here, the +allocation/deallocation part does have some linear/logarithmic +complexity components in it. Thus, to try and minimize them would +be a good thing to do at the cost of a little bit of memory. + +Another thing to be noted is the the pool size will double every time +the internal pool gets exhausted, and all the free blocks have been +given away. The initial size of the pool would be sizeof(unsigned +int)*8 which is the number of bits in an integer, which can fit +exactly in a CPU register. Hence, the term given is exponential growth +of the internal pool. + +--------------------------------------------------------------------- + +After reading all this, you may still have a few questions about the +internal working of this allocator, like my friend had! + +Well here are the exact questions that he posed: + +1) The "Data Layout" section is cryptic. I have no idea of what you + are trying to say. Layout of what? The free-list? Each bitmap? The + Super Block? + + The layout of a Super Block of a given size. In the example, a super + block of size 32 X 1 is taken. The general formula for calculating + the size of a super block is 32*sizeof(value_type)*2^n, where n + ranges from 0 to 32 for 32-bit systems. + +2) And since I just mentioned the term `each bitmap', what in the + world is meant by it? What does each bitmap manage? How does it + relate to the super block? Is the Super Block a bitmap as well? + + Good question! Each bitmap is part of a Super Block which is made up + of 3 parts as I have mentioned earlier. Re-iterating, 1. The use + count, 2. The bit-map for that Super Block. 3. The actual memory + that will be eventually given to the user. Each bitmap is a multiple + of 32 in size. If there are 32*(2^3) blocks of single objects to be + given, there will be '32*(2^3)' bits present. Each 32 bits managing + the allocated / free status for 32 blocks. Since each unsigned int + contains 32-bits, one unsigned int can manage upto 32 blocks' + status. Each bit-map is made up of a number of unsigned ints, whose + exact number for a super-block of a given size I have just + mentioned. + +3) How do the allocate and deallocate functions work in regard to + bitmaps? + + The allocate and deallocate functions manipulate the bitmaps and have + nothing to do with the memory that is given to the user. As I have + earlier mentioned, a 1 in the bitmap's bit field indicates free, + while a 0 indicates allocated. This lets us check 32 bits at a time + to check whether there is at lease one free block in those 32 blocks + by testing for equality with (0). Now, the allocate function will + given a memory block find the corresponding bit in the bitmap, and + will reset it (ie. make it re-set (0)). And when the deallocate + function is called, it will again set that bit after locating it to + indicate that that particular block corresponding to this bit in the + bit-map is not being used by anyone, and may be used to satisfy + future requests. + +---------------------------------------------------------------------- + +(Tech-Stuff, Please stay out if you are not interested in the +selection of certain constants. This has nothing to do with the +algorithm per-se, only with some vales that must be chosen correctly +to ensure that the allocator performs well in a real word scenario, +and maintains a good balance between the memory consumption and the +allocation/deallocation speed). + +The formula for calculating the maximum wastage as a percentage: + +(32 X k + 1) / (2 X (32 X k + 1 + 32 X c)) X 100. + +Where, + k => The constant overhead per node. eg. for list, it is 8 + bytes, and for map it is 12 bytes. + c => The size of the base type on which the map/list is + instantiated. Thus, suppose the the type1 is int and type2 is + double, they are related by the relation sizeof(double) == + 2*sizeof(int). Thus, all types must have this double size + relation for this formula to work properly. + +Plugging-in: For List: k = 8 and c = 4 (int and double), we get: +33.376% + +For map/multimap: k = 12, and c = 4 (int and double), we get: +37.524% + +Thus, knowing these values, and based on the sizeof(value_type), we +may create a function that returns the Max_Wastage_Percentage for us +to use. + + |