ballocator_doc.txt: New file.

2004-03-11 Dhruv Matani <dhruvbird@HotPOP.com> * docs/html/ext/ballocator_doc.txt: New file. * include/Makefile.am (ext_headers): Add ${ext_srcdir}/bitmap_allocator.h . * include/Makefile.in: Regenerate (by hand, since I didn't have automake de jure on hand). * include/ext/bitmap_allocator.h: New file. * testsuite/performance/20_util/allocator/list_sort_search.cc: New test. * testsuite/performance/20_util/allocator/map_mt_find.cc: Likewise. * testsuite/performance/20_util/allocator/producer_consumer.cc: Add test for the bitmap_allocator<>. * testsuite/performance/20_util/allocator/insert.cc: Likewise. * testsuite/performance/20_util/allocator/insert_insert.cc: Likewise. * testsuite/performance/20_util/allocator/map_thread.cc: Likewise. From-SVN: r79366
author: Dhruv Matani <dhruvbird@HotPOP.com> 2004-03-12 03:28:12 +0000
committer: Loren J. Rittle <ljrittle@gcc.gnu.org> 2004-03-12 03:28:12 +0000
commit: 009368dba64b7288dc9d9a92618de62d2e36dc3a (patch)
tree: ca58c4306f28de2d044b770e99596d8b8e76cb18 /libstdc++-v3/docs
parent: a8dad789a5d98744bdff783b7cc66085eb51ca87 (diff)
download: gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.zip
gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.gz
gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.bz2
1 files changed, 374 insertions, 0 deletions
diff --git a/libstdc++-v3/docs/html/ext/ballocator_doc.txt b/libstdc++-v3/docs/html/ext/ballocator_doc.txt
new file mode 100644
index 0000000..2173b61
--- /dev/null
+++ b/libstdc++-v3/docs/html/ext/ballocator_doc.txt
@@ -0,0 +1,374 @@
+			BITMAPPED ALLOCATOR
+			===================
+
+2004-03-11  Dhruv Matani  <dhruvbird@HotPOP.com>
+
+---------------------------------------------------------------------
+
+As this name suggests, this allocator uses a bit-map to keep track of
+the used and unused memory locations for it's book-keeping purposes.
+
+This allocator will make use of 1 single bit to keep track of whether
+it has been allocated or not. A bit 1 indicates free, while 0
+indicates allocated. This has been done so that you can easily check a
+collection of bits for a free block. This kind of Bitmapped strategy
+works best for single object allocations, and with the STL type
+parameterized allocators, we do not need to choose any size for the
+block which will be represented by a single bit. This will be the size
+of the parameter around which the allocator has been
+parameterized. Thus, close to optimal performance will result. Hence,
+this should be used for node based containers which call the allocate
+function with an argument of 1.
+
+The bitmapped allocator's internal pool is exponentially
+growing. Meaning that internally, the blocks acquired from the Free
+List Store will double every time the bitmapped allocator runs out of
+memory.
+
+--------------------------------------------------------------------
+
+The macro __GTHREADS decides whether to use Mutex Protection around
+every allocation/deallocation.  The state of the macro is picked up
+automatically from the gthr abstration layer.
+
+----------------------------------------------------------------------
+
+What is the Free List Store?
+----------------------------
+
+The Free List Store (referred to as FLS for the remaining part of this
+document) is the Global memory pool that is shared by all instances of
+the bitmapped allocator instantiated for any type. This maintains a
+sorted order of all free memory blocks given back to it by the
+bitmapped allocator, and is also responsible for giving memory to the
+bitmapped allocator when it asks for more.
+
+Internally, there is a Free List threshold which indicates the Maximum
+number of free lists that the FLS can hold internally
+(cache). Currently, this value is set at 64. So, if there are more
+than 64 free lists coming in, then some of them will be given back to
+the OS using operator delete so that at any given time the Free List's
+size does not exceed 64 entries. This is done because a Binary Search
+is used to locate an entry in a free list when a request for memory
+comes along. Thus, the run-time complexity of the search would go up
+given an increasing size, for 64 entries however, lg(64) == 6
+comparisons are enough to locate the correct free list if it exists.
+
+Suppose the free list size has reached it's threshold, then the
+largest block from among those in the list and the new block will be
+selected and given back to the OS. This is done because it reduces
+external fragmentation, and allows the OS to use the larger blocks
+later in an orderly fashion, possibly merging them later. Also, on
+some systems, large blocks are obtained via calls to mmap, so giving
+them back to free system resources becomes most important.
+
+The function _S_should_i_give decides the policy that determines
+whether the current block of memory should be given to the allocator
+for the request that it has made. That's because we may not always
+have exact fits for the memory size that the allocator requests. We do
+this mainly to prevent external fragmentation at the cost of a little
+internal fragmentation. Now, the value of this internal fragmentation
+has to be decided by this function. I can see 3 possibilities right
+now. Please add more as and when you find better strategies.
+
+1. Equal size check. Return true only when the 2 blocks are of equal
+   size.
+
+2. Difference Threshold: Return true only when the _block_size is
+   greater than or equal to the _required_size, and if the _BS is >
+   _RS by a difference of less than some THRESHOLD value, then return
+   true, else return false.  
+
+3. Percentage Threshold. Return true only when the _block_size is
+   greater than or equal to the _required_size, and if the _BS is >
+   _RS by a percentage of less than some THRESHOLD value, then return
+   true, else return false.
+
+Currently, (3) is being used with a value of 36% Maximum wastage per
+Super Block.
+
+--------------------------------------------------------------------
+
+1) What is a super block? Why is it needed?
+
+   A super block is the block of memory acquired from the FLS from
+   which the bitmap allocator carves out memory for single objects and
+   satisfies the user's requests. These super blocks come in sizes that
+   are powers of 2 and multiples of 32 (_Bits_Per_Block). Yes both at
+   the same time! That's because the next super block acquired will be
+   2 times the previous one, and also all super blocks have to be
+   multiples of the _Bits_Per_Block value.
+
+2) How does it interact with the free list store?
+
+   The super block is contained in the FLS, and the FLS is responsible
+   for getting / returning Super Bocks to and from the OS using
+   operator new as defined by the C++ standard.
+
+---------------------------------------------------------------------
+
+How does the allocate function Work?
+------------------------------------
+
+The allocate function is specialized for single object allocation
+ONLY. Thus, ONLY if n == 1, will the bitmap_allocator's specialized
+algorithm be used. Otherwise, the request is satisfied directly by
+calling operator new.
+
+Suppose n == 1, then the allocator does the following:
+
+1. Checks to see whether the a free block exists somewhere in a region
+   of memory close to the last satisfied request. If so, then that
+   block is marked as allocated in the bit map and given to the
+   user. If not, then (2) is executed.
+
+2. Is there a free block anywhere after the current block right upto
+   the end of the memory that we have? If so, that block is found, and
+   the same procedure is applied as above, and returned to the
+   user. If not, then (3) is executed.
+
+3. Is there any block in whatever region of memory that we own free?
+   This is done by checking (a) The use count for each super block,
+   and if that fails then (b) The individual bit-maps for each super
+   block. Note: Here we are never touching any of the memory that the
+   user will be given, and we are confining all memory accesses to a
+   small region of memory! This helps reduce cache misses. If this
+   succeeds then we apply the same procedure on that bit-map as (1),
+   and return that block of memory to the user. However, if this
+   process fails, then we resort to (4).
+
+4. This process involves Refilling the internal exponentially growing
+   memory pool. The said effect is achieved by calling _S_refill_pool
+   which does the following:
+	 (a). Gets more memory from the Global Free List of the
+	      Required size.
+	 (b). Adjusts the size for the next call to itself.
+	 (c). Writes the appropriate headers in the bit-maps.
+	 (d). Sets the use count for that super-block just allocated
+	      to 0 (zero).
+	 (e). All of the above accounts to maintaining the basic
+	      invariant for the allocator. If the invariant is
+	      maintained, we are sure that all is well.
+   Now, the same process is applied on the newly acquired free blocks,
+   which are dispatched accordingly.
+
+Thus, you can clearly see that the allocate function is nothing but a
+combination of the next-fit and first-fit algorithm optimized ONLY for
+single object allocations.
+
+
+-------------------------------------------------------------------------
+
+How does the deallocate function work?
+--------------------------------------
+
+The deallocate function again is specialized for single objects ONLY.
+For all n belonging to > 1, the operator delete is called without
+further ado, and the deallocate function returns.
+
+However for n == 1, a series of steps are performed:
+
+1. We first need to locate that super-block which holds the memory
+   location given to us by the user. For that purpose, we maintain a
+   static variable _S_last_dealloc_index, which holds the index into
+   the vector of block pairs which indicates the index of the last
+   super-block from which memory was freed. We use this strategy in
+   the hope that the user will deallocate memory in a region close to
+   what he/she deallocated the last time around. If the check for
+   belongs_to succeeds, then we determine the bit-map for the given
+   pointer, and locate the index into that bit-map, and mark that bit
+   as free by setting it.
+
+2. If the _S_last_dealloc_index does not point to the memory block
+   that we're looking for, then we do a linear search on the block
+   stored in the vector of Block Pairs. This vector in code is called
+   _S_mem_blocks. When the corresponding super-block is found, we
+   apply the same procedure as we did for (1) to mark the block as
+   free in the bit-map.
+
+Now, whenever a block is freed, the use count of that particular super
+block goes down by 1. When this use count hits 0, we remove that super
+block from the list of all valid super blocks stored in the
+vector. While doing this, we also make sure that the basic invariant
+is maintained by making sure that _S_last_request and
+_S_last_dealloc_index point to valid locations within the vector.
+
+--------------------------------------------------------------------
+
+
+Data Layout for a Super Block:
+==============================
+
+Each Super Block will be of some size that is a multiple of the number
+of Bits Per Block. Typically, this value is chosen as Bits_Per_Byte X
+sizeof(unsigned int). On an X86 system, this gives the figure 
+8 X 4 = 32. Thus, each Super Block will be of size 32 X Some_Value.
+This Some_Value is sizeof(value_type). For now, let it be called 'K'.
+Thus, finally, Super Block size is 32 X K bytes.
+
+This value of 32 has been chosen because each unsigned int has 32-bits
+and Maximum use of these can be made with such a figure.
+
+Consider a block of size 32 ints.
+In memory, it would look like this:
+
+---------------------------------------------------------------------
+|   136   |    0    | 4294967295 |      Data-> Space for 32-ints    |
+---------------------------------------------------------------------
+
+The first Columns represents the size of the Block in bytes as seen by
+the Bitmap Allocator. Internally, a global free list is used to keep
+track of the free blocks used and given back by the bitmap
+allocator. It is this Free List Store that is responsible for writing
+and managing this information. Actually the number of bytes allocated
+in this case would be: 4 + 4 + 4 + 32*4 = 140 bytes, but the first 4
+bytes are an addition by the Free List Store, so the Bitmap Allocator
+sees only 136 bytes. These first 4 bytes about which the bitmapped
+allocator is not aware hold the value 136.
+
+What do the remaining values represent?
+---------------------------------------
+
+The 2nd 4 in the expression is the sizeof(unsigned int) because the
+Bitmapped Allocator maintains a used count for each Super Block, which
+is initially set to 0 (as indicated in the diagram). This is
+incremented every time a block is removed from this super block
+(allocated), and decremented whenever it is given back. So, when the
+used count falls to 0, the whole super block will be given back to the
+Free List Store.
+
+The value 4294967295 represents the integer corresponding to the
+bit representation of all bits set: 11111111111111111111111111111111.
+
+The 3rd 4 is size of the bitmap itself, which is the size of 32-bits,
+which is 4-bytes, or 1 X sizeof(unsigned int).
+
+
+--------------------------------------------------------------------
+
+Another issue would be whether to keep the all bitmaps in a separate
+area in memory, or to keep them near the actual blocks that will be
+given out or allocated for the client. After some testing, I've
+decided to keep these bitmaps close to the actual blocks. this will
+help in 2 ways. 
+
+1. Constant time access for the bitmap themselves, since no kind of
+   look up will be needed to find the correct bitmap list or it's
+   equivalent.
+
+2. And also this would preserve the cache as far as possible.
+
+So in effect, this kind of an allocator might prove beneficial from a
+purely cache point of view. But this allocator has been made to try
+and roll out the defects of the node_allocator, wherein the nodes get
+skewed about in memory, if they are not returned in the exact reverse
+order or in the same order in which they were allocated. Also, the
+new_allocator's book keeping overhead is too much for small objects
+and single object allocations, though it preserves the locality of
+blocks very well when they are returned back to the allocator.
+
+-------------------------------------------------------------------
+
+Expected overhead per block would be 1 bit in memory. Also, once
+the address of the free list has been found, the cost for
+allocation/deallocation would be negligible, and is supposed to be
+constant time. For these very reasons, it is very important to
+minimize the linear time costs, which include finding a free list
+with a free block while allocating, and finding the corresponding
+free list for a block while deallocating. Therefore, I have decided
+that the growth of the internal pool for this allocator will be
+exponential as compared to linear for node_allocator. There, linear
+time works well, because we are mainly concerned with speed of
+allocation/deallocation and memory consumption, whereas here, the
+allocation/deallocation part does have some linear/logarithmic
+complexity components in it. Thus, to try and minimize them would
+be a good thing to do at the cost of a little bit of memory.
+
+Another thing to be noted is the the pool size will double every time
+the internal pool gets exhausted, and all the free blocks have been
+given away. The initial size of the pool would be sizeof(unsigned
+int)*8 which is the number of bits in an integer, which can fit
+exactly in a CPU register. Hence, the term given is exponential growth
+of the internal pool.
+
+---------------------------------------------------------------------
+
+After reading all this, you may still have a few questions about the
+internal working of this allocator, like my friend had!
+
+Well here are the exact questions that he posed:
+
+1) The "Data Layout" section is cryptic. I have no idea of what you
+   are trying to say. Layout of what? The free-list? Each bitmap? The
+   Super Block?
+
+   The layout of a Super Block of a given size. In the example, a super
+   block of size 32 X 1 is taken. The general formula for calculating
+   the size of a super block is 32*sizeof(value_type)*2^n, where n
+   ranges from 0 to 32 for 32-bit systems.
+
+2) And since I just mentioned the term `each bitmap', what in the
+   world is meant by it? What does each bitmap manage? How does it
+   relate to the super block? Is the Super Block a bitmap as well?
+
+   Good question! Each bitmap is part of a Super Block which is made up
+   of 3 parts as I have mentioned earlier. Re-iterating, 1. The use
+   count, 2. The bit-map for that Super Block. 3. The actual memory
+   that will be eventually given to the user. Each bitmap is a multiple
+   of 32 in size. If there are 32*(2^3) blocks of single objects to be
+   given, there will be '32*(2^3)' bits present. Each 32 bits managing
+   the allocated / free status for 32 blocks. Since each unsigned int
+   contains 32-bits, one unsigned int can manage upto 32 blocks'
+   status. Each bit-map is made up of a number of unsigned ints, whose
+   exact number for a super-block of a given size I have just
+   mentioned.
+
+3) How do the allocate and deallocate functions work in regard to
+   bitmaps?
+
+   The allocate and deallocate functions manipulate the bitmaps and have
+   nothing to do with the memory that is given to the user. As I have
+   earlier mentioned, a 1 in the bitmap's bit field indicates free,
+   while a 0 indicates allocated. This lets us check 32 bits at a time
+   to check whether there is at lease one free block in those 32 blocks
+   by testing for equality with (0). Now, the allocate function will
+   given a memory block find the corresponding bit in the bitmap, and
+   will reset it (ie. make it re-set (0)). And when the deallocate
+   function is called, it will again set that bit after locating it to
+   indicate that that particular block corresponding to this bit in the
+   bit-map is not being used by anyone, and may be used to satisfy
+   future requests.
+
+----------------------------------------------------------------------
+
+(Tech-Stuff, Please stay out if you are not interested in the
+selection of certain constants. This has nothing to do with the
+algorithm per-se, only with some vales that must be chosen correctly
+to ensure that the allocator performs well in a real word scenario,
+and maintains a good balance between the memory consumption and the
+allocation/deallocation speed).
+
+The formula for calculating the maximum wastage as a percentage:
+
+(32 X k + 1) / (2 X (32 X k + 1 + 32 X c)) X 100.
+
+Where,
+	k => The constant overhead per node. eg. for list, it is 8
+	bytes, and for map it is 12 bytes.
+	c => The size of the base type on which the map/list is
+	instantiated. Thus, suppose the the type1 is int and type2 is
+	double, they are related by the relation sizeof(double) ==
+	2*sizeof(int). Thus, all types must have this double size
+	relation for this formula to work properly.
+
+Plugging-in: For List: k = 8 and c = 4 (int and double), we get:
+33.376%
+
+For map/multimap: k = 12, and c = 4 (int and double), we get:
+37.524%
+
+Thus, knowing these values, and based on the sizeof(value_type), we
+may create a function that returns the Max_Wastage_Percentage for us
+to use.
+
+
author	Dhruv Matani <dhruvbird@HotPOP.com>	2004-03-12 03:28:12 +0000
committer	Loren J. Rittle <ljrittle@gcc.gnu.org>	2004-03-12 03:28:12 +0000
commit	009368dba64b7288dc9d9a92618de62d2e36dc3a (patch)
tree	ca58c4306f28de2d044b770e99596d8b8e76cb18 /libstdc++-v3/docs
parent	a8dad789a5d98744bdff783b7cc66085eb51ca87 (diff)
download	gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.zip gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.gz gcc-009368dba64b7288dc9d9a92618de62d2e36dc3a.tar.bz2