Diffstat (limited to 'winsup/bz2lib/manual_3.html')
-rw-r--r-- | winsup/bz2lib/manual_3.html | 1773 |
1 files changed, 1773 insertions, 0 deletions
diff --git a/winsup/bz2lib/manual_3.html b/winsup/bz2lib/manual_3.html new file mode 100644 index 0000000..a8fa7e6 --- /dev/null +++ b/winsup/bz2lib/manual_3.html @@ -0,0 +1,1773 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.54 + from manual.texi on 23 March 2000 --> + +<TITLE>bzip2 and libbzip2 - Programming with libbzip2</TITLE> +<link href="manual_4.html" rel=Next> +<link href="manual_2.html" rel=Previous> +<link href="manual_toc.html" rel=ToC> + +</HEAD> +<BODY> +<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_2.html">previous</A>, <A HREF="manual_4.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC12" HREF="manual_toc.html#TOC12">Programming with <CODE>libbzip2</CODE></A></H1> + +<P> +This chapter describes the programming interface to <CODE>libbzip2</CODE>. + +</P> +<P> +For general background information, particularly about memory +use and performance aspects, you'd be well advised to read Chapter 2 +as well. + +</P> + + +<H2><A NAME="SEC13" HREF="manual_toc.html#TOC13">Top-level structure</A></H2> + +<P> +<CODE>libbzip2</CODE> is a flexible library for compressing and decompressing +data in the <CODE>bzip2</CODE> data format. Although packaged as a single +entity, it helps to regard the library as three separate parts: the low +level interface, and the high level interface, and some utility +functions. + +</P> +<P> +The structure of <CODE>libbzip2</CODE>'s interfaces is similar to +that of Jean-loup Gailly's and Mark Adler's excellent <CODE>zlib</CODE> +library. + +</P> +<P> +All externally visible symbols have names beginning <CODE>BZ2_</CODE>. +This is new in version 1.0. The intention is to minimise pollution +of the namespaces of library clients. + +</P> + + +<H3><A NAME="SEC14" HREF="manual_toc.html#TOC14">Low-level summary</A></H3> + +<P> +This interface provides services for compressing and decompressing +data in memory. 
There's no provision for dealing with files, streams +or any other I/O mechanisms, just straight memory-to-memory work. +In fact, this part of the library can be compiled without inclusion +of <CODE>stdio.h</CODE>, which may be helpful for embedded applications. + +</P> +<P> +The low-level part of the library has no global variables and +is therefore thread-safe. + +</P> +<P> +Six routines make up the low level interface: +<CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE>, and <BR> <CODE>BZ2_bzCompressEnd</CODE> +for compression, +and a corresponding trio <CODE>BZ2_bzDecompressInit</CODE>, <BR> <CODE>BZ2_bzDecompress</CODE> +and <CODE>BZ2_bzDecompressEnd</CODE> for decompression. +The <CODE>*Init</CODE> functions allocate +memory for compression/decompression and do other +initialisations, whilst the <CODE>*End</CODE> functions close down operations +and release memory. + +</P> +<P> +The real work is done by <CODE>BZ2_bzCompress</CODE> and <CODE>BZ2_bzDecompress</CODE>. +These compress and decompress data from a user-supplied input buffer +to a user-supplied output buffer. These buffers can be any size; +arbitrary quantities of data are handled by making repeated calls +to these functions. This is a flexible mechanism allowing a +consumer-pull style of activity, or producer-push, or a mixture of +both. + +</P> + + + +<H3><A NAME="SEC15" HREF="manual_toc.html#TOC15">High-level summary</A></H3> + +<P> +This interface provides some handy wrappers around the low-level +interface to facilitate reading and writing <CODE>bzip2</CODE> format +files (<CODE>.bz2</CODE> files). The routines provide hooks to facilitate +reading files in which the <CODE>bzip2</CODE> data stream is embedded +within some larger-scale file structure, or where there are +multiple <CODE>bzip2</CODE> data streams concatenated end-to-end. 
+ +</P> +<P> +For reading files, <CODE>BZ2_bzReadOpen</CODE>, <CODE>BZ2_bzRead</CODE>, +<CODE>BZ2_bzReadClose</CODE> and <BR> <CODE>BZ2_bzReadGetUnused</CODE> are supplied. For +writing files, <CODE>BZ2_bzWriteOpen</CODE>, <CODE>BZ2_bzWrite</CODE> and +<CODE>BZ2_bzWriteClose</CODE> are available. + +</P> +<P> +As with the low-level library, no global variables are used +so the library is per se thread-safe. However, if I/O errors +occur whilst reading or writing the underlying compressed files, +you may have to consult <CODE>errno</CODE> to determine the cause of +the error. In that case, you'd need a C library which correctly +supports <CODE>errno</CODE> in a multithreaded environment. + +</P> +<P> +To make the library a little simpler and more portable, +<CODE>BZ2_bzReadOpen</CODE> and <CODE>BZ2_bzWriteOpen</CODE> require you to pass them file +handles (<CODE>FILE*</CODE>s) which have previously been opened for reading or +writing respectively. That avoids portability problems associated with +file operations and file attributes, whilst not being much of an +imposition on the programmer. + +</P> + + + +<H3><A NAME="SEC16" HREF="manual_toc.html#TOC16">Utility functions summary</A></H3> +<P> +For very simple needs, <CODE>BZ2_bzBuffToBuffCompress</CODE> and +<CODE>BZ2_bzBuffToBuffDecompress</CODE> are provided. These compress +data in memory from one buffer to another buffer in a single +function call. You should assess whether these functions +fulfill your memory-to-memory compression/decompression +requirements before investing effort in understanding the more +general but more complex low-level interface. + +</P> +<P> +Yoshioka Tsuneo (<CODE>QWF00133@niftyserve.or.jp</CODE> / +<CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>) has contributed some functions to +give better <CODE>zlib</CODE> compatibility. 
These functions are +<CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>, +<CODE>BZ2_bzclose</CODE>, +<CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>. You may find these functions +more convenient for simple file reading and writing than those in the +high-level interface. These functions are not (yet) officially part of +the library, and are minimally documented here. If they break, you +get to keep all the pieces. I hope to document them properly when time +permits. + +</P> +<P> +Yoshioka also contributed modifications to allow the library to be +built as a Windows DLL. + +</P> + + + +<H2><A NAME="SEC17" HREF="manual_toc.html#TOC17">Error handling</A></H2> + +<P> +The library is designed to recover cleanly in all situations, including +the worst-case situation of decompressing random data. I'm not +100% sure that it can always do this, so you might want to add +a signal handler to catch segmentation violations during decompression +if you are feeling especially paranoid. I would be interested in +hearing more about the robustness of the library to corrupted +compressed data. + +</P> +<P> +Version 1.0 is much more robust in this respect than +0.9.0 or 0.9.5. Investigations with Checker (a tool for +detecting problems with memory management, similar to Purify) +indicate that, at least for the few files I tested, all single-bit +errors in the compressed data are caught properly, with no +segmentation faults, no reads of uninitialised data and no +out of range reads or writes. So it's certainly much improved, +although I wouldn't claim it to be totally bombproof. + +</P> +<P> +The file <CODE>bzlib.h</CODE> contains all definitions needed to use +the library. In particular, you should definitely not include +<CODE>bzlib_private.h</CODE>. + +</P> +<P> +In <CODE>bzlib.h</CODE>, the various return values are defined. 
The following +list is not intended as an exhaustive description of the circumstances +in which a given value may be returned -- those descriptions are given +later. Rather, it is intended to convey the rough meaning of each +return value. The first five actions are normal and not intended to +denote an error situation. +<DL COMPACT> + +<DT><CODE>BZ_OK</CODE> +<DD> +The requested action was completed successfully. +<DT><CODE>BZ_RUN_OK</CODE> +<DD> +<DT><CODE>BZ_FLUSH_OK</CODE> +<DD> +<DT><CODE>BZ_FINISH_OK</CODE> +<DD> +In <CODE>BZ2_bzCompress</CODE>, the requested flush/finish/nothing-special action +was completed successfully. +<DT><CODE>BZ_STREAM_END</CODE> +<DD> +Compression of data was completed, or the logical stream end was +detected during decompression. +</DL> + +<P> +The following return values indicate an error of some kind. +<DL COMPACT> + +<DT><CODE>BZ_CONFIG_ERROR</CODE> +<DD> +Indicates that the library has been improperly compiled on your +platform -- a major configuration error. Specifically, it means +that <CODE>sizeof(char)</CODE>, <CODE>sizeof(short)</CODE> and <CODE>sizeof(int)</CODE> +are not 1, 2 and 4 respectively, as they should be. Note that the +library should still work properly on 64-bit platforms which follow +the LP64 programming model -- that is, where <CODE>sizeof(long)</CODE> +and <CODE>sizeof(void*)</CODE> are 8. Under LP64, <CODE>sizeof(int)</CODE> is +still 4, so <CODE>libbzip2</CODE>, which doesn't use the <CODE>long</CODE> type, +is OK. +<DT><CODE>BZ_SEQUENCE_ERROR</CODE> +<DD> +When using the library, it is important to call the functions in the +correct sequence and with data structures (buffers etc) in the correct +states. <CODE>libbzip2</CODE> checks as much as it can to ensure this is +happening, and returns <CODE>BZ_SEQUENCE_ERROR</CODE> if not. Code which +complies precisely with the function semantics, as detailed below, +should never receive this value; such an event denotes buggy code +which you should investigate. 
+<DT><CODE>BZ_PARAM_ERROR</CODE> +<DD> +Returned when a parameter to a function call is out of range +or otherwise manifestly incorrect. As with <CODE>BZ_SEQUENCE_ERROR</CODE>, +this denotes a bug in the client code. The distinction between +<CODE>BZ_PARAM_ERROR</CODE> and <CODE>BZ_SEQUENCE_ERROR</CODE> is a bit hazy, but still worth +making. +<DT><CODE>BZ_MEM_ERROR</CODE> +<DD> +Returned when a request to allocate memory failed. Note that the +quantity of memory needed to decompress a stream cannot be determined +until the stream's header has been read. So <CODE>BZ2_bzDecompress</CODE> and +<CODE>BZ2_bzRead</CODE> may return <CODE>BZ_MEM_ERROR</CODE> even though some of +the compressed data has been read. The same is not true for +compression; once <CODE>BZ2_bzCompressInit</CODE> or <CODE>BZ2_bzWriteOpen</CODE> have +successfully completed, <CODE>BZ_MEM_ERROR</CODE> cannot occur. +<DT><CODE>BZ_DATA_ERROR</CODE> +<DD> +Returned when a data integrity error is detected during decompression. +Most importantly, this means when stored and computed CRCs for the +data do not match. This value is also returned upon detection of any +other anomaly in the compressed data. +<DT><CODE>BZ_DATA_ERROR_MAGIC</CODE> +<DD> +As a special case of <CODE>BZ_DATA_ERROR</CODE>, it is sometimes useful to +know when the compressed stream does not start with the correct +magic bytes (<CODE>'B' 'Z' 'h'</CODE>). +<DT><CODE>BZ_IO_ERROR</CODE> +<DD> +Returned by <CODE>BZ2_bzRead</CODE> and <CODE>BZ2_bzWrite</CODE> when there is an error +reading or writing in the compressed file, and by <CODE>BZ2_bzReadOpen</CODE> +and <CODE>BZ2_bzWriteOpen</CODE> for attempts to use a file for which the +error indicator (viz, <CODE>ferror(f)</CODE>) is set. +On receipt of <CODE>BZ_IO_ERROR</CODE>, the caller should consult +<CODE>errno</CODE> and/or <CODE>perror</CODE> to acquire operating-system +specific information about the problem. 
+<DT><CODE>BZ_UNEXPECTED_EOF</CODE> +<DD> +Returned by <CODE>BZ2_bzRead</CODE> when the compressed file finishes +before the logical end of stream is detected. +<DT><CODE>BZ_OUTBUFF_FULL</CODE> +<DD> +Returned by <CODE>BZ2_bzBuffToBuffCompress</CODE> and +<CODE>BZ2_bzBuffToBuffDecompress</CODE> to indicate that the output data +will not fit into the output buffer provided. +</DL> + + + +<H2><A NAME="SEC18" HREF="manual_toc.html#TOC18">Low-level interface</A></H2> + + + +<H3><A NAME="SEC19" HREF="manual_toc.html#TOC19"><CODE>BZ2_bzCompressInit</CODE></A></H3> + +<PRE> +typedef + struct { + char *next_in; + unsigned int avail_in; + unsigned int total_in_lo32; + unsigned int total_in_hi32; + + char *next_out; + unsigned int avail_out; + unsigned int total_out_lo32; + unsigned int total_out_hi32; + + void *state; + + void *(*bzalloc)(void *,int,int); + void (*bzfree)(void *,void *); + void *opaque; + } + bz_stream; + +int BZ2_bzCompressInit ( bz_stream *strm, + int blockSize100k, + int verbosity, + int workFactor ); + +</PRE> + +<P> +Prepares for compression. The <CODE>bz_stream</CODE> structure +holds all data pertaining to the compression activity. +A <CODE>bz_stream</CODE> structure should be allocated and initialised +prior to the call. +The fields of <CODE>bz_stream</CODE> +comprise the entirety of the user-visible data. <CODE>state</CODE> +is a pointer to the private data structures required for compression. + +</P> +<P> +Custom memory allocators are supported, via fields <CODE>bzalloc</CODE>, +<CODE>bzfree</CODE>, +and <CODE>opaque</CODE>. The value +<CODE>opaque</CODE> is passed as the first argument to +all calls to <CODE>bzalloc</CODE> and <CODE>bzfree</CODE>, but is +otherwise ignored by the library. +The call <CODE>bzalloc ( opaque, n, m )</CODE> is expected to return a +pointer <CODE>p</CODE> to +<CODE>n * m</CODE> bytes of memory, and <CODE>bzfree ( opaque, p )</CODE> +should free +that memory. 
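As a rough illustration of the allocator hooks, the sketch below wraps <CODE>malloc</CODE>/<CODE>free</CODE> and keeps simple statistics through <CODE>opaque</CODE>. The names <CODE>alloc_stats</CODE>, <CODE>counting_alloc</CODE> and <CODE>counting_free</CODE> are invented for this example and are not part of the library; only the two function signatures are taken from the <CODE>bz_stream</CODE> declaration above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
typedef struct {
    size_t live_blocks;   /* blocks allocated and not yet freed */
    size_t total_bytes;   /* bytes requested over the whole run */
} alloc_stats;

/* Same shape as the bzalloc field: return n * m bytes, or NULL on failure. */
void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *st = opaque;
    size_t bytes;
    void *p;

    if (n < 0 || m < 0)
        return NULL;
    bytes = (size_t)n * (size_t)m;
    p = malloc(bytes ? bytes : 1);   /* avoid the malloc(0) corner case */
    if (p != NULL && st != NULL) {
        st->live_blocks++;
        st->total_bytes += bytes;
    }
    return p;
}

/* Same shape as the bzfree field: release a block from counting_alloc. */
void counting_free(void *opaque, void *p)
{
    alloc_stats *st = opaque;

    if (p == NULL)
        return;
    if (st != NULL) {
        assert(st->live_blocks > 0);
        st->live_blocks--;
    }
    free(p);
}
```

To use it, one would point <CODE>bzalloc</CODE> and <CODE>bzfree</CODE> at these two functions and <CODE>opaque</CODE> at an <CODE>alloc_stats</CODE> record before calling <CODE>BZ2_bzCompressInit</CODE>.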
+ +</P> +<P> +If you don't want to use a custom memory allocator, set <CODE>bzalloc</CODE>, +<CODE>bzfree</CODE> and +<CODE>opaque</CODE> to <CODE>NULL</CODE>, +and the library will then use the standard <CODE>malloc</CODE>/<CODE>free</CODE> +routines. + +</P> +<P> +Before calling <CODE>BZ2_bzCompressInit</CODE>, fields <CODE>bzalloc</CODE>, +<CODE>bzfree</CODE> and <CODE>opaque</CODE> should +be filled appropriately, as just described. Upon return, the internal +state will have been allocated and initialised, and <CODE>total_in_lo32</CODE>, +<CODE>total_in_hi32</CODE>, <CODE>total_out_lo32</CODE> and +<CODE>total_out_hi32</CODE> will have been set to zero. +These four fields are used by the library +to inform the caller of the total amount of data passed into and out of +the library, respectively. You should not try to change them. +As of version 1.0, 64-bit counts are maintained, even on 32-bit +platforms, using the <CODE>_hi32</CODE> fields to store the upper 32 bits +of the count. So, for example, the total amount of data in +is <CODE>(total_in_hi32 << 32) + total_in_lo32</CODE>. + +</P> +<P> +Parameter <CODE>blockSize100k</CODE> specifies the block size to be used for +compression. It should be a value between 1 and 9 inclusive, and the +actual block size used is 100000 x this figure. 9 gives the best +compression but takes most memory. + +</P> +<P> +Parameter <CODE>verbosity</CODE> should be set to a number between 0 and 4 +inclusive. 0 is silent, and greater numbers give increasingly verbose +monitoring/debugging output. If the library has been compiled with +<CODE>-DBZ_NO_STDIO</CODE>, no such output will appear for any verbosity +setting. + +</P> +<P> +Parameter <CODE>workFactor</CODE> controls how the compression phase behaves +when presented with worst case, highly repetitive, input data. If +compression runs into difficulties caused by repetitive data, the +library switches from the standard sorting algorithm to a fallback +algorithm. 
The fallback is slower than the standard algorithm by +perhaps a factor of three, but always behaves reasonably, no matter how +bad the input. + +</P> +<P> +Lower values of <CODE>workFactor</CODE> reduce the amount of effort the +standard algorithm will expend before resorting to the fallback. You +should set this parameter carefully; too low, and many inputs will be +handled by the fallback algorithm and so compress rather slowly, too +high, and your average-to-worst case compression times can become very +large. The default value of 30 gives reasonable behaviour over a wide +range of circumstances. + +</P> +<P> +Allowable values range from 0 to 250 inclusive. 0 is a special case, +equivalent to using the default value of 30. + +</P> +<P> +Note that the compressed output generated is the same regardless of +whether or not the fallback algorithm is used. + +</P> +<P> +Be aware also that this parameter may disappear entirely in future +versions of the library. In principle it should be possible to devise a +good way to automatically choose which algorithm to use. Such a +mechanism would render the parameter obsolete. 
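Returning to the 64-bit byte counts described earlier in this section: written literally in C, <CODE>total_in_hi32 << 32</CODE> shifts a 32-bit <CODE>unsigned int</CODE> by its full width, which is undefined behaviour, so the high half has to be widened first. A minimal sketch, assuming a C99 compiler that provides <CODE>stdint.h</CODE>; the helper name <CODE>bz_total64</CODE> is hypothetical and not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the _hi32 and _lo32 halves of a byte count into one 64-bit value. */
uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    /* libbzip2 itself requires sizeof(int) == 4 (see BZ_CONFIG_ERROR). */
    assert(sizeof(unsigned int) == 4);

    /* Widen before shifting so the shift happens in 64-bit arithmetic. */
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

For a stream <CODE>strm</CODE>, the total input so far would then be <CODE>bz_total64(strm.total_in_hi32, strm.total_in_lo32)</CODE>, and likewise for the output fields.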
+ +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>strm</CODE> is <CODE>NULL</CODE> + or <CODE>blockSize100k</CODE> < 1 or <CODE>blockSize100k</CODE> > 9 + or <CODE>verbosity</CODE> < 0 or <CODE>verbosity</CODE> > 4 + or <CODE>workFactor</CODE> < 0 or <CODE>workFactor</CODE> > 250 + <CODE>BZ_MEM_ERROR</CODE> + if not enough memory is available + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzCompress</CODE> + if <CODE>BZ_OK</CODE> is returned + no specific action needed in case of error +</PRE> + + + +<H3><A NAME="SEC20" HREF="manual_toc.html#TOC20"><CODE>BZ2_bzCompress</CODE></A></H3> + +<PRE> + int BZ2_bzCompress ( bz_stream *strm, int action ); +</PRE> + +<P> +Provides more input and/or output buffer space for the library. The +caller maintains input and output buffers, and calls <CODE>BZ2_bzCompress</CODE> to +transfer data between them. + +</P> +<P> +Before each call to <CODE>BZ2_bzCompress</CODE>, <CODE>next_in</CODE> should point at +the data to be compressed, and <CODE>avail_in</CODE> should indicate how many +bytes the library may read. <CODE>BZ2_bzCompress</CODE> updates <CODE>next_in</CODE>, +<CODE>avail_in</CODE> and <CODE>total_in</CODE> to reflect the number of bytes it +has read. + +</P> +<P> +Similarly, <CODE>next_out</CODE> should point to a buffer in which the +compressed data is to be placed, with <CODE>avail_out</CODE> indicating how +much output space is available. <CODE>BZ2_bzCompress</CODE> updates +<CODE>next_out</CODE>, <CODE>avail_out</CODE> and <CODE>total_out</CODE> to reflect the +number of bytes output. + +</P> +<P> +You may provide and remove as little or as much data as you like on each +call of <CODE>BZ2_bzCompress</CODE>. In the limit, it is acceptable to supply and +remove data one byte at a time, although this would be terribly +inefficient. 
You should always ensure that at least one byte of output +space is available at each call. + +</P> +<P> +A second purpose of <CODE>BZ2_bzCompress</CODE> is to request a change of mode of the +compressed stream. + +</P> +<P> +Conceptually, a compressed stream can be in one of four states: IDLE, +RUNNING, FLUSHING and FINISHING. Before initialisation +(<CODE>BZ2_bzCompressInit</CODE>) and after termination (<CODE>BZ2_bzCompressEnd</CODE>), a +stream is regarded as IDLE. + +</P> +<P> +Upon initialisation (<CODE>BZ2_bzCompressInit</CODE>), the stream is placed in the +RUNNING state. Subsequent calls to <CODE>BZ2_bzCompress</CODE> should pass +<CODE>BZ_RUN</CODE> as the requested action; other actions are illegal and +will result in <CODE>BZ_SEQUENCE_ERROR</CODE>. + +</P> +<P> +At some point, the calling program will have provided all the input data +it wants to. It will then want to finish up -- in effect, asking the +library to process any data it might have buffered internally. In this +state, <CODE>BZ2_bzCompress</CODE> will no longer attempt to read data from +<CODE>next_in</CODE>, but it will want to write data to <CODE>next_out</CODE>. +Because the output buffer supplied by the user can be arbitrarily small, +the finishing-up operation cannot necessarily be done with a single call +of <CODE>BZ2_bzCompress</CODE>. + +</P> +<P> +Instead, the calling program passes <CODE>BZ_FINISH</CODE> as an action to +<CODE>BZ2_bzCompress</CODE>. This changes the stream's state to FINISHING. Any +remaining input (ie, <CODE>next_in[0 .. avail_in-1]</CODE>) is compressed and +transferred to the output buffer. To do this, <CODE>BZ2_bzCompress</CODE> must be +called repeatedly until all the output has been consumed. At that +point, <CODE>BZ2_bzCompress</CODE> returns <CODE>BZ_STREAM_END</CODE>, and the stream's +state is set back to IDLE. <CODE>BZ2_bzCompressEnd</CODE> should then be +called. 
+ +</P> +<P> +Just to make sure the calling program does not cheat, the library makes +a note of <CODE>avail_in</CODE> at the time of the first call to +<CODE>BZ2_bzCompress</CODE> which has <CODE>BZ_FINISH</CODE> as an action (ie, at the +time the program has announced its intention to not supply any more +input). By comparing this value with that of <CODE>avail_in</CODE> over +subsequent calls to <CODE>BZ2_bzCompress</CODE>, the library can detect any +attempts to slip in more data to compress. Any calls for which this is +detected will return <CODE>BZ_SEQUENCE_ERROR</CODE>. This indicates a +programming mistake which should be corrected. + +</P> +<P> +Instead of asking to finish, the calling program may ask +<CODE>BZ2_bzCompress</CODE> to take all the remaining input, compress it and +terminate the current (Burrows-Wheeler) compression block. This could +be useful for error control purposes. The mechanism is analogous to +that for finishing: call <CODE>BZ2_bzCompress</CODE> with an action of +<CODE>BZ_FLUSH</CODE>, remove output data, and persist with the +<CODE>BZ_FLUSH</CODE> action until the value <CODE>BZ_RUN_OK</CODE> is returned. As +with finishing, <CODE>BZ2_bzCompress</CODE> detects any attempt to provide more +input data once the flush has begun. + +</P> +<P> +Once the flush is complete, the stream returns to the normal RUNNING +state. + +</P> +<P> +This all sounds pretty complex, but isn't really. Here's a table +which shows which actions are allowable in each state, what action +will be taken, what the next state is, and what the non-error return +values are. Note that you can't explicitly ask what state the +stream is in, but nor do you need to -- it can be inferred from the +values returned by <CODE>BZ2_bzCompress</CODE>. + +<PRE> +IDLE/<CODE>any</CODE> + Illegal. IDLE state only exists after <CODE>BZ2_bzCompressEnd</CODE> or + before <CODE>BZ2_bzCompressInit</CODE>. 
+ Return value = <CODE>BZ_SEQUENCE_ERROR</CODE> + +RUNNING/<CODE>BZ_RUN</CODE> + Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible. + Next state = RUNNING + Return value = <CODE>BZ_RUN_OK</CODE> + +RUNNING/<CODE>BZ_FLUSH</CODE> + Remember current value of <CODE>avail_in</CODE>. Compress from <CODE>next_in</CODE> + to <CODE>next_out</CODE> as much as possible, but do not accept any more input. + Next state = FLUSHING + Return value = <CODE>BZ_FLUSH_OK</CODE> + +RUNNING/<CODE>BZ_FINISH</CODE> + Remember current value of <CODE>avail_in</CODE>. Compress from <CODE>next_in</CODE> + to <CODE>next_out</CODE> as much as possible, but do not accept any more input. + Next state = FINISHING + Return value = <CODE>BZ_FINISH_OK</CODE> + +FLUSHING/<CODE>BZ_FLUSH</CODE> + Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible, + but do not accept any more input. + If all the existing input has been used up and all compressed + output has been removed + Next state = RUNNING; Return value = <CODE>BZ_RUN_OK</CODE> + else + Next state = FLUSHING; Return value = <CODE>BZ_FLUSH_OK</CODE> + +FLUSHING/other + Illegal. + Return value = <CODE>BZ_SEQUENCE_ERROR</CODE> + +FINISHING/<CODE>BZ_FINISH</CODE> + Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible, + but do not accept any more input. + If all the existing input has been used up and all compressed + output has been removed + Next state = IDLE; Return value = <CODE>BZ_STREAM_END</CODE> + else + Next state = FINISHING; Return value = <CODE>BZ_FINISH_OK</CODE> + +FINISHING/other + Illegal. + Return value = <CODE>BZ_SEQUENCE_ERROR</CODE> +</PRE> + +<P> +That still looks complicated? Well, fair enough. The usual sequence +of calls for compressing a load of data is: + +<UL> +<LI>Get started with <CODE>BZ2_bzCompressInit</CODE>. 
+ +<LI>Shovel data in and shlurp out its compressed form using zero or more + +calls of <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_RUN</CODE>. +<LI>Finish up. + +Repeatedly call <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_FINISH</CODE>, +copying out the compressed output, until <CODE>BZ_STREAM_END</CODE> is returned. +<LI>Close up and go home. Call <CODE>BZ2_bzCompressEnd</CODE>. + +</UL> + +<P> +If the data you want to compress fits into your input buffer all +at once, you can skip the calls of <CODE>BZ2_bzCompress ( ..., BZ_RUN )</CODE> and +just do the <CODE>BZ2_bzCompress ( ..., BZ_FINISH )</CODE> calls. + +</P> +<P> +All required memory is allocated by <CODE>BZ2_bzCompressInit</CODE>. The +compression library can accept any data at all (obviously). So you +shouldn't get any error return values from the <CODE>BZ2_bzCompress</CODE> calls. +If you do, they will be <CODE>BZ_SEQUENCE_ERROR</CODE>, and indicate a bug in +your programming. + +</P> +<P> +Trivial other possible return values: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>strm</CODE> is <CODE>NULL</CODE>, or <CODE>strm->state</CODE> is <CODE>NULL</CODE> +</PRE> + + + +<H3><A NAME="SEC21" HREF="manual_toc.html#TOC21"><CODE>BZ2_bzCompressEnd</CODE></A></H3> + +<PRE> +int BZ2_bzCompressEnd ( bz_stream *strm ); +</PRE> + +<P> +Releases all memory associated with a compression stream. + +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE> + <CODE>BZ_OK</CODE> otherwise +</PRE> + + + +<H3><A NAME="SEC22" HREF="manual_toc.html#TOC22"><CODE>BZ2_bzDecompressInit</CODE></A></H3> + +<PRE> +int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small ); +</PRE> + +<P> +Prepares for decompression. As with <CODE>BZ2_bzCompressInit</CODE>, a +<CODE>bz_stream</CODE> record should be allocated and initialised before the +call. 
Fields <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> should be +set if a custom memory allocator is required, or made <CODE>NULL</CODE> for +the normal <CODE>malloc</CODE>/<CODE>free</CODE> routines. Upon return, the internal +state will have been initialised, and <CODE>total_in</CODE> and +<CODE>total_out</CODE> will be zero. + +</P> +<P> +For the meaning of parameter <CODE>verbosity</CODE>, see <CODE>BZ2_bzCompressInit</CODE>. + +</P> +<P> +If <CODE>small</CODE> is nonzero, the library will use an alternative +decompression algorithm which uses less memory but at the cost of +decompressing more slowly (roughly speaking, half the speed, but the +maximum memory requirement drops to around 2300k). See Chapter 2 for +more information on memory management. + +</P> +<P> +Note that the amount of memory needed to decompress +a stream cannot be determined until the stream's header has been read, +so even if <CODE>BZ2_bzDecompressInit</CODE> succeeds, a subsequent +<CODE>BZ2_bzDecompress</CODE> could fail with <CODE>BZ_MEM_ERROR</CODE>. + +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>(small != 0 && small != 1)</CODE> + or <CODE>(verbosity < 0 || verbosity > 4)</CODE> + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory is available + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzDecompress</CODE> + if <CODE>BZ_OK</CODE> was returned + no specific action required in case of error +</PRE> + +<P> + + +</P> + + +<H3><A NAME="SEC23" HREF="manual_toc.html#TOC23"><CODE>BZ2_bzDecompress</CODE></A></H3> + +<PRE> +int BZ2_bzDecompress ( bz_stream *strm ); +</PRE> + +<P> +Provides more input and/or output buffer space for the library. The +caller maintains input and output buffers, and uses <CODE>BZ2_bzDecompress</CODE> +to transfer data between them. 
+ +</P> +<P> +Before each call to <CODE>BZ2_bzDecompress</CODE>, <CODE>next_in</CODE> +should point at the compressed data, +and <CODE>avail_in</CODE> should indicate how many bytes the library +may read. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_in</CODE>, <CODE>avail_in</CODE> +and <CODE>total_in</CODE> +to reflect the number of bytes it has read. + +</P> +<P> +Similarly, <CODE>next_out</CODE> should point to a buffer in which the uncompressed +output is to be placed, with <CODE>avail_out</CODE> indicating how much output space +is available. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_out</CODE>, +<CODE>avail_out</CODE> and <CODE>total_out</CODE> to reflect +the number of bytes output. + +</P> +<P> +You may provide and remove as little or as much data as you like on +each call of <CODE>BZ2_bzDecompress</CODE>. +In the limit, it is acceptable to +supply and remove data one byte at a time, although this would be +terribly inefficient. You should always ensure that at least one +byte of output space is available at each call. + +</P> +<P> +Use of <CODE>BZ2_bzDecompress</CODE> is simpler than <CODE>BZ2_bzCompress</CODE>. + +</P> +<P> +You should provide input and remove output as described above, and +repeatedly call <CODE>BZ2_bzDecompress</CODE> until <CODE>BZ_STREAM_END</CODE> is +returned. Appearance of <CODE>BZ_STREAM_END</CODE> denotes that +<CODE>BZ2_bzDecompress</CODE> has detected the logical end of the compressed +stream. <CODE>BZ2_bzDecompress</CODE> will not produce <CODE>BZ_STREAM_END</CODE> until +all output data has been placed into the output buffer, so once +<CODE>BZ_STREAM_END</CODE> appears, you are guaranteed to have available all +the decompressed output, and <CODE>BZ2_bzDecompressEnd</CODE> can safely be +called. + +</P> +<P> +In case of an error return value, you should call <CODE>BZ2_bzDecompressEnd</CODE> +to clean up and release memory. 
+ +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE> + or <CODE>strm->avail_out < 1</CODE> + <CODE>BZ_DATA_ERROR</CODE> + if a data integrity error is detected in the compressed stream + <CODE>BZ_DATA_ERROR_MAGIC</CODE> + if the compressed stream doesn't begin with the right magic bytes + <CODE>BZ_MEM_ERROR</CODE> + if there wasn't enough memory available + <CODE>BZ_STREAM_END</CODE> + if the logical end of the data stream was detected and all + output has been consumed, eg <CODE>strm->avail_out > 0</CODE> + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzDecompress</CODE> + if <CODE>BZ_OK</CODE> was returned + <CODE>BZ2_bzDecompressEnd</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC24" HREF="manual_toc.html#TOC24"><CODE>BZ2_bzDecompressEnd</CODE></A></H3> + +<PRE> +int BZ2_bzDecompressEnd ( bz_stream *strm ); +</PRE> + +<P> +Releases all memory associated with a decompression stream. + +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE> + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + None. +</PRE> + + + +<H2><A NAME="SEC25" HREF="manual_toc.html#TOC25">High-level interface</A></H2> + +<P> +This interface provides functions for reading and writing +<CODE>bzip2</CODE> format files. First, some general points. + +</P> + +<UL> +<LI>All of the functions take an <CODE>int*</CODE> first argument, + + <CODE>bzerror</CODE>. + After each call, <CODE>bzerror</CODE> should be consulted first to determine + the outcome of the call. If <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>, + the call completed + successfully, and only then should the return value of the function + (if any) be consulted. 
If <CODE>bzerror</CODE> is <CODE>BZ_IO_ERROR</CODE>, + there was an error + reading/writing the underlying compressed file, and you should + then consult <CODE>errno</CODE>/<CODE>perror</CODE> to determine the + cause of the difficulty. + <CODE>bzerror</CODE> may also be set to various other values; precise details are + given on a per-function basis below. +<LI>If <CODE>bzerror</CODE> indicates an error + + (ie, anything except <CODE>BZ_OK</CODE> and <CODE>BZ_STREAM_END</CODE>), + you should immediately call <CODE>BZ2_bzReadClose</CODE> (or <CODE>BZ2_bzWriteClose</CODE>, + depending on whether you are attempting to read or to write) + to free up all resources associated + with the stream. Once an error has been indicated, behaviour of all calls + except <CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) is undefined. + The implication is that (1) <CODE>bzerror</CODE> should + be checked after each call, and (2) if <CODE>bzerror</CODE> indicates an error, + <CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) should then be called to clean up. +<LI>The <CODE>FILE*</CODE> arguments passed to + + <CODE>BZ2_bzReadOpen</CODE>/<CODE>BZ2_bzWriteOpen</CODE> + should be set to binary mode. + Most Unix systems will do this by default, but other platforms, + including Windows and Mac, will not. If you omit this, you may + encounter problems when moving code to new platforms. +<LI>Memory allocation requests are handled by + + <CODE>malloc</CODE>/<CODE>free</CODE>. + At present + there is no facility for user-defined memory allocators in the file I/O + functions (could easily be added, though). +</UL> + + + +<H3><A NAME="SEC26" HREF="manual_toc.html#TOC26"><CODE>BZ2_bzReadOpen</CODE></A></H3> + +<PRE> + typedef void BZFILE; + + BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f, + int small, int verbosity, + void *unused, int nUnused ); +</PRE> + +<P> +Prepare to read compressed data from file handle <CODE>f</CODE>. 
<CODE>f</CODE> +should refer to a file which has been opened for reading, and for which +the error indicator (<CODE>ferror(f)</CODE>) is not set. If <CODE>small</CODE> is 1, +the library will try to decompress using less memory, at the expense of +speed. + +</P> +<P> +For reasons explained below, <CODE>BZ2_bzRead</CODE> will decompress the +<CODE>nUnused</CODE> bytes starting at <CODE>unused</CODE>, before starting to read +from the file <CODE>f</CODE>. At most <CODE>BZ_MAX_UNUSED</CODE> bytes may be +supplied like this. If this facility is not required, you should pass +<CODE>NULL</CODE> and <CODE>0</CODE> for <CODE>unused</CODE> and <CODE>nUnused</CODE> +respectively. + +</P> +<P> +For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>, +see <CODE>BZ2_bzDecompressInit</CODE>. + +</P> +<P> +The amount of memory needed to decompress a file cannot be determined +until the file's header has been read. So it is possible that +<CODE>BZ2_bzReadOpen</CODE> returns <CODE>BZ_OK</CODE> but a subsequent call of +<CODE>BZ2_bzRead</CODE> will return <CODE>BZ_MEM_ERROR</CODE>. + +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>f</CODE> is <CODE>NULL</CODE> + or <CODE>small</CODE> is neither <CODE>0</CODE> nor <CODE>1</CODE> + or <CODE>(unused == NULL && nUnused != 0)</CODE> + or <CODE>(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))</CODE> + <CODE>BZ_IO_ERROR</CODE> + if <CODE>ferror(f)</CODE> is nonzero + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory is available + <CODE>BZ_OK</CODE> + otherwise. 
+</PRE> + +<P> +Possible return values: + +<PRE> + Pointer to an abstract <CODE>BZFILE</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> + <CODE>NULL</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzRead</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> + <CODE>BZ2_bzReadClose</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC27" HREF="manual_toc.html#TOC27"><CODE>BZ2_bzRead</CODE></A></H3> + +<PRE> + int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len ); +</PRE> + +<P> +Reads up to <CODE>len</CODE> (uncompressed) bytes from the compressed file +<CODE>b</CODE> into +the buffer <CODE>buf</CODE>. If the read was successful, +<CODE>bzerror</CODE> is set to <CODE>BZ_OK</CODE> +and the number of bytes read is returned. If the logical end-of-stream +was detected, <CODE>bzerror</CODE> will be set to <CODE>BZ_STREAM_END</CODE>, +and the number +of bytes read is returned. All other <CODE>bzerror</CODE> values denote an error. + +</P> +<P> +<CODE>BZ2_bzRead</CODE> will supply <CODE>len</CODE> bytes, +unless the logical stream end is detected +or an error occurs. Because of this, it is possible to detect the +stream end by observing when the number of bytes returned is +less than the number +requested. Nevertheless, this is regarded as inadvisable; you should +instead check <CODE>bzerror</CODE> after every call and watch out for +<CODE>BZ_STREAM_END</CODE>. + +</P> +<P> +Internally, <CODE>BZ2_bzRead</CODE> copies data from the compressed file in chunks +of size <CODE>BZ_MAX_UNUSED</CODE> bytes +before decompressing it. If the file contains more bytes than strictly +needed to reach the logical end-of-stream, <CODE>BZ2_bzRead</CODE> will almost certainly +read some of the trailing data before signalling <CODE>BZ_STREAM_END</CODE>. +To collect the read but unused data once <CODE>BZ_STREAM_END</CODE> has +appeared, call <CODE>BZ2_bzReadGetUnused</CODE> immediately before <CODE>BZ2_bzReadClose</CODE>. 
+ +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE> + <CODE>BZ_SEQUENCE_ERROR</CODE> + if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE> + <CODE>BZ_IO_ERROR</CODE> + if there is an error reading from the compressed file + <CODE>BZ_UNEXPECTED_EOF</CODE> + if the compressed file ended before the logical end-of-stream was detected + <CODE>BZ_DATA_ERROR</CODE> + if a data integrity error was detected in the compressed stream + <CODE>BZ_DATA_ERROR_MAGIC</CODE> + if the stream does not begin with the requisite header bytes (ie, is not + a <CODE>bzip2</CODE> data file). This is really a special case of <CODE>BZ_DATA_ERROR</CODE>. + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory was available + <CODE>BZ_STREAM_END</CODE> + if the logical end of stream was detected. + <CODE>BZ_OK</CODE> + otherwise. +</PRE> + +<P> +Possible return values: + +<PRE> + number of bytes read + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> or <CODE>BZ_STREAM_END</CODE> + undefined + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzRead</CODE> or <CODE>BZ2_bzReadClose</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> + collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzReadClose</CODE> or <CODE>BZ2_bzReadGetUnused</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_STREAM_END</CODE> + <CODE>BZ2_bzReadClose</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC28" HREF="manual_toc.html#TOC28"><CODE>BZ2_bzReadGetUnused</CODE></A></H3> + +<PRE> + void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b, + void** unused, int* nUnused ); +</PRE> + +<P> +Returns data which was read from the compressed file but was not needed +to get to the logical end-of-stream. <CODE>*unused</CODE> is set to the address +of the data, and <CODE>*nUnused</CODE> to the number of bytes. 
<CODE>*nUnused</CODE> will +be set to a value between <CODE>0</CODE> and <CODE>BZ_MAX_UNUSED</CODE> inclusive. + +</P> +<P> +This function may only be called once <CODE>BZ2_bzRead</CODE> has signalled +<CODE>BZ_STREAM_END</CODE> but before <CODE>BZ2_bzReadClose</CODE>. + +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>b</CODE> is <CODE>NULL</CODE> + or <CODE>unused</CODE> is <CODE>NULL</CODE> or <CODE>nUnused</CODE> is <CODE>NULL</CODE> + <CODE>BZ_SEQUENCE_ERROR</CODE> + if <CODE>BZ_STREAM_END</CODE> has not been signalled + or if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE> + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzReadClose</CODE> +</PRE> + + + +<H3><A NAME="SEC29" HREF="manual_toc.html#TOC29"><CODE>BZ2_bzReadClose</CODE></A></H3> + +<PRE> + void BZ2_bzReadClose ( int *bzerror, BZFILE *b ); +</PRE> + +<P> +Releases all memory pertaining to the compressed file <CODE>b</CODE>. +<CODE>BZ2_bzReadClose</CODE> does not call <CODE>fclose</CODE> on the underlying file +handle, so you should do that yourself if appropriate. +<CODE>BZ2_bzReadClose</CODE> should be called to clean up after all error +situations. + +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_SEQUENCE_ERROR</CODE> + if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE> + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + none +</PRE> + + + +<H3><A NAME="SEC30" HREF="manual_toc.html#TOC30"><CODE>BZ2_bzWriteOpen</CODE></A></H3> + +<PRE> + BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f, + int blockSize100k, int verbosity, + int workFactor ); +</PRE> + +<P> +Prepare to write compressed data to file handle <CODE>f</CODE>. +<CODE>f</CODE> should refer to +a file which has been opened for writing, and for which the error +indicator (<CODE>ferror(f)</CODE>) is not set. 
+ +</P> +<P> +For the meaning of parameters <CODE>blockSize100k</CODE>, +<CODE>verbosity</CODE> and <CODE>workFactor</CODE>, see +<BR> <CODE>BZ2_bzCompressInit</CODE>. + +</P> +<P> +All required memory is allocated at this stage, so if the call +completes successfully, <CODE>BZ_MEM_ERROR</CODE> cannot be signalled by a +subsequent call to <CODE>BZ2_bzWrite</CODE>. + +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>f</CODE> is <CODE>NULL</CODE> + or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE> + <CODE>BZ_IO_ERROR</CODE> + if <CODE>ferror(f)</CODE> is nonzero + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory is available + <CODE>BZ_OK</CODE> + otherwise +</PRE> + +<P> +Possible return values: + +<PRE> + Pointer to an abstract <CODE>BZFILE</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> + <CODE>NULL</CODE> + otherwise +</PRE> + +<P> +Allowable next actions: + +<PRE> + <CODE>BZ2_bzWrite</CODE> + if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> + (you could go directly to <CODE>BZ2_bzWriteClose</CODE>, but this would be pretty pointless) + <CODE>BZ2_bzWriteClose</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC31" HREF="manual_toc.html#TOC31"><CODE>BZ2_bzWrite</CODE></A></H3> + +<PRE> + void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len ); +</PRE> + +<P> +Absorbs <CODE>len</CODE> bytes from the buffer <CODE>buf</CODE>, eventually to be +compressed and written to the file. + +</P> +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE> + <CODE>BZ_SEQUENCE_ERROR</CODE> + if b was opened with <CODE>BZ2_bzReadOpen</CODE> + <CODE>BZ_IO_ERROR</CODE> + if there is an error writing the compressed file. 
+ <CODE>BZ_OK</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC32" HREF="manual_toc.html#TOC32"><CODE>BZ2_bzWriteClose</CODE></A></H3> + +<PRE> + void BZ2_bzWriteClose ( int *bzerror, BZFILE* b, + int abandon, + unsigned int* nbytes_in, + unsigned int* nbytes_out ); + + void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b, + int abandon, + unsigned int* nbytes_in_lo32, + unsigned int* nbytes_in_hi32, + unsigned int* nbytes_out_lo32, + unsigned int* nbytes_out_hi32 ); +</PRE> + +<P> +Compresses and flushes to the compressed file all data so far supplied +by <CODE>BZ2_bzWrite</CODE>. The logical end-of-stream markers are also written, so +subsequent calls to <CODE>BZ2_bzWrite</CODE> are illegal. All memory associated +with the compressed file <CODE>b</CODE> is released. +<CODE>fflush</CODE> is called on the +compressed file, but it is not <CODE>fclose</CODE>'d. + +</P> +<P> +If <CODE>BZ2_bzWriteClose</CODE> is called to clean up after an error, the only +action is to release the memory. The library records the error codes +issued by previous calls, so this situation will be detected +automatically. There is no attempt to complete the compression +operation, nor to <CODE>fflush</CODE> the compressed file. You can force this +behaviour to happen even in the case of no error, by passing a nonzero +value to <CODE>abandon</CODE>. + +</P> +<P> +If <CODE>nbytes_in</CODE> is non-null, <CODE>*nbytes_in</CODE> will be set to be the +total volume of uncompressed data handled. Similarly, <CODE>nbytes_out</CODE> +will be set to the total volume of compressed data written. For +compatibility with older versions of the library, <CODE>BZ2_bzWriteClose</CODE> +only yields the lower 32 bits of these counts. Use +<CODE>BZ2_bzWriteClose64</CODE> if you want the full 64 bit counts. These +two functions are otherwise absolutely identical. 
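+
+</P>
+<P>
+As a minimal sketch of how the two 32-bit halves delivered by
+<CODE>BZ2_bzWriteClose64</CODE> might be recombined into one 64-bit count:
+the helper name <CODE>bz_join_count</CODE> is hypothetical and not part of
+<CODE>libbzip2</CODE>, and the sketch assumes a C99 compiler providing
+<CODE>stdint.h</CODE>.
+
+</P>

```c
#include <stdint.h>

/* Hypothetical helper, not part of libbzip2: joins the low and high
   32-bit halves of a byte count (as stored through the *_lo32 and
   *_hi32 arguments of BZ2_bzWriteClose64) into one 64-bit value. */
static uint64_t bz_join_count(unsigned int lo32, unsigned int hi32)
{
    /* Mask both halves in case unsigned int is wider than 32 bits. */
    uint64_t lo = (uint64_t)lo32 & 0xFFFFFFFFu;
    uint64_t hi = (uint64_t)hi32 & 0xFFFFFFFFu;
    return (hi << 32) | lo;
}
```

+<P>
+A caller would pass the values written through <CODE>nbytes_in_lo32</CODE> and
+<CODE>nbytes_in_hi32</CODE> (or the corresponding <CODE>nbytes_out</CODE> pair)
+once <CODE>BZ2_bzWriteClose64</CODE> has returned with <CODE>bzerror</CODE> set to <CODE>BZ_OK</CODE>.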
+ +</P> + +<P> +Possible assignments to <CODE>bzerror</CODE>: + +<PRE> + <CODE>BZ_SEQUENCE_ERROR</CODE> + if <CODE>b</CODE> was opened with <CODE>BZ2_bzReadOpen</CODE> + <CODE>BZ_IO_ERROR</CODE> + if there is an error writing the compressed file + <CODE>BZ_OK</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC33" HREF="manual_toc.html#TOC33">Handling embedded compressed data streams</A></H3> + +<P> +The high-level library facilitates use of +<CODE>bzip2</CODE> data streams which form some part of a surrounding, larger +data stream. + +<UL> +<LI>For writing, the library takes an open file handle, writes + +compressed data to it, <CODE>fflush</CODE>es it but does not <CODE>fclose</CODE> it. +The calling application can write its own data before and after the +compressed data stream, using that same file handle. +<LI>Reading is more complex, and the facilities are not as general + +as they could be since generality is hard to reconcile with efficiency. +<CODE>BZ2_bzRead</CODE> reads from the compressed file in blocks of size +<CODE>BZ_MAX_UNUSED</CODE> bytes, and in doing so probably will overshoot +the logical end of compressed stream. +To recover this data once decompression has +ended, call <CODE>BZ2_bzReadGetUnused</CODE> after the last call of <CODE>BZ2_bzRead</CODE> +(the one returning <CODE>BZ_STREAM_END</CODE>) but before calling +<CODE>BZ2_bzReadClose</CODE>. +</UL> + +<P> +This mechanism makes it easy to decompress multiple <CODE>bzip2</CODE> +streams placed end-to-end. At the end of one stream, when <CODE>BZ2_bzRead</CODE> +returns <CODE>BZ_STREAM_END</CODE>, call <CODE>BZ2_bzReadGetUnused</CODE> to collect the +unused data (copy it into your own buffer somewhere). +That data forms the start of the next compressed stream. +To start uncompressing that next stream, call <CODE>BZ2_bzReadOpen</CODE> again, +feeding in the unused data via the <CODE>unused</CODE>/<CODE>nUnused</CODE> +parameters. 
+Keep doing this until <CODE>BZ_STREAM_END</CODE> return coincides with the +physical end of file (<CODE>feof(f)</CODE>). In this situation +<CODE>BZ2_bzReadGetUnused</CODE> +will of course return no data. + +</P> +<P> +This should give some feel for how the high-level interface can be used. +If you require extra flexibility, you'll have to bite the bullet and get +to grips with the low-level interface. + +</P> + + +<H3><A NAME="SEC34" HREF="manual_toc.html#TOC34">Standard file-reading/writing code</A></H3> +<P> +Here's how you'd write data to a compressed file: + +<PRE> +FILE* f; +BZFILE* b; +int nBuf; +char buf[ /* whatever size you like */ ]; +int bzerror; + +f = fopen ( "myfile.bz2", "wb" ); /* binary mode, as advised above */ +if (!f) { + /* handle error */ +} +b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 30 ); /* blockSize100k, verbosity, workFactor */ +if (bzerror != BZ_OK) { + BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); + /* handle error */ +} + +while ( /* condition */ ) { + /* get data to write into buf, and set nBuf appropriately */ + BZ2_bzWrite ( &bzerror, b, buf, nBuf ); /* returns void */ + if (bzerror == BZ_IO_ERROR) { + BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); + /* handle error */ + } +} + +BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); +if (bzerror == BZ_IO_ERROR) { + /* handle error */ +} +</PRE> + +<P> +And to read from a compressed file: + +<PRE> +FILE* f; +BZFILE* b; +int nBuf; +char buf[ /* whatever size you like */ ]; +int bzerror; + +f = fopen ( "myfile.bz2", "rb" ); /* binary mode, as advised above */ +if (!f) { + /* handle error */ +} +b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 ); +if (bzerror != BZ_OK) { + BZ2_bzReadClose ( &bzerror, b ); + /* handle error */ +} + +bzerror = BZ_OK; +while (bzerror == BZ_OK && /* arbitrary other conditions */) { + nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ ); + if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) { + /* do something with buf[0 .. 
nBuf-1] */ + } +} +if (bzerror != BZ_STREAM_END) { + BZ2_bzReadClose ( &bzerror, b ); + /* handle error */ +} else { + BZ2_bzReadClose ( &bzerror, b ); +} +</PRE> + + + +<H2><A NAME="SEC35" HREF="manual_toc.html#TOC35">Utility functions</A></H2> + + +<H3><A NAME="SEC36" HREF="manual_toc.html#TOC36"><CODE>BZ2_bzBuffToBuffCompress</CODE></A></H3> + +<PRE> + int BZ2_bzBuffToBuffCompress( char* dest, + unsigned int* destLen, + char* source, + unsigned int sourceLen, + int blockSize100k, + int verbosity, + int workFactor ); +</PRE> + +<P> +Attempts to compress the data in <CODE>source[0 .. sourceLen-1]</CODE> +into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>. +If the destination buffer is big enough, <CODE>*destLen</CODE> is +set to the size of the compressed data, and <CODE>BZ_OK</CODE> is +returned. If the compressed data won't fit, <CODE>*destLen</CODE> +is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned. + +</P> +<P> +Compression in this manner is a one-shot event, done with a single call +to this function. The resulting compressed data is a complete +<CODE>bzip2</CODE> format data stream. There is no mechanism for making +additional calls to provide extra input data. If you want that kind of +mechanism, use the low-level interface. + +</P> +<P> +For the meaning of parameters <CODE>blockSize100k</CODE>, <CODE>verbosity</CODE> +and <CODE>workFactor</CODE>, <BR> see <CODE>BZ2_bzCompressInit</CODE>. + +</P> +<P> +To guarantee that the compressed data will fit in its buffer, allocate +an output buffer of size 1% larger than the uncompressed data, plus +six hundred extra bytes. + +</P> +<P> +<CODE>BZ2_bzBuffToBuffCompress</CODE> will not write data at or +beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow. 
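+
+</P>
+<P>
+The sizing rule given above (1% larger than the uncompressed data, plus six
+hundred bytes) can be sketched as a small helper. The name
+<CODE>bz_worst_case_size</CODE> is hypothetical and not part of <CODE>libbzip2</CODE>;
+rounding the 1% upwards and returning 0 on overflow are assumptions of this sketch.
+
+</P>

```c
#include <limits.h>

/* Hypothetical helper, not part of libbzip2: size of a destination
   buffer that follows the manual's rule of "1% larger than the
   uncompressed data, plus six hundred extra bytes".
   Returns 0 if the result would not fit in an unsigned int. */
static unsigned int bz_worst_case_size(unsigned int sourceLen)
{
    /* sourceLen / 100 rounds down, so add 1 to cover at least 1%. */
    unsigned int extra = sourceLen / 100 + 1 + 600;

    if (sourceLen > UINT_MAX - extra)
        return 0;               /* would overflow unsigned int */
    return sourceLen + extra;
}
```

+<P>
+The result would be used both to allocate <CODE>dest</CODE> and as the initial
+value of <CODE>*destLen</CODE> before calling <CODE>BZ2_bzBuffToBuffCompress</CODE>;
+a return of 0 means the input is too large for this one-shot interface.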
+ +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE> + or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE> + or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE> + or <CODE>workFactor < 0</CODE> or <CODE>workFactor > 250</CODE> + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory is available + <CODE>BZ_OUTBUFF_FULL</CODE> + if the size of the compressed data exceeds <CODE>*destLen</CODE> + <CODE>BZ_OK</CODE> + otherwise +</PRE> + + + +<H3><A NAME="SEC37" HREF="manual_toc.html#TOC37"><CODE>BZ2_bzBuffToBuffDecompress</CODE></A></H3> + +<PRE> + int BZ2_bzBuffToBuffDecompress ( char* dest, + unsigned int* destLen, + char* source, + unsigned int sourceLen, + int small, + int verbosity ); +</PRE> + +<P> +Attempts to decompress the data in <CODE>source[0 .. sourceLen-1]</CODE> +into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>. +If the destination buffer is big enough, <CODE>*destLen</CODE> is +set to the size of the uncompressed data, and <CODE>BZ_OK</CODE> is +returned. If the uncompressed data won't fit, <CODE>*destLen</CODE> +is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned. + +</P> +<P> +<CODE>source</CODE> is assumed to hold a complete <CODE>bzip2</CODE> format +data stream. <BR> <CODE>BZ2_bzBuffToBuffDecompress</CODE> tries to decompress +the entirety of the stream into the output buffer. + +</P> +<P> +For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>, +see <CODE>BZ2_bzDecompressInit</CODE>. + +</P> +<P> +Because the compression ratio of the compressed data cannot be known in +advance, there is no easy way to guarantee that the output buffer will +be big enough. You may of course make arrangements in your code to +record the size of the uncompressed data, but such a mechanism is beyond +the scope of this library. 
+ +</P> +<P> +<CODE>BZ2_bzBuffToBuffDecompress</CODE> will not write data at or +beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow. + +</P> +<P> +Possible return values: + +<PRE> + <CODE>BZ_CONFIG_ERROR</CODE> + if the library has been mis-compiled + <CODE>BZ_PARAM_ERROR</CODE> + if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE> + or <CODE>small != 0 && small != 1</CODE> + or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE> + <CODE>BZ_MEM_ERROR</CODE> + if insufficient memory is available + <CODE>BZ_OUTBUFF_FULL</CODE> + if the size of the uncompressed data exceeds <CODE>*destLen</CODE> + <CODE>BZ_DATA_ERROR</CODE> + if a data integrity error was detected in the compressed data + <CODE>BZ_DATA_ERROR_MAGIC</CODE> + if the compressed data doesn't begin with the right magic bytes + <CODE>BZ_UNEXPECTED_EOF</CODE> + if the compressed data ends unexpectedly + <CODE>BZ_OK</CODE> + otherwise +</PRE> + + + +<H2><A NAME="SEC38" HREF="manual_toc.html#TOC38"><CODE>zlib</CODE> compatibility functions</A></H2> +<P> +Yoshioka Tsuneo has contributed some functions to +give better <CODE>zlib</CODE> compatibility. These functions are +<CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>, +<CODE>BZ2_bzclose</CODE>, +<CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>. +These functions are not (yet) officially part of +the library. If they break, you get to keep all the pieces. +Nevertheless, I think they work ok. + +<PRE> +typedef void BZFILE; + +const char * BZ2_bzlibVersion ( void ); +</PRE> + +<P> +Returns a string indicating the library version. + +<PRE> +BZFILE * BZ2_bzopen ( const char *path, const char *mode ); +BZFILE * BZ2_bzdopen ( int fd, const char *mode ); +</PRE> + +<P> +Opens a <CODE>.bz2</CODE> file for reading or writing, using either its name +or a pre-existing file descriptor. +Analogous to <CODE>fopen</CODE> and <CODE>fdopen</CODE>. 
+ +<PRE> +int BZ2_bzread ( BZFILE* b, void* buf, int len ); +int BZ2_bzwrite ( BZFILE* b, void* buf, int len ); +</PRE> + +<P> +Reads/writes data from/to a previously opened <CODE>BZFILE</CODE>. +Analogous to <CODE>fread</CODE> and <CODE>fwrite</CODE>. + +<PRE> +int BZ2_bzflush ( BZFILE* b ); +void BZ2_bzclose ( BZFILE* b ); +</PRE> + +<P> +Flushes/closes a <CODE>BZFILE</CODE>. <CODE>BZ2_bzflush</CODE> doesn't actually do +anything. Analogous to <CODE>fflush</CODE> and <CODE>fclose</CODE>. + +</P> + +<PRE> +const char * BZ2_bzerror ( BZFILE *b, int *errnum ); +</PRE> + +<P> +Returns a string describing the most recent error status of +<CODE>b</CODE>, and also sets <CODE>*errnum</CODE> to its numerical value. + +</P> + + + +<H2><A NAME="SEC39" HREF="manual_toc.html#TOC39">Using the library in a <CODE>stdio</CODE>-free environment</A></H2> + + + +<H3><A NAME="SEC40" HREF="manual_toc.html#TOC40">Getting rid of <CODE>stdio</CODE></A></H3> + +<P> +In a deeply embedded application, you might want to use just +the memory-to-memory functions. You can do this conveniently +by compiling the library with preprocessor symbol <CODE>BZ_NO_STDIO</CODE> +defined. Doing this gives you a library containing only the following +eight functions: + +</P> +<P> +<CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE>, <CODE>BZ2_bzCompressEnd</CODE> <BR> +<CODE>BZ2_bzDecompressInit</CODE>, <CODE>BZ2_bzDecompress</CODE>, <CODE>BZ2_bzDecompressEnd</CODE> <BR> +<CODE>BZ2_bzBuffToBuffCompress</CODE>, <CODE>BZ2_bzBuffToBuffDecompress</CODE> + +</P> +<P> +When compiled like this, all functions will ignore <CODE>verbosity</CODE> +settings. + +</P> + + +<H3><A NAME="SEC41" HREF="manual_toc.html#TOC41">Critical error handling</A></H3> +<P> +<CODE>libbzip2</CODE> contains a number of internal assertion checks which +should, needless to say, never be activated. 
Nevertheless, if an +assertion should fail, behaviour depends on whether or not the library +was compiled with <CODE>BZ_NO_STDIO</CODE> set. + +</P> +<P> +For a normal compile, an assertion failure yields the message + +<PRE> + bzip2/libbzip2: internal error number N. + This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000. + Please report it to me at: jseward@acm.org. If this happened + when you were using some program which uses libbzip2 as a + component, you should also report this bug to the author(s) + of that program. Please make an effort to report this bug; + timely and accurate bug reports eventually lead to higher + quality software. Thanks. Julian Seward, 21 March 2000. +</PRE> + +<P> +where <CODE>N</CODE> is some error code number. <CODE>exit(3)</CODE> +is then called. + +</P> +<P> +For a <CODE>stdio</CODE>-free library, assertion failures result +in a call to a function declared as: + +<PRE> + extern void bz_internal_error ( int errcode ); +</PRE> + +<P> +The relevant code is passed as a parameter. You should supply +such a function. + +</P> +<P> +In either case, once an assertion failure has occurred, any +<CODE>bz_stream</CODE> records involved can be regarded as invalid. +You should not attempt to resume normal operation with them. + +</P> +<P> +You may, of course, change critical error handling to suit +your needs. As I said above, critical errors indicate bugs +in the library and should not occur. All "normal" error +situations are indicated via error return codes from functions, +and can be recovered from. + +</P> + + + +<H2><A NAME="SEC42" HREF="manual_toc.html#TOC42">Making a Windows DLL</A></H2> +<P> +Everything related to Windows has been contributed by Yoshioka Tsuneo +<BR> (<CODE>QWF00133@niftyserve.or.jp</CODE> / +<CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>), so you should send your queries to +him (but perhaps Cc: me, <CODE>jseward@acm.org</CODE>). 
+ +</P> +<P> +My vague understanding of what to do is: using Visual C++ 5.0, +open the project file <CODE>libbz2.dsp</CODE>, and build. That's all. + +</P> +<P> +If you can't +open the project file for some reason, make a new one, naming these files: +<CODE>blocksort.c</CODE>, <CODE>bzlib.c</CODE>, <CODE>compress.c</CODE>, +<CODE>crctable.c</CODE>, <CODE>decompress.c</CODE>, <CODE>huffman.c</CODE>, <BR> +<CODE>randtable.c</CODE> and <CODE>libbz2.def</CODE>. You will also need +to name the header files <CODE>bzlib.h</CODE> and <CODE>bzlib_private.h</CODE>. + +</P> +<P> +If you don't use VC++, you may need to define the preprocessor symbol +<CODE>_WIN32</CODE>. + +</P> +<P> +Finally, <CODE>dlltest.c</CODE> is a sample program using the DLL. It has a +project file, <CODE>dlltest.dsp</CODE>. + +</P> +<P> +If you just want a makefile for Visual C, have a look at +<CODE>makefile.msc</CODE>. + +</P> +<P> +Be aware that if you compile <CODE>bzip2</CODE> itself on Win32, you must set +<CODE>BZ_UNIX</CODE> to 0 and <CODE>BZ_LCCWIN32</CODE> to 1, in the file +<CODE>bzip2.c</CODE>, before compiling. Otherwise the resulting binary won't +work correctly. + +</P> +<P> +I haven't tried any of this stuff myself, but it all looks plausible. + +</P> + +<P><HR><P> +<p>Go to the <A HREF="manual_1.html">first</A>, <A HREF="manual_2.html">previous</A>, <A HREF="manual_4.html">next</A>, <A HREF="manual_4.html">last</A> section, <A HREF="manual_toc.html">table of contents</A>. +</BODY> +</HTML>