author Ivan Nikulin <vanickulin@google.com> 2016-09-15 17:19:26 +0200
committer Ivan Nikulin <vanickulin@google.com> 2016-09-15 17:19:26 +0200
commit 4291932022e67a4d0b9a2b8cedd011ef574f9861
tree 7bc52d04b36eccefacdb7b34677fc070d4076914 /research
parent 0e52c59a07c62c537183a7848244037fe8172930
Update research tools description.
Diffstat (limited to 'research')
-rw-r--r-- research/README.md 27
-rw-r--r-- research/img/enwik9_brotli.png bin 0 -> 1981984 bytes
-rw-r--r-- research/img/enwik9_diff.png bin 0 -> 5096698 bytes
-rw-r--r-- research/img/enwik9_opt.png bin 0 -> 2025431 bytes
4 files changed, 21 insertions, 6 deletions
diff --git a/research/README.md b/research/README.md
index ce89dd6..9c87ef8 100644
--- a/research/README.md
+++ b/research/README.md
@@ -1,8 +1,7 @@
## Introduction
-This directory contains several research tools that have been very useful during LZ77 backward distance encoding research.
+In this directory we publish simple tools to analyze backward reference distance distributions in LZ77 compression. We developed these tools to be able to make more efficient encoding of distances in large-window brotli. In large-window compression the average cost of a backward reference distance is higher, and this may allow for more advanced encoding strategies, such as delta coding or an increase in context size, to bring significant compression density improvements. Our tools visualize the backward references as histogram images, i.e., one pixel in the image shows how many distances of a certain range exist at a certain locality in the data. The human visual system is excellent at pattern detection, so we tried to roughly identify patterns visually before going into more quantitative analysis. These tools can turn out to be useful in development of other LZ77-based compressors and we hope you try them out.
-Notice that all `FLAGS_*` variables were supposed to be command-line flags.
## Tools
### find\_opt\_references
@@ -15,31 +14,47 @@ Example usage:
### draw\_histogram
-This tool generates a visualization of the distribution of backward references stored in `*.dist` file. The output is a grayscale PGM (binary) image.
+This tool generates a visualization of the distribution of backward references stored in `*.dist` file. The original file size has to be specified as a second parameter. The output is a grayscale PGM (binary) image.
Example usage:
draw_histogram input.dist 65536 output.pgm
+Here's an example of resulting image:
+
+![](img/enwik9_brotli.png)
+
### draw\_diff
-This tool generates a diff PPM (binary) image between two input PGM (binary) images. Input images must be of same size and contain 255 colors. Useful for comparing different backward references distributions for same input file.
+This tool generates a diff PPM (binary) image between two input 8-bit PGM (binary) images. Input images must be of same size. Useful for comparing different backward references distributions for same input file. Normally used for comparison of output images from `draw_histogram` tool.
Example usage:
draw_diff image1.pgm image2.pgm diff.ppm
+For example the diff of this image
+
+![](img/enwik9_brotli.png)
+
+and this image
+
+![](img/enwik9_opt.png)
+
+looks like this:
+
+![](img/enwik9_diff.png)
+
## Backward distance file format
The format of `*.dist` files is as follows:
- [[ 0| match legnth][ 1|position|distance]...]
+ [[ 0| match length][ 1|position|distance]...]
[1 byte| 4 bytes][1 byte| 4 bytes| 4 bytes]
More verbose explanation: for each backward reference there is a position-distance pair, also a copy length may be specified. Copy length is prefixed with flag byte 0, position-distance pair is prefixed with flag byte 1. Each number is a 32-bit integer. Copy length always comes before position-distance pair. Standalone copy length is allowed, in this case it is ignored.
-Here's an example how to read from `*.dist` file:
+Here's an example of how to read from `*.dist` file:
```c++
#include "read_dist.h"
diff --git a/research/img/enwik9_brotli.png b/research/img/enwik9_brotli.png
new file mode 100644
index 0000000..c95eba9
--- /dev/null
+++ b/research/img/enwik9_brotli.png
Binary files differ
diff --git a/research/img/enwik9_diff.png b/research/img/enwik9_diff.png
new file mode 100644
index 0000000..5df6748
--- /dev/null
+++ b/research/img/enwik9_diff.png
Binary files differ
diff --git a/research/img/enwik9_opt.png b/research/img/enwik9_opt.png
new file mode 100644
index 0000000..43c655d
--- /dev/null
+++ b/research/img/enwik9_opt.png
Binary files differ
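
Note on the `*.dist` layout described in the README hunk above: the record structure (flag byte 0 followed by a 4-byte length, flag byte 1 followed by a 4-byte position and a 4-byte distance) can be sketched as a small in-memory parser. This is only an illustration and not the code from `read_dist.h`. The names `BackwardRef`, `ReadU32LE`, and `ParseDist` are hypothetical, and both the little-endian byte order and the unsigned integer type are assumptions, since the README only says each number is a 32-bit integer.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one backward reference.
struct BackwardRef {
  bool has_length;       // false when no copy length preceded the pair
  uint32_t copy_length;  // meaningful only when has_length is true
  uint32_t position;
  uint32_t distance;
};

// Decodes one 32-bit integer. Little-endian order is an assumption.
static uint32_t ReadU32LE(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Parses a whole buffer of records and appends the references to *out.
// Returns false on an unknown flag byte or a truncated record.
bool ParseDist(const std::vector<uint8_t>& data,
               std::vector<BackwardRef>* out) {
  bool has_pending = false;     // a copy length is waiting for its pair
  uint32_t pending_length = 0;
  size_t i = 0;
  while (i < data.size()) {
    const uint8_t flag = data[i++];
    if (flag == 0) {
      // Copy length record: 4 bytes follow the flag.
      if (data.size() - i < 4) return false;
      // A standalone length is ignored: it is overwritten here or
      // dropped at the end of the buffer.
      pending_length = ReadU32LE(&data[i]);
      has_pending = true;
      i += 4;
    } else if (flag == 1) {
      // Position-distance record: 8 bytes follow the flag.
      if (data.size() - i < 8) return false;
      out->push_back({has_pending, pending_length,
                      ReadU32LE(&data[i]), ReadU32LE(&data[i + 4])});
      has_pending = false;
      i += 8;
    } else {
      return false;  // flag bytes other than 0 and 1 are not defined
    }
  }
  return true;
}
```

Parsing from a byte buffer rather than a `FILE*` keeps the sketch easy to exercise on hand-built input; a file-based reader would follow the same flag dispatch.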