Skip to content
This repository has been archived by the owner on Sep 14, 2022. It is now read-only.

Trying to understand context coding #3

Open
exilef opened this issue Oct 25, 2018 · 2 comments
Open

Trying to understand context coding #3

exilef opened this issue Oct 25, 2018 · 2 comments

Comments

@exilef
Copy link

exilef commented Oct 25, 2018

Hi Christoph, I was trying to understand how IZ does its magic and stumbled upon the code in table.cpp that I found interesting and struggle to understand completely.

I understood that you use CONTEXT_BITS = 4 to store the integer values 0..8 (so CONTEXT_COUNT = 9) that encode how many bits are used to code each color channel of a given pixel.

But what I struggle to understand is the meaning of MAX_CODE_LENGTH = 6, and how are the entries of staticdCount are chosen and then used to calculate the entries of staticdBits (for encoding) and decodeTable (for decoding). I figured out that you code the current context depending on the last one, but why and how exactly is this done?

Is this a standard approach that I am not aware of? Or have you explained the idea somewhere else?

Any further information is appreciated!

@cfeck
Copy link
Owner

cfeck commented Oct 26, 2018

Hi, thanks for your interest in IZ.

In IZ, the context value is the number of bits needed to code the largest component difference between the predicted pixel and the actual pixel. The higher the context value, the more noisy is the pixel (assuming we cannot predict noise).

For example, if we predicted (R',G',B')=(123,45,67), but actual pixel is (R,G,B)=(120,50,60), then the deltas are (-3,5,-7). The sign bit is moved to the LSB, so we get the unsigned values (5,10,13). The largest value needs 4 bits: the context value for this pixel is 4. Taking into account wrapping of the 8 bit components, it can be shown that the context can have values 0 ... 8.

To code the example pixel, the deltas are coded as XXXX-0101-1010-1101, where XXXX is the code to store the context value 4, and the remaining bits are the binary representation of the unsigned (r,g,b) deltas.

We could always code the context value using CONTEXT_BITS (== 4) bits, but since log_2(9) = 3.17 < 4, we waste a lot, so we try to use less bits for the context value in the common cases. It is coded depended on the previous context value, because we expect that noisy pixels are surrounded by noisy pixels and flat pixels are surrounded by flat pixels.

In staticdCount, the number of bits for each code is listed. Each row is for one previous context, the column is for the context value to code (the table is 16x16 instead of 9x9 due to alignment). For our example, let's assume the previous pixel had the context value 3, then according to staticdCount[3][4] we would need 2 bits to code the 3→4 context value update. Our deltas then need 14 bits: XX-0101-1010-1101.

IZ uses fixed canonical Huffman codes, with up to MAX_CODE_LENGTH bits (6 in the implementation). If we use less than 4 bits for the common cases, we may need more than 4 bits for the uncommon cases. The maximum code length of 6 bits was found by experimentation, but the source code in table.cpp also suggests a code table for MAX_CODE_LENGTH == 4.

To understand the table generation, let's look at the first row (1, 3, 2, 5, 5, 6, 6, 6, 6) for previous context 0. It says that we store a single bit for context update 0→0, three bits for context update 0→1, two bits for context update 0→2 and so on.

staticdBits contains the actual bits for the canonical Huffman codes. For the first row in the table, the codes are: 0, 110, 10, 11100, 11101, 111100, 111101, 111110, 111111. (Exercise: Order these by code length to confirm they are canonical!) Sorting uses plain old qsort() in the implementation.

decodeTable needs to get the context value from the bits in the stream. If your stream starts with the bits 110110110110... then you would see that it is indeed context value 1 (leading 110 bits). You only need to examine up to MAX_CODE_LENGTH bits, which means decodeTable has MAX_CODE_VALUE (== 64) entries for each previous context. Eight of them (110xxx) all map to the value 1. The first 32 of them (0xxxxx) map to the value 0, and so on. Enable CHECKTABLE to see the resulting table entries; they start with 32x 0, then 16x 2, then 8x 1 etc.

All code tables are static. An improvement would be to dynamically adapt tables to the actual image statistics. That is done in Alex's Qic coder; I never got around to implement that for IZ.

Please ask if something remains unclear.

@exilef
Copy link
Author

exilef commented Oct 30, 2018

Thanks a lot for your detailed explanation! I now much better understand what is going on under the hood!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants