Get raw image data #4
TIFF files store compressed image data in strips or tiles. You can extract the raw data of each strip/tile using this code:
using System.Buffers;
using System.IO;
using TiffLibrary;
// Open TIFF file
using var tiff = await TiffFileReader.OpenAsync(@"C:\Test\1.tif");
using var fieldReader = await tiff.CreateFieldReaderAsync();
var ifd = await tiff.ReadImageFileDirectoryAsync();
var tagReader = new TiffTagReader(fieldReader, ifd);
// Get offsets to the strip/tile data
TiffValueCollection<ulong> offsets, byteCounts;
if (ifd.Contains(TiffTag.TileOffsets))
{
offsets = await tagReader.ReadTileOffsetsAsync();
byteCounts = await tagReader.ReadTileByteCountsAsync();
}
else if (ifd.Contains(TiffTag.StripOffsets))
{
offsets = await tagReader.ReadStripOffsetsAsync();
byteCounts = await tagReader.ReadStripByteCountsAsync();
}
else
{
throw new InvalidDataException("This TIFF file is neither striped nor tiled.");
}
if (offsets.Count != byteCounts.Count)
{
throw new InvalidDataException("The offsets and byte counts have different lengths.");
}
// Extract strip/tile data
using var contentReader = await tiff.CreateContentReaderAsync();
int count = offsets.Count;
for (int i = 0; i < count; i++)
{
long offset = (long)offsets[i];
int byteCount = (int)byteCounts[i];
byte[] data = ArrayPool<byte>.Shared.Rent(byteCount);
try
{
await contentReader.ReadAsync(offset, data.AsMemory(0, byteCount));
using var fs = new FileStream(@$"C:\Test\extracted-{i}.dat", FileMode.Create, FileAccess.Write);
await fs.WriteAsync(data, 0, byteCount);
}
finally
{
ArrayPool<byte>.Shared.Return(data);
}
}
You also need to read the Compression tag, the PhotometricInterpretation tag, and other tags if you want to interpret the pixels in the extracted data. For example, extracting an LZW/Deflate-compressed TIFF using this code will give you compressed raw pixel data, while extracting JPEG-compressed data will give you JPEG streams (which may or may not contain JPEG table definitions) that can be further decoded using a JPEG decoder.
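As a rough illustration of why the Compression tag matters, the sketch below maps the well-known Compression values from the TIFF 6.0 specification and its supplements to what the extracted strip/tile bytes contain. The class and method names (`TiffCompressionInfo`, `DescribeRawData`) are hypothetical helpers for this example and are not part of TiffLibrary; the list of values is not exhaustive.

```csharp
using System;

// Hypothetical helper, not part of TiffLibrary.
public static class TiffCompressionInfo
{
    // Maps the numeric value of the Compression tag (259) to a short
    // description of what a raw strip/tile extracted from the file holds.
    public static string DescribeRawData(ushort compression)
    {
        switch (compression)
        {
            case 1: return "uncompressed pixel samples";
            case 5: return "LZW-compressed pixel samples (check the Predictor tag)";
            case 6: return "old-style JPEG (superseded by TIFF Technical Note #2)";
            case 7: return "JPEG stream, possibly abbreviated (check the JPEGTables tag)";
            case 8: return "Deflate-compressed pixel samples (check the Predictor tag)";
            case 32773: return "PackBits-compressed pixel samples";
            case 32946: return "Deflate-compressed pixel samples (check the Predictor tag)";
            default: return "unknown or vendor-specific compression";
        }
    }
}
```

For example, `TiffCompressionInfo.DescribeRawData(7)` tells you that the extracted bytes are a JPEG stream and that the JPEGTables tag should be checked before handing the data to a decoder.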
One question. Let's say that I want to extract one entire subimage/IFD into memory (so not just individual tiles). Is that possible with this library? Should something like this work:
I am not sure whether, in a TIFF/JPEG file, every tile is a valid .jpeg AND the IFD/subimage is also a valid .jpeg, or only the tiles are.
To read the entire image in an IFD into memory, you can simply stick to the image decoder API:
using TiffFileReader tiff = await TiffFileReader.OpenAsync(@"C:\Data\test.tif");
TiffImageDecoder decoder = await tiff.CreateImageDecoderAsync();
// Create an array to store the pixels
TiffRgba32[] pixels = new TiffRgba32[decoder.Width * decoder.Height];
// Decode
var pixelBuffer = TiffPixelBuffer.Wrap(pixels, decoder.Width, decoder.Height);
await decoder.DecodeAsync(pixelBuffer);
According to the TIFF 6 specification, an IFD contains information about the image, as well as pointers to the actual image data (stored in strips/tiles). So it's not the IFD that contains JPEG streams in a TIFF/JPEG file. Rather, each tile/strip contains a single JPEG stream, and an IFD points to multiple tiles/strips.
A side note here: it is a bit complicated if you want to extract raw JPEG streams from the tiles/strips and feed them directly into another JPEG decoder, because it involves dealing with the JPEGTables tag (details in TIFF Technical Note #2). I think it is worth writing a sample program to demonstrate how to do this.
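To give an idea of what dealing with the JPEGTables tag involves, here is a minimal sketch of one common approach: the JPEGTables value is itself a small stream that starts with SOI (FF D8) and ends with EOI (FF D9), so you can drop its trailing EOI, drop the leading SOI of the tile/strip stream, and concatenate the two. `JpegTablesMerger` and `Merge` are hypothetical names for this example, not TiffLibrary APIs, and the sketch assumes you have already read both byte arrays from the file.

```csharp
using System;

// Hypothetical helper, not part of TiffLibrary.
public static class JpegTablesMerger
{
    // Builds a stand-alone JPEG stream from the JPEGTables tag value and
    // the raw bytes of one tile/strip.
    public static byte[] Merge(ReadOnlySpan<byte> jpegTables, ReadOnlySpan<byte> tileData)
    {
        // Without a JPEGTables tag the tile is expected to carry its own tables.
        if (jpegTables.IsEmpty)
        {
            return tileData.ToArray();
        }
        if (jpegTables.Length < 4
            || jpegTables[0] != 0xFF || jpegTables[1] != 0xD8
            || jpegTables[jpegTables.Length - 2] != 0xFF
            || jpegTables[jpegTables.Length - 1] != 0xD9)
        {
            throw new ArgumentException("JPEGTables must start with SOI and end with EOI.", nameof(jpegTables));
        }
        if (tileData.Length < 2 || tileData[0] != 0xFF || tileData[1] != 0xD8)
        {
            throw new ArgumentException("Tile data must start with SOI.", nameof(tileData));
        }

        // Keep SOI + table segments, skip the trailing EOI of the tables.
        ReadOnlySpan<byte> tables = jpegTables.Slice(0, jpegTables.Length - 2);
        // Skip the leading SOI of the tile; the tables already provide one.
        ReadOnlySpan<byte> body = tileData.Slice(2);

        byte[] result = new byte[tables.Length + body.Length];
        tables.CopyTo(result);
        body.CopyTo(result.AsSpan(tables.Length));
        return result;
    }
}
```

The merged array can then be written to a .jpg file or passed to a general-purpose JPEG decoder. This only handles the table splicing; it does not address how the decoder interprets the color space.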
This is expected. According to section 15 of the TIFF Specification, all tiles in an image are the same size. Boundary tiles are padded to the tile boundaries. TIFF readers should display only the pixels defined by ImageWidth and ImageLength and ignore any padded pixels. As for the padded pixels, it is up to the author of the encoding program to decide how they are generated. They can be empty pixels (all 0s) or duplicated from nearby pixels. The specification notes that some compression schemes work best if the padding is accomplished by replicating the last column and last row instead of padding with 0s. In your case, the encoder decides to generate padded pixels from adjacent tiles. I don't know the reasoning behind this decision. Maybe the author believes that it contributes to better compression, or simply because it is easier to implement than replicating the last column and last row. Anyway, how the tiles are padded should not affect the image decoded by other TIFF readers.
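If you process raw tiles yourself, you need to crop the padding away. The sketch below shows the arithmetic, assuming tiles are numbered left to right and top to bottom as described in section 15. `TileGeometry` and its methods are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical helper, not part of TiffLibrary.
public static class TileGeometry
{
    // Number of tile columns and rows needed to cover the image.
    public static (int Across, int Down) CountTiles(int imageWidth, int imageLength, int tileWidth, int tileLength)
    {
        int across = (imageWidth + tileWidth - 1) / tileWidth;
        int down = (imageLength + tileLength - 1) / tileLength;
        return (across, down);
    }

    // Size of the region inside a tile that holds real image pixels.
    // Everything outside this region is padding and should be ignored.
    public static (int Width, int Height) GetValidSize(int tileIndex, int imageWidth, int imageLength, int tileWidth, int tileLength)
    {
        int across = CountTiles(imageWidth, imageLength, tileWidth, tileLength).Across;
        int column = tileIndex % across;
        int row = tileIndex / across;
        int width = Math.Min(tileWidth, imageWidth - column * tileWidth);
        int height = Math.Min(tileLength, imageLength - row * tileLength);
        return (width, height);
    }
}
```

For a 1000x600 image with 256x256 tiles there are 4 tile columns and 3 tile rows. The last tile in the first row (index 3) has only 232 valid columns, and the bottom-right tile (index 11) has 232 valid columns and 88 valid rows.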
I have one LZW-compressed striped subimage. It has one LZW-specific tag, Predictor, but I don't know what to do with it. Do you know by any chance of a C# library that can easily decompress the LZW-compressed byte array?
For the Predictor tag, please refer to section 14 of the TIFF Specification. You can also refer to the implementation (encoder and decoder) in this library.
For the LZW-compressed data question, SharpZipLib expects a header at the start of the LZW-compressed data. It uses this header to determine the maximum code length (see here). However, TIFF files don't contain this header and use a fixed maximum code length of 12 bits according to section 13 of the specification. This is why SharpZipLib cannot decompress LZW data in TIFF files. The workaround would be to manually add the required header before the LZW-compressed data extracted from TIFF files (not tested, and I don't know if there are other caveats).
I don't know if there are any ready-to-use packages available for this scenario. But you can easily copy the LZW decoding code from this library and use it in your application. The files you would need are TiffLzwDecoderLeastSignificantBitFirst.cs and TiffLzwDecoderMostSignificantBitFirst.cs. You can refer to this file to see how they are used, and integrate this code into your decoding pipeline.
O great wizard of the TIFF, on one hand, it seems that it is working OK as the BUT, I cannot create an image out of this byte array. I use CMU-1.svs as my test data reference. Manually copy-pasting the values gives some output, but it's not the real image. Simplified, this:
P.S |
This image is encoded with horizontal differencing, which has to be undone after LZW decoding:
// undo horizontal differencing
for (int row = 0; row < height; row++)
{
int rowOffset = 3 * row * width; // 3 components (r,g,b)
uint r = decodedOutput[rowOffset];
uint g = decodedOutput[rowOffset + 1];
uint b = decodedOutput[rowOffset + 2];
for (int col = 1; col < width; col++)
{
int offset = rowOffset + 3 * col;
r += decodedOutput[offset];
g += decodedOutput[offset + 1];
b += decodedOutput[offset + 2];
decodedOutput[offset] = (byte)r;
decodedOutput[offset + 1] = (byte)g;
decodedOutput[offset + 2] = (byte)b;
}
}
// After this you can copy the pixels into your Bitmap using your posted code
Both When you have already obtained decoded pixel data (by using my code above) and you want to construct a There are still some optimization opportunities, such as copying pixels row by row instead of one by one. I believe this can be accomplished by using
I have one additional question. As can be seen here: tiles from the CMU-1.svs file seem to have some specific color behaviour. They look a bit pinkish. These raw tiles themselves cannot be opened as JPEGs at all, unless one pastes the (slightly cleaned) quantization tables into the tiles. In CMU-1.svs, in tiled IFDs, YCbCr is used as the TiffPhotometricInterpretation. But in some other files this TiffPhotometricInterpretation is also used, and yet they do not cause these color problems. They sometimes have Huffman tables missing, but after adding them they appear to be valid JPEGs. CMU-1.svs additionally has the YCbCrSubSampling tag defined (as [2,2]), so maybe that means the file expects me to do some postprocessing to convert the raw byte array into RGB. In any case, I am confused because I have:
Do you have any advice?
Does TiffLibrary provide a way to access the raw image data as a byte array? I am asking because I need to extract the images from the TIFF.
Edit:
I think MemoryMarshal.AsBytes(span) will work as expected. It won't work because I need the image data as a whole, not just the pixel data. So the question remains.