Releases: ARM-software/astc-encoder
3.0
Status: Released
The 3.0 release is the first in a series of updates to the compressor that are making more radical changes than we felt we could make with the 2.x series. The primary goals of the 3.x series are to keep the image quality the same or better compared to the 2.5 release, but continue to improve performance.
Reminder for users of the library interface - the API is not designed to be binary compatible across versions, and this release is not compatible with earlier releases. Please update and rebuild your client-side code using the updated astcenc.h header.
- General:
  - Feature: The code has been significantly cleaned up, with improved comments, API documentation, function naming, and variable naming.
- Core API:
  - API Change: The core APIs for astcenc_compress_image() and for astcenc_decompress_image() now accept swizzle structures by const pointer, instead of pass-by-value.
  - API Change: Calling the astcenc_compress_reset() and the astcenc_decompress_reset() functions between images is no longer required if the context was created for use by a single thread.
  - Feature: New heuristics have been added for controlling when to search beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and 1 plane. The previous tune_partition_early_out_limit config option has been removed, and replaced with two new options tune_2_partition_early_out_limit_factor and tune_3_partition_early_out_limit_factor. See command line help for more detailed documentation.
  - Feature: New heuristics have been added for controlling when to use dual weight planes. The previous tune_two_plane_early_out_limit has been renamed to tune_2_plane_early_out_limit_correlation. See command line help for more detailed documentation.
  - Feature: Support for using dual weight planes has been restricted to single partition blocks; it rarely helps blocks with 2 or more partitions and takes considerable compression search time.
Performance:
This release includes further performance optimizations which improve performance vs the 2.5 release by between 25% and 75%, depending on image and quality search preset used. Smaller block sizes and higher search qualities benefit the most.
Image quality:
The -medium and -fast presets have been tuned to give measurably better image quality. Despite this they are still faster than the equivalent in the 2.5 release.
Binary release sha256 checksums
663f67a2eb85c4eb539857534f32d828aecff770dfb2fe35f2355996cbdf2bdd astcenc-3.0-linux-x64.zip
97ee6fc61a2c203132ad91c5f065a9ead39b6cf38e5530bebba463492a05449b astcenc-3.0-macos-aarch64.zip
006d4b14c9914b9793a1843683f29b42fb22cfc17fb74a5bc8450bba09ff119b astcenc-3.0-macos-x64.zip
40e4f87920c722e5ddd59635a91b651c7e58c352b62864518f52d7e71556b051 astcenc-3.0-windows-x64.zip
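A downloaded archive can be checked against the published checksums before it is unpacked. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_of_file and matches_checksum are illustrative assumptions and are not part of the astcenc tooling.

```python
import hashlib

# Published checksum for astcenc-3.0-linux-x64.zip, copied from the list above.
EXPECTED = "663f67a2eb85c4eb539857534f32d828aecff770dfb2fe35f2355996cbdf2bdd"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling matches_checksum("astcenc-3.0-linux-x64.zip", EXPECTED) on the downloaded file should return True when the archive is intact.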
2.5
Status: Released
The 2.5 release is the last major release in the 2.x series. After this release a 2.x branch will provide stable long-term support, and the main branch will switch to focusing on more radical changes for the 3.x series.
Reminder for users of the library interface - the API is not designed to be stable across versions, and this release is not compatible with earlier 2.x releases. Please update and rebuild your client-side code using the updated astcenc.h header.
General:
- Feature: The ISA_INVARIANCE build option is no longer supported, as there is no longer any performance benefit from the variant paths. All builds are now using the equivalent of the ISA_INVARIANCE=ON setting, and all builds (except Armv7) are now believed to be invariant across operating systems, compilers, CPU architectures, and SIMD instruction sets.
- Feature: Armv8 32-bit builds with NEON are now supported, with out-of-the-box support for Arm Linux soft-float and hard-float ABIs. There are no pre-built binaries for these targets; support is included for library users targeting older 32-bit Android and iOS devices.
- Feature: A compressor mode for encoding HDR textures that have been encoded into LDR RGBM wrapper format is now supported. Note that this encoding has some strong recommendations for how the RGBM encoding is implemented to avoid block artifacts in the compressed image.
Core API:
- API Change: The core API has been changed to be a pure C API, making it easier to wrap the codec in a stable shared library ABI. Some entry points that used to accept references now expect pointers.
- API Change: The decompression functionality in the core API has been changed to allow use of multiple threads. The design pattern matches the compression functionality, requiring the caller to create the threads, synchronize them between images, and to call the new astcenc_decompress_reset() function between images.
- API Feature: Defines to support exporting public API entry point symbols from a shared object are provided, but not exposed off-the-shelf by the CMake provided by the project.
- API Feature: New astcenc_get_block_info() function added to the core API to allow users to perform high level analysis of compressed data. This API is not implemented in decompressor-only builds.
- API Feature: Codec configuration structure has been extended to expose the new RGBM compression mode. See the API header for details.
Performance:
This release includes further performance optimizations which improve performance vs the 2.4 release by between 15 and 25%, depending on image and quality search preset used.
Image quality:
The -fast and -fastest presets have been retuned to give measurably better image quality. Despite this they are still faster than the equivalent in the 2.4 release.
Binary release sha256 checksums
a90396a4151fb1cda949c5179a710acbd6204fd9d4af98a5c36a6c16baec3cf2 astcenc-2.5-linux-x64.zip
a40dfda14ed4a7c7cffc9c1376fe492adf7ed182d6e7d576572b272e97014853 astcenc-2.5-macos-aarch64.zip
4a8108deadce5a12319d6e31a12b08d08b3cbfb207bce8b2b43cdd1cb0288dd3 astcenc-2.5-macos-x64.zip
8f38f6c380131ab5b4a4a6e1bb89f70304a466ef0c95f2efc18746208c83f48b astcenc-2.5-windows-x64.zip
2.4
Status: Released
This release is a patch release fixing an HDR image handling bug in the command line tool (non-square images would load incorrectly).
Command Line:
- Bug fix: The command line wrapper now correctly loads HDR images that have a non-square aspect ratio.
General:
- Feature: When using the -a option, or the equivalent config option for the API, any 2D blocks that are entirely zero alpha after the alpha filter radius is taken into account are replaced by transparent black constant color blocks. This is an RDO-like technique to improve compression ratios of any additional application packaging compression that is applied.
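The zero-alpha test can be pictured with a small sketch. It assumes that taking the filter radius into account means growing the block footprint by the radius on every side before testing it; that interpretation, the function name block_is_zero_alpha, and the row-major alpha[y][x] layout are assumptions made for illustration and are not taken from the codec source.

```python
def block_is_zero_alpha(alpha, x0, y0, block_w, block_h, radius):
    """Return True if every texel in the block, grown by `radius`, has zero alpha.

    `alpha` is a list of rows (alpha[y][x]). The grown footprint is clamped to
    the image bounds, so texels outside the image are ignored.
    """
    height = len(alpha)
    width = len(alpha[0]) if height else 0
    x_lo = max(x0 - radius, 0)
    y_lo = max(y0 - radius, 0)
    x_hi = min(x0 + block_w + radius, width)
    y_hi = min(y0 + block_h + radius, height)
    return all(
        alpha[y][x] == 0
        for y in range(y_lo, y_hi)
        for x in range(x_lo, x_hi)
    )
```

Under this assumption, a block that passes the test could be stored as a transparent black constant color block, which is what gives later packaging compression long runs of identical data.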
Binary release sha256 checksums
95efbe4e87b9b4a6e379f068faa6f69b1f994fe6f3861dc5ae094218d8375892 astcenc-2.4-linux-x64.zip
aa743ac59efe1064f02d001f569e6cb30cbff3fffc89fbbc0b743046a02c72e8 astcenc-2.4-macos-aarch64.zip
76603df0ff71e06e16314a837d9307efc2fb1df81dcea9969755ee6f482d08b8 astcenc-2.4-macos-x64.zip
5099f8ccdd0d323629d7c92715f132f1366a932027e902e0e76430027c36c742 astcenc-2.4-windows-x64.zip
2.3
Status: Released
The 2.3 release is the fourth release in the 2.x series. It includes a number of performance improvements and new features.
Reminder for users of the library interface - the API is not designed to be stable across versions, and this release is not compatible with 2.2. Please recompile your client-side code using the updated astcenc.h header.
- General:
  - Feature: Decompressor-only builds of the codec are supported again. While this is primarily a feature for library users who want to shrink binary size, a variant command line tool astcdec can be built by specifying DECOMPRESSOR=ON on the CMake configure command line.
  - Feature: Diagnostic builds of the codec can now be built. These builds generate a JSON file containing a trace of the compressor execution. Diagnostic builds are only suitable for codec development; they are slower and JSON generation cannot be disabled. Build by setting DIAGNOSTICS=ON on the CMake configure command line.
  - Feature: Code compatibility improved with older versions of GCC, earliest compiler now tested is GCC 7.5 (was GCC 9.3).
  - Feature: Code compatibility improved with newer versions of LLVM, latest compiler now tested is Clang 12.0 (was Clang 9.0).
  - Feature: Code compatibility improved with the Visual Studio 2019 LLVM toolset (clang-cl). Using the LLVM toolset gives 25% performance improvements and is recommended.
- Command Line:
  - Feature: Quality level now accepts either a preset (-fast, etc) or a float value between 0 and 100, allowing more control over the compression quality vs performance trade-off. The presets are not evenly spaced in the float range; they have been spaced to give the best distribution of points between the fast and thorough presets.
    - -fastest: 0.0
    - -fast: 10.0
    - -medium: 60.0
    - -thorough: 98.0
    - -exhaustive: 100.0
- Core API:
  - API Change: Quality level preset enum replaced with a float value between 0 (-fastest) and 100 (-exhaustive). See above for more info.
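The relationship between preset names and float quality values can be sketched as a small lookup. This is an illustration only: the values are the ones listed for the 2.3 command line, while the parse_quality helper and its error handling are hypothetical and do not mirror the actual command line parser.

```python
# Preset-to-quality mapping, using the values listed for the 2.3 release.
PRESET_QUALITY = {
    "-fastest": 0.0,
    "-fast": 10.0,
    "-medium": 60.0,
    "-thorough": 98.0,
    "-exhaustive": 100.0,
}


def parse_quality(argument):
    """Turn a preset name or a numeric string into a quality value in [0, 100]."""
    if argument in PRESET_QUALITY:
        return PRESET_QUALITY[argument]
    value = float(argument)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"quality must be between 0 and 100, got {value}")
    return value
```

A value such as 75 sits between -medium and -thorough, which is the finer-grained control the float range is meant to allow.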
Performance
This release includes a number of optimizations to improve performance.
- New compressor algorithm for handling encoding candidates and refinement.
- Vectorized implementation of compute_error_of_weight_set().
- Unrolled implementation of encode_ise().
- Many other small improvements!
The most significant change is the change to the compressor path, which now uses an adaptive approach to candidate trials and block refinement.
In earlier releases the quality level determined the number of encoding candidates and the number of iterative refinement passes used for each major encoding trial. This behavior was fixed; the compressor always tried the full N candidates and M refinement iterations specified by the quality level for each encoding trial.
The new approach implements two optimizations for this:
- Compression will complete when a block candidate hits the specified target quality, after its M refinement iterations have been applied. Later block candidates are simply abandoned.
- Block candidates will predict how much refinement can improve them, and abandon refinement if they are unlikely to improve upon the best known encoding already in-hand.
This pair of optimizations provides significant performance improvement to the high quality modes which use the most block candidates and refinement iterations. A minor loss of image quality is expected, as the blocks we no longer test or refine may have been better coding choices.
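The control flow of the two early-outs can be sketched as follows. This is a simplified illustration, not the codec's implementation: the function adaptive_search, the refine and predict_best callbacks, and the use of a single scalar error per candidate are all hypothetical.

```python
def adaptive_search(candidates, target_error, max_refinements, refine, predict_best):
    """Pick an encoding using the two early-outs described above.

    candidates   -- iterable of (encoding, error) pairs; lower error is better
    refine       -- hypothetical callback: (encoding, error) -> (encoding, error)
    predict_best -- hypothetical callback: error -> optimistic error after refinement
    """
    best_encoding, best_error = None, float("inf")
    for encoding, error in candidates:
        for _ in range(max_refinements):
            # Second optimization: stop refining when even the optimistic
            # prediction cannot beat the best encoding already in hand.
            if predict_best(error) >= best_error:
                break
            encoding, error = refine(encoding, error)
        if error < best_error:
            best_encoding, best_error = encoding, error
        # First optimization: once a candidate meets the target quality after
        # its refinement loop, the remaining candidates are abandoned.
        if best_error <= target_error:
            break
    return best_encoding, best_error
```

The quality trade-off mentioned above is visible in the sketch: any candidate skipped by either break is never evaluated, even if it would have turned out to be the better choice.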
Performance benefits vary based on image color format, ASTC quality settings, and ASTC block size.
Compressing using -thorough is typically between 1.3x and 1.6x faster than astcenc 2.2.
Compressing using -medium is typically between 1.2x and 1.5x faster than astcenc 2.2.
Compressing using -fast is typically between 1.1x and 1.3x faster than astcenc 2.2.
Compressing using -fastest is typically between 1.1x and 1.2x faster than astcenc 2.2.
Binary release sha256 checksums
b31188a4c0343751354a7f68def833112fd1695e38b7f5da212efe11bd59e957 astcenc-2.3-linux-x64.zip
d6a1c8c437bc09dbf9113ac26bbc60856d7f2c974e5a0bfd3d737f51c1b9e318 astcenc-2.3-macos-aarch64.zip
eca84b57cc3ee653b809e30da999eac4f0525134600ba8b4a767cb9237f8bcad astcenc-2.3-macos-x64.zip
1fd28ddafffdc5cdc6088c0129b9bfc1eac5e72b2ccfb4580195efe05af2c924 astcenc-2.3-windows-x64.zip
2.2
Status: Released, January 2021
The 2.2 release is the third release in the 2.x series. It includes a number of performance improvements and new features.
Reminder for users of the library interface - the API is not designed to be stable across versions, and this release is not compatible with 2.1. Please recompile your client-side code using the updated astcenc.h header.
- General:
  - Feature: New Arm aarch64 NEON accelerated vector library support.
  - Improvement: New CMake build system for all platforms.
  - Improvement: SSE4.2 feature profile changed to SSE4.1, which more accurately reflects the feature set used.
- Binary releases:
  - Improvement: Linux binaries changed to use Clang 9.0.
  - Improvement: Windows binaries are now code signed.
  - Improvement: macOS binaries for Apple Silicon platforms now provided.
  - Improvement: macOS binaries are now code signed and notarized.
- Command Line:
  - Feature: New image preprocess -pp-normalize option added.
  - Feature: New image preprocess -pp-premultiply option added.
  - Improvement: Cleaner error handling for corrupt images.
- Core API:
  - API Change: Images no longer need to include padding. All input images should be tightly packed. The dim_pad field is removed from the astcenc_image structure.
  - API Change: Image data is no longer a 3D array accessed using data[z][y][x] indexing, it's an array of N packed 2D slices.
  - API Change: New ASTCENC_FLG_SELF_DECOMPRESS_ONLY flag added to the codec config. Using this flag enables additional optimizations that aggressively exploit implementation- and configuration-specific behavior. When using this flag the codec can only reliably decompress images that were compressed in the same context session.
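The packed-slice layout can be illustrated with a short indexing sketch. It models the image as a list of byte buffers, one per 2D slice, and assumes four 8-bit components (RGBA) per texel; the helper names and the Python representation are illustrative assumptions, not the C structure from astcenc.h.

```python
COMPONENTS = 4  # assumption for this sketch: RGBA texels, one byte per component


def texel_offset(x, y, dim_x):
    """Byte offset of texel (x, y) inside one tightly packed 2D slice."""
    return (y * dim_x + x) * COMPONENTS


def read_texel(slices, x, y, z, dim_x):
    """Read one RGBA texel from a list of packed 2D slices (one entry per z)."""
    start = texel_offset(x, y, dim_x)
    return tuple(slices[z][start:start + COMPONENTS])
```

Because each slice is tightly packed, a row is exactly dim_x texels wide and no padding term appears in the offset calculation.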
Performance
There is one major set of optimizations in this release, related to the new ASTCENC_FLG_SELF_DECOMPRESS_ONLY mode. These allow the compressor to only create data tables it knows that it is going to use, based on its current set of heuristics, rather than needing the full set the format allows.
The first benefit of these changes is reduced context creation time. This can be a significant percentage of the command line utility runtime when compressing a small image and/or when using a quick search preset. Compressing the whole Kodak test suite using the command line utility and the -fastest preset is ~30% faster with this release, which is mostly due to faster context creation.
The reduction in the data table size in this mode also improves the core codec speed. Our test sets show an average of 12% improvement in the codec for -fastest mode, and an average of 3% for -medium mode.
Binary release sha256 checksums
d3e5fe5dd0cad92ae12406654c1f7de563d042b5130c064de852c6293ffdcda2 astcenc-2.2-linux-x64.zip
ff5e095609db0d08560c3fbf0bc6ed19484d77ed0112f822df61305d3345383e astcenc-2.2-macos-aarch64.zip
5662773b923b5ffa0b5c907afd9c5599c977d4736af4bab1f77cbf92a5902a84 astcenc-2.2-macos-x64.zip
733b74264ec3da8fb243cd2720f06b20b030c6d38e330981b8f2fe0531af4345 astcenc-2.2-windows-x64.zip
2.1
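The 2.1 release is the second release in the 2.x series. It includes another set of significant performance optimizations and a number of smaller new features.
Reminder for users of the library interface - the API is not designed to be stable across versions, and this release is not compatible with 2.0. Please recompile your client-side code using the updated astcenc.h header.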
Features:
- Command line:
  - Bug fix: The meaning of the -tH\cH\dH and -th\ch\dh compression modes was inverted. They now match the documentation; use -*H for HDR RGBA, and -*h for HDR RGB with LDR alpha.
  - Feature: A new -fastest quality preset is now available. This is designed for fast "roughing out" of new content, and sacrifices significant image quality compared to -fast. We do not recommend its use for production builds.
  - Feature: A new -candidatelimit compression tuning option is now available. This is a power-user control to determine how many candidates are returned for each block mode encoding trial. This feature is used automatically by the search presets; see -help for details.
  - Improvement: The compression test modes (-tl\ts\th\tH) now emit a MTex/s performance metric, in addition to coding time.
- Core API:
  - Feature: A new quality preset ASTCENC_PRE_FASTEST is available. See -fastest above for details.
  - Feature: A new tuning option tune_candidate_limit is available in the config structure. See -candidatelimit above for details.
  - Feature: Image input/output can now use ASTCENC_TYPE_F32 data types.
- Stability:
  - Feature: The SSE2, SSE4.2, and AVX2 variants now produce identical compressed output when run on the same CPU when compiled with the preprocessor define ASTCENC_ISA_INVARIANCE=1. For Make builds this can be set on the command line by setting ISA_INV=1. ISA invariance is off by default; it reduces performance by 1-3%.
Performance
Performance benefits vary based on image color format, ASTC quality settings, and ASTC block size.
- Compressing using -thorough is between 1.3x and 1.6x faster than astcenc 2.0.
- Compressing using -medium is between 1.4x and 2.1x faster than astcenc 2.0.
- Compressing using -fast is between 1.4x and 2.4x faster than astcenc 2.0.
- Compressing using the new -fastest setting is between 2.1x and 3.8x faster than using astcenc 2.0 -fast.
2.0
This is the first release of the astcenc 2.x series, providing a number of major improvements for content creators.
The core codec performance is now up to three times faster, with optimized builds for the SSE2, SSE4.2, and AVX2 instruction sets.
The core codec now has a clean library API separating the command line wrapper and the main algorithm core. This makes it easier to directly integrate the compressor into other applications.
The command line tool supports a wider range of input and output file formats for both uncompressed files and compressed files.
Note: this release is not command line compatible with the 1.x series. Some options have been renamed and some options have been removed.
1.7
- Re-enable JPEG and GIF image support
1.6
1.5
Update all binaries to source version a4cfd6b