Intel QPL v1.6.0

@mzhukova mzhukova released this 17 Jul 22:31
c26fb69

Functionality

  • Introduced a new internal submission mechanism for platforms based on Linux* OS kernel versions where MMAP is no longer permitted. For more details, refer to the Intel Security Advisory. When MMAP is unavailable, the write system call is used instead. This may introduce additional overhead for small data sizes (4KB and smaller) in the Inflate functionality, but no performance implications are expected for larger data sizes or Deflate.
  • Changed the default behavior of the QPL device search mechanism. On platforms where Sub-NUMA clustering is configured such that not all NUMA nodes have an accelerator instance, QPL can now use any IAA instance on the same socket for execution unless the user specifies otherwise. You can still restrict device selection to the NUMA node of the current thread by specifying QPL_DEVICE_NUMA_ID_CURRENT, or to a specific NUMA node by setting job->numa_id = <numa_node_id>. Additionally, you can extend the search to the entire system by setting QPL_DEVICE_NUMA_ID_ANY.
  • Added support for host fallback in the asynchronous API when using the Auto Path feature.
  • Implemented an internal mechanism to save intermediate job states in the dynamic Deflate job. This feature prevents duplicate work when executing with the synchronous API on the Hardware Path and encountering the QPL_STS_QUEUES_ARE_BUSY_ERR error. In such cases, the job is resubmitted without repeating the already completed work.
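
The device search behavior described above can be pictured with a small, self-contained model. This is only an illustrative sketch of the four selection policies as the notes describe them, not QPL's implementation: the `sketch_*` names, the `SKETCH_NUMA_*` values, and the idea of tagging each device with a NUMA node and a socket are assumptions made for the example, and none of them come from the QPL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values for this sketch only. They are NOT the
   QPL_DEVICE_NUMA_ID_* constants from the QPL headers. Any value >= 0
   is treated as an explicit NUMA node id. */
enum {
    SKETCH_NUMA_DEFAULT = -1, /* new default: same socket as the thread */
    SKETCH_NUMA_CURRENT = -2, /* only the calling thread's NUMA node    */
    SKETCH_NUMA_ANY     = -3  /* any device in the system               */
};

/* Where a device or a thread lives (hypothetical type). */
typedef struct {
    int numa_node;
    int socket;
} sketch_location;

/* Returns true if the device may be used under the requested policy. */
static bool sketch_device_allowed(const sketch_location *dev,
                                  const sketch_location *thread,
                                  int requested)
{
    switch (requested) {
    case SKETCH_NUMA_ANY:
        return true;
    case SKETCH_NUMA_CURRENT:
        return dev->numa_node == thread->numa_node;
    case SKETCH_NUMA_DEFAULT:
        return dev->socket == thread->socket;
    default: /* explicit NUMA node id */
        return dev->numa_node == requested;
    }
}

/* Counts how many of the n devices are eligible for the thread. */
static size_t sketch_count_allowed(const sketch_location *devs, size_t n,
                                   const sketch_location *thread,
                                   int requested)
{
    assert(devs != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (sketch_device_allowed(&devs[i], thread, requested)) {
            ++count;
        }
    }
    return count;
}
```

As a usage example, take a two-socket system with Sub-NUMA clustering where only NUMA node 0 (socket 0) and NUMA node 2 (socket 1) have an accelerator, and a thread running on NUMA node 1 of socket 0. In this model the default policy finds one eligible device (the one on node 0), the current-node policy finds none, the any-device policy finds both, and an explicit request for node 2 finds one.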

Usability and Documentation

  • Added support for Canned mode in the QPL Benchmarks Framework.
  • Optimized memory usage and reduced startup time for benchmarks when utilizing an exact filter.
  • Introduced a new build option -DQPL_USE_CLANG_TIDY={ON,OFF} to enable QPL to build with clang-tidy checks. Clang-tidy support is limited to Linux* OS and requires building QPL with the Clang* compiler. A clang-tidy configuration file was also added, and QPL was refactored to comply with it.
  • Added a new example demonstrating the utilization of dictionary compression with the Hardware Path for compression and the Software Path for decompression.
  • Added new test cases for Select, Scan, and Extract operations to validate the functionality of Force Array Output Modification.
  • Expanded the bad argument scenarios for the Force Array Output Modification tests to include additional cases for the Software Path.
  • Added new tests to validate error handling for bad arguments when submitting jobs on the Hardware Path and Auto Path.

Deprecated Functionality

  • Deprecated support for canned mode with indexing on the Software Path to align with the Hardware Path.

Bug Fixes

  • Resolved the issue with compression verification when utilizing IAA 2.0.
  • Corrected the test setup for auto_path in tb_c_api_deflate_with_dictionary.level_none, tb_c_api_deflate_with_dictionary.hw_multi_chunk, and tn_c_api_deflate.{dynamic/fixed/static}_default_stored_block_overflow.
  • Added an execution path check to ensure proper handling of unsupported paths in the Force Array Output Modification.
  • Resolved potential undefined behavior by fixing uninitialized pointers in the canned_one_chuck_hw_vs_sw.cpp test.
  • Removed tests related to the unsupported Software Path for the canned mode with indexing.
  • Fixed invalid parquet generation for tn_c_api_expand.tn_rle_input_error_handling.

Known Limitations

  • Intel(R) QPL can be built from the directly downloadable archives (.tar, .tgz) only without the tests and benchmarks frameworks, that is, with the -DQPL_BUILD_TESTS=OFF build option. The tests and benchmarks require submodules that GitHub* does not include in the archives it generates during release creation.

  • Known test failures are listed below. Some tests fail only under certain conditions, which are noted in parentheses.

    • Functional tests:
      • (software_path, auto_path only on platforms without IAA) ta_c_api_deflate_stateful.{dynamic/fixed/static}_default_verify
      • (software_path, auto_path) ta_c_api_deflate_stateful.{dynamic/fixed/static}_high_verify
      • (hardware_path, auto_path on IAA 2.0) ta_c_api_deflate_index_extended.PerformOperation
      • (auto_path) ta_c_api_huffman_only{_verify./.}{dynamic/static}_be
      • (auto_path) ta_c_api_inflate_huffman_only.generated_data
      • (auto_path) ta_c_api_deflate_index.{dynamic/static}_blocks_default_level_verify
      • (auto_path) tb_c_api_expand.source_errors
      • (auto_path) ta_c_api_deflate_inflate_canned_in_loops.default_level
  • Compression verification on qpl_path_software works only in indexing mode; in other modes, it works only for data smaller than 32KB.

  • Inflate does not report the error code QPL_STS_BIG_HEADER_ERR when a header is too big to fit in the input buffer.

  • The implementation of QPL_FLAG_CRC32C is in progress.

  • When using qpl_path_hardware, compression and decompression in indexing mode on IAA 2.0 are limited to data sizes smaller than 4KB.

Thanks to the Contributors

The release includes contributions from the project team and @aekoroglu, @fwph, and @Permanence-AI-Coder.