From 75b6f35fea5e4efd202a17ae0a8a9610615f9e06 Mon Sep 17 00:00:00 2001 From: Yongbing Wang <981756644@qq.com> Date: Tue, 7 Jan 2025 17:30:29 +0800 Subject: [PATCH] [Doc]Fix wrong pr url in release notes. (#54774) (cherry picked from commit 67cdf03c4c8091145b745ec5fce0812010e14ddf) # Conflicts: # docs/en/release_notes/release-3.2.md # docs/zh/release_notes/release-3.2.md --- docs/en/release_notes/release-3.2.md | 655 +++++++++++++++++++++++++++ docs/zh/release_notes/release-3.2.md | 649 ++++++++++++++++++++++++++ 2 files changed, 1304 insertions(+) create mode 100644 docs/en/release_notes/release-3.2.md create mode 100644 docs/zh/release_notes/release-3.2.md diff --git a/docs/en/release_notes/release-3.2.md b/docs/en/release_notes/release-3.2.md new file mode 100644 index 0000000000000..5de456df541e2 --- /dev/null +++ b/docs/en/release_notes/release-3.2.md @@ -0,0 +1,655 @@ +--- +displayed_sidebar: docs +--- + +# StarRocks version 3.2 + +## 3.2.13 + +Release date: December 13, 2024 + +### Improvements + +- Supports setting a time range within which Base Compaction is forbidden for a specific table. [#50120](https://github.com/StarRocks/starrocks/pull/50120) + +### Bug Fixes + +Fixed the following issues: + +- The `loadRowsRate` field returned `0` after executing SHOW ROUTINE LOAD. [#52151](https://github.com/StarRocks/starrocks/pull/52151) +- The `Files()` function read columns that were not queried. [#52210](https://github.com/StarRocks/starrocks/pull/52210) +- Prometheus failed to parse materialized view metrics with special characters in their names. (Now materialized view metrics support tags.) [#52782](https://github.com/StarRocks/starrocks/pull/52782) +- The `array_map` function caused BE to crash. [#52909](https://github.com/StarRocks/starrocks/pull/52909) +- Metadata Cache issues caused BE to crash. [#52968](https://github.com/StarRocks/starrocks/pull/52968) +- Routine Load tasks were canceled due to expired transactions. 
(Now tasks are canceled only if the database or table no longer exists.) [#50334](https://github.com/StarRocks/starrocks/pull/50334) +- Stream Load failed when submitted using HTTP 1.0. [#53010](https://github.com/StarRocks/starrocks/pull/53010) [#53008](https://github.com/StarRocks/starrocks/pull/53008) +- Issues related to Glue and S3 integration: [#48433](https://github.com/StarRocks/starrocks/pull/48433) + - Some error messages did not display the root cause. + - Error messages for writing to a Hive partitioned table with the partition column of type STRING when Glue was used as the metadata service. + - Dropping Hive tables failed without proper error messages when the user lacked sufficient permissions. +- The `storage_cooldown_time` property for materialized views did not take effect when set to `maximum`. [#52079](https://github.com/StarRocks/starrocks/pull/52079) + +## 3.2.12 + +Release date: October 23, 2024 + +### Improvements + +- Optimized memory allocation and statistics in BE for certain complex query scenarios to avoid OOM. [#51382](https://github.com/StarRocks/starrocks/pull/51382) +- Optimized memory usage in FE in Schema Change scenarios. [#50855](https://github.com/StarRocks/starrocks/pull/50855) +- Optimized the job status display when querying the system-defined view `information_schema.routine_load_jobs` from Follower FE nodes. [#51763](https://github.com/StarRocks/starrocks/pull/51763) +- Supports Backup and Restore of List partitioned tables. [#51993](https://github.com/StarRocks/starrocks/pull/51993) + +### Bug Fixes + +Fixed the following issues: + +- The error message was lost after writing to Hive failed. [#33167](https://github.com/StarRocks/starrocks/pull/33167) +- The `array_map` function causes a crash when excessive constant parameters are used. [#51244](https://github.com/StarRocks/starrocks/pull/51244) +- Special characters in the PARTITION BY columns of expression partitioned tables cause FE CheckPoint failures. 
[#51677](https://github.com/StarRocks/starrocks/pull/51677) +- Accessing the system-defined view `information_schema.fe_locks` causes a crash. [#51742](https://github.com/StarRocks/starrocks/pull/51742) +- Querying generated columns causes an error. [#51755](https://github.com/StarRocks/starrocks/pull/51755) +- Optimize Table fails when the table name contains special characters. [#51755](https://github.com/StarRocks/starrocks/pull/51755) +- Tablets could not be balanced in certain scenarios. [#51828](https://github.com/StarRocks/starrocks/pull/51828) + +### Behavior Changes + +- Supports dynamic modification of Backup and Restore-related parameters. [#52111](https://github.com/StarRocks/starrocks/pull/52111) + +## 3.2.11 + +Release date: September 9, 2024 + +### Improvements + +- Supports masking authentication information for Files() and PIPE. [#47629](https://github.com/StarRocks/starrocks/pull/47629) +- Supports automatic inference for the STRUCT type when reading Parquet files through Files(). [#50481](https://github.com/StarRocks/starrocks/pull/50481) + +### Bug Fixes + +Fixed the following issues: + +- An error is returned for equi-join queries because they failed to be rewritten by the global dictionary. [#50690](https://github.com/StarRocks/starrocks/pull/50690) +- The error "version has been compacted" caused by an infinite loop on the FE side during Tablet Clone. [#50561](https://github.com/StarRocks/starrocks/pull/50561) +- Incorrect scheduling for unhealthy replica repairs after distributing data based on labels. [#50331](https://github.com/StarRocks/starrocks/pull/50331) +- An error in the statistics collection log: "Unknown column '%s' in '%s." [#50785](https://github.com/StarRocks/starrocks/pull/50785) +- Incorrect timezone usage when reading complex types like TIMESTAMP from Parquet files via Files(). 
[#50448](https://github.com/StarRocks/starrocks/pull/50448) + +### Behavior Changes + +- When downgrading StarRocks from v3.3.x to v3.2.11, the system ignores incompatible metadata, if any. [#49636](https://github.com/StarRocks/starrocks/pull/49636) + +## 3.2.10 + +Release date: August 23, 2024 + +### Improvements + +- Files() will automatically convert `BYTE_ARRAY` data with a `logical_type` of `JSON` in Parquet files to the JSON type in StarRocks. [#49385](https://github.com/StarRocks/starrocks/pull/49385) +- Optimized error messages for Files() when Access Key ID and Secret Access Key are missing. [#49090](https://github.com/StarRocks/starrocks/pull/49090) +- `information_schema.columns` supports the `GENERATION_EXPRESSION` field. [#49734](https://github.com/StarRocks/starrocks/pull/49734) + +### Bug Fixes + +Fixed the following issues: + +- Downgrading a v3.3 shared-data cluster to v3.2 after setting the Primary Key table property `"persistent_index_type" = "CLOUD_NATIVE"` causes a crash. [#48149](https://github.com/StarRocks/starrocks/pull/48149) +- Exporting data to CSV files using SELECT INTO OUTFILE may cause data inconsistency. [#48052](https://github.com/StarRocks/starrocks/pull/48052) +- Queries encounter failures during concurrent query execution. [#48180](https://github.com/StarRocks/starrocks/pull/48180) +- Queries would hang due to a timeout in the Plan phase without exiting. [#48405](https://github.com/StarRocks/starrocks/pull/48405) +- After disabling index compression for Primary Key tables in older versions and then upgrading to v3.2.9, accessing `page_off` information causes an array out-of-bounds crash. [#48230](https://github.com/StarRocks/starrocks/pull/48230) +- BE crash caused by concurrent execution of ADD/DROP COLUMN operations. [#49355](https://github.com/StarRocks/starrocks/pull/49355) +- Queries against negative `TINYINT` values in ORC format files return `None` on the aarch64 architecture. 
[#49517](https://github.com/StarRocks/starrocks/pull/49517) +- If the disk write operation fails, failures of `l0` snapshots for Primary Key Persistent Index may cause data loss. [#48045](https://github.com/StarRocks/starrocks/pull/48045) +- Partial Update in Column mode for Primary Key tables fails under scenarios with large-volume data updates. [#49054](https://github.com/StarRocks/starrocks/pull/49054) +- BE crash caused by Fast Schema Evolution when downgrading a v3.3.0 shared-data cluster to v3.2.9. [#42737](https://github.com/StarRocks/starrocks/pull/42737) +- `partition_live_number` does not take effect. [#49213](https://github.com/StarRocks/starrocks/pull/49213) +- The conflict between index persistence and compaction in Primary Key tables could cause clone failures. [#49341](https://github.com/StarRocks/starrocks/pull/49341) +- Modifications of `partition_live_number` using ALTER TABLE do not take effect. [#49437](https://github.com/StarRocks/starrocks/pull/49437) +- Rewrite of CTE distinct grouping sets generates an invalid plan. [#48765](https://github.com/StarRocks/starrocks/pull/48765) +- RPC failures polluted the thread pool. [#49619](https://github.com/StarRocks/starrocks/pull/49619) +- Authentication failures when loading files from AWS S3 via PIPE. [#49837](https://github.com/StarRocks/starrocks/pull/49837) + +### Behavior Changes + +- Added a check for the `meta` directory in the FE startup script. If the directory does not exist, it will be automatically created. [#48940](https://github.com/StarRocks/starrocks/pull/48940) +- Added a memory limit parameter `load_process_max_memory_hard_limit_ratio` for data loading. If memory usage exceeds the limit, subsequent loading tasks will fail. [#48495](https://github.com/StarRocks/starrocks/pull/48495) + +## 3.2.9 + +Release date: July 11, 2024 + +### New Features + +- Paimon tables now support DELETE Vectors. 
[#45866](https://github.com/StarRocks/starrocks/issues/45866) +- Supports Column-level access control through Apache Ranger. [#47702](https://github.com/StarRocks/starrocks/pull/47702) +- Stream Load can automatically convert JSON strings into STRUCT/MAP/ARRAY types during loading. [#45406](https://github.com/StarRocks/starrocks/pull/45406) +- JDBC Catalog now supports Oracle and SQL Server. [#35691](https://github.com/StarRocks/starrocks/issues/35691) + +### Improvements + +- Improved privilege management by restricting `user_admin` role users from resetting the password of the root user. [#47801](https://github.com/StarRocks/starrocks/pull/47801) +- Stream Load now supports using `\t` and `\n` as row and column delimiters. Users do not need to convert them to their hexadecimal ASCII codes. [#47302](https://github.com/StarRocks/starrocks/pull/47302) +- Optimized memory usage during data loading. [#47047](https://github.com/StarRocks/starrocks/pull/47047) +- Supports masking authentication information for the Files() function in audit logs. [#46893](https://github.com/StarRocks/starrocks/pull/46893) +- Hive tables now support the `skip.header.line.count` property. [#47001](https://github.com/StarRocks/starrocks/pull/47001) +- JDBC Catalog supports more data types. [#47618](https://github.com/StarRocks/starrocks/pull/47618) + +### Behavior Changes + +- Changed the value inheritance order of the `JAVA_OPTS` parameters. If versions other than JDK_9 or JDK_11 are used, users need to configure `JAVA_OPTS` directly. [#47495](https://github.com/StarRocks/starrocks/pull/47495) +- When users create a non-partitioned table without specifying the bucket number, the minimum bucket number the system sets for the table is `16` (instead of `2` based on the formula `2*BE or CN count`). If users want to set a smaller bucket number when creating a small table, they must set it explicitly. 
[#47005](https://github.com/StarRocks/starrocks/pull/47005) +- When users create a partitioned table without specifying the bucket number, if the number of partitions exceeds 5, the rule for setting the bucket count is changed to `max(2*BE or CN count, bucket number calculated based on the largest historical partition data volume)`. The previous rule was to calculate the bucket number based on the largest historical partition data volume. [#47949](https://github.com/StarRocks/starrocks/pull/47949) + +### Bug Fixes + +Fixed the following issues: + +- BE crash caused by ALTER TABLE ADD COLUMN after upgrading a shared-data cluster from v3.2.x to v3.3.0 and then rolling it back. [#47826](https://github.com/StarRocks/starrocks/pull/47826) +- Tasks initiated through SUBMIT TASK showed a Running status indefinitely in the QueryDetail interface. [#47619](https://github.com/StarRocks/starrocks/pull/47619) +- Forwarding queries to the FE Leader node caused a null pointer exception. [#47559](https://github.com/StarRocks/starrocks/pull/47559) +- SHOW MATERIALIZED VIEWS with WHERE conditions caused a null pointer exception. [#47811](https://github.com/StarRocks/starrocks/pull/47811) +- Vertical Compaction fails for Primary Key tables in shared-data clusters. [#47192](https://github.com/StarRocks/starrocks/pull/47192) +- Improper handling of I/O Error when sinking data to Hive or Iceberg tables. [#46979](https://github.com/StarRocks/starrocks/pull/46979) +- Table properties do not take effect when whitespaces are added to their values. [#47119](https://github.com/StarRocks/starrocks/pull/47119) +- BE crash caused by concurrent migration and Index Compaction operations on Primary Key tables. 
[#46675](https://github.com/StarRocks/starrocks/pull/46675) + +## 3.2.8 + +Release date: June 7, 2024 + +### New Features + +- **[Supports adding labels on BEs](https://docs.starrocks.io/docs/3.2/administration/management/resource_management/be_label/)**: Supports adding labels on BEs based on information such as the racks and data centers where BEs are located. It ensures even data distribution among racks and data centers, and facilitates disaster recovery in case of power failures in certain racks or faults in data centers. [#38833](https://github.com/StarRocks/starrocks/pull/38833) + +### Bug Fixes + +Fixed the following issues: + +- An error is returned when users DELETE data rows from tables that use the expression partitioning method with str2date. [#45939](https://github.com/StarRocks/starrocks/pull/45939) +- BEs in the destination cluster crash when the StarRocks Cross-cluster Data Migration Tool fails to retrieve the Schema information from the source cluster. [#46068](https://github.com/StarRocks/starrocks/pull/46068) +- The error `Multiple entries with same key` is returned for queries with non-deterministic functions. [#46602](https://github.com/StarRocks/starrocks/pull/46602) + +## 3.2.7 + +Release date: May 24, 2024 + +### New Features + +- Stream Load supports data compression during transmission, reducing network bandwidth overhead. Users can specify different compression algorithms using parameters `compression` and `Content-Encoding`. Supported compression algorithms include GZIP, BZIP2, LZ4_FRAME, and ZSTD. [#43732](https://github.com/StarRocks/starrocks/pull/43732) +- Optimized the garbage collection (GC) mechanism in shared-data clusters. Supports manual compaction for tables or partitions stored in object storage. [#39532](https://github.com/StarRocks/starrocks/issues/39532) +- Flink connector supports reading complex data types ARRAY, MAP, and STRUCT from StarRocks. 
[#42932](https://github.com/StarRocks/starrocks/pull/42932) [#347](https://github.com/StarRocks/starrocks-connector-for-apache-flink/pull/347) +- Supports populating Data Cache asynchronously during queries, reducing the impact of populating cache on query performance. [#40489](https://github.com/StarRocks/starrocks/pull/40489) +- ANALYZE TABLE supports collecting histograms for external tables, effectively addressing data skews. For more information, see [CBO statistics](https://docs.starrocks.io/docs/3.2/using_starrocks/Cost_based_optimizer/#collect-statistics-of-hiveiceberghudi-tables). [#42693](https://github.com/StarRocks/starrocks/pull/42693) +- Lateral Join with [UNNEST](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/array-functions/unnest/) supports LEFT JOIN. [#43973](https://github.com/StarRocks/starrocks/pull/43973) +- Query Pool supports configuring memory usage threshold that triggers spilling via BE static parameter `query_pool_spill_mem_limit_threshold`. Once the threshold is reached, intermediate results of queries will be spilled to disks to reduce memory usage, thus avoiding OOM. +- Supports creating asynchronous materialized views based on Hive views. + +### Improvements + +- Optimized the error message returned for Broker Load tasks when there is no data under the specified HDFS paths. [#43839](https://github.com/StarRocks/starrocks/pull/43839) +- Optimized the error message returned when the Files function is used to read data from AWS S3 without Access Key and Secret Key specified. [#42450](https://github.com/StarRocks/starrocks/pull/42450) +- Optimized the error message returned for Broker Load tasks that load no data to any partitions. [#44292](https://github.com/StarRocks/starrocks/pull/44292) +- Optimized the error message returned for INSERT INTO SELECT tasks when the column count of the destination table does not match that in the SELECT statement. 
[#44331](https://github.com/StarRocks/starrocks/pull/44331) + +### Bug Fixes + +Fixed the following issues: + +- Concurrent read or write of the BITMAP-type data may cause BE to crash. [#44167](https://github.com/StarRocks/starrocks/pull/44167) +- Primary key indexes may cause BE to crash. [#43793](https://github.com/StarRocks/starrocks/pull/43793) [#43569](https://github.com/StarRocks/starrocks/pull/43569) [#44034](https://github.com/StarRocks/starrocks/pull/44034) +- Under high query concurrency scenarios, the str_to_map function may cause BE to crash. [#43901](https://github.com/StarRocks/starrocks/pull/43901) +- When the Masking policy of Apache Ranger is used, an error is returned when table aliases are specified in queries. [#44445](https://github.com/StarRocks/starrocks/pull/44445) +- In shared-data clusters, query execution cannot be routed to a backup node when the current node encounters exceptions. The corresponding error message is optimized for this issue. [#43489](https://github.com/StarRocks/starrocks/pull/43489) +- Memory information is incorrect in the container environment. [#43225](https://github.com/StarRocks/starrocks/issues/43225) +- An exception is thrown when INSERT tasks are canceled. [#44239](https://github.com/StarRocks/starrocks/pull/44239) +- Expression-based dynamic partitions cannot be automatically created. [#44163](https://github.com/StarRocks/starrocks/pull/44163) +- Creating partitions may cause FE deadlock. [#44974](https://github.com/StarRocks/starrocks/pull/44974) + +## 3.2.6 + +Release date: April 18, 2024 + +### Bug Fixes + +Fixed the following issue: + +- The privileges of external tables cannot be found due to incompatibility issues. [#44030](https://github.com/StarRocks/starrocks/pull/44030) + +## 3.2.5 (Yanked) + +Release date: April 12, 2024 + +:::tip + +This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg. 
+ +- **Problem**: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. + +- **Impact scope**: This problem only affects queries on external tables in external catalogs. Other queries are not affected. + +- **Temporary workaround**: The query succeeds after the SELECT privilege on this table is granted to the user again. But `SHOW GRANTS` will return duplicate privilege entries. After an upgrade to v3.2.6, users can run `REVOKE` to remove one of the privilege entries. + +::: + +### New Features + +- Supports the [dict_mapping](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/dict-functions/dict_mapping/) column property, which can significantly facilitate the loading process during the construction of a global dictionary, accelerating the exact COUNT DISTINCT calculation. + +### Behavior Changes + +- When null values in JSON data are evaluated based on the `IS NULL` operator, they are considered NULL values following SQL language. For example, `true` is returned for `SELECT parse_json('{"a": null}') -> 'a' IS NULL` (before this behavior change, `false` is returned). [#42765](https://github.com/StarRocks/starrocks/pull/42765) + +### Improvements + +- Optimized the column type unionization rules for automatic schema detection in the FILES table function. When columns with the same name but different types exist in separate files, FILES will attempt to merge them by selecting the type with the larger granularity as the final type. For example, if there are columns with the same name but of types FLOAT and INT respectively, FILES will return DOUBLE as the final type. [#40959](https://github.com/StarRocks/starrocks/pull/40959) +- Primary Key tables support Size-tiered Compaction to reduce the I/O amplification. 
[#41130](https://github.com/StarRocks/starrocks/pull/41130) +- When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. [#42179](https://github.com/StarRocks/starrocks/pull/42179) +- Optimized the error messages for Routine Load. [#41306](https://github.com/StarRocks/starrocks/pull/41306) +- Optimized the error messages when the FILES table function is used to convert invalid data types. [#42717](https://github.com/StarRocks/starrocks/pull/42717) + +### Bug Fixes + +Fixed the following issues: + +- FEs fail to start after system-defined views are dropped. Dropping system-defined views is now prohibited. [#43552](https://github.com/StarRocks/starrocks/pull/43552) +- BEs crash when duplicate sort key columns exist in Primary Key tables. Duplicate sort key columns are now prohibited. [#43206](https://github.com/StarRocks/starrocks/pull/43206) +- An error, instead of NULL, is returned when the input value of the to_json() function is NULL. [#42171](https://github.com/StarRocks/starrocks/pull/42171) +- In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. [#41955](https://github.com/StarRocks/starrocks/pull/41955) +- In shared-data mode, an error is returned when users modify the `enable_persistent_index` property of a Primary Key table. [#42890](https://github.com/StarRocks/starrocks/pull/42890) +- In shared-data mode, NULL values are given to columns that are not supposed to be changed when users update a Primary Key table with partial updates in column mode. [#42355](https://github.com/StarRocks/starrocks/pull/42355) +- Queries cannot be rewritten with asynchronous materialized views created on logical views. 
[#42173](https://github.com/StarRocks/starrocks/pull/42173) +- CNs crash when the Cross-cluster Data Migration Tool is used to migrate Primary Key tables to a shared-data cluster. [#42260](https://github.com/StarRocks/starrocks/pull/42260) +- The partition ranges of the external catalog-based asynchronous materialized views are not consecutive. [#41957](https://github.com/StarRocks/starrocks/pull/41957) + +## 3.2.4 (Yanked) + +Release date: March 12, 2024 + +:::tip + +This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg. + +- **Problem**: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. + +- **Impact scope**: This problem only affects queries on external tables in external catalogs. Other queries are not affected. + +- **Temporary workaround**: The query succeeds after the SELECT privilege on this table is granted to the user again. But `SHOW GRANTS` will return duplicate privilege entries. After an upgrade to v3.2.6, users can run `REVOKE` to remove one of the privilege entries. + +::: + +### New Features + +- Cloud-native Primary Key tables in shared-data clusters support Size-tiered Compaction to reduce the write I/O amplification. [#41034](https://github.com/StarRocks/starrocks/pull/41034) +- Added the date function `milliseconds_diff`. [#38171](https://github.com/StarRocks/starrocks/pull/38171) +- Added the session variable `catalog`, which specifies the catalog to which the session belongs. [#41329](https://github.com/StarRocks/starrocks/pull/41329) +- Supports [setting user-defined variables in hints](https://docs.starrocks.io/docs/3.2/administration/Query_planning/#user-defined-variable-hint). [#40746](https://github.com/StarRocks/starrocks/pull/40746) +- Supports CREATE TABLE LIKE in Hive catalogs. 
[#37685](https://github.com/StarRocks/starrocks/pull/37685) +- Added the view `information_schema.partitions_meta`, which records detailed metadata of partitions. [#39265](https://github.com/StarRocks/starrocks/pull/39265) +- Added the view `sys.fe_memory_usage`, which records the memory usage for StarRocks. [#40464](https://github.com/StarRocks/starrocks/pull/40464) + +### Behavior Changes + +- `cbo_decimal_cast_string_strict` is used to control how CBO converts data from the DECIMAL type to the STRING type. The default value `true` indicates that the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). The DECIMAL type is not strictly filled in earlier versions, causing different results when comparing the DECIMAL type and the STRING type. [#40619](https://github.com/StarRocks/starrocks/pull/40619) +- The default value of the Iceberg Catalog parameter `enable_iceberg_metadata_cache` has been changed to `false`. From v3.2.1 to v3.2.3, this parameter is set to `true` by default, regardless of what metastore service is used. In v3.2.4 and later, if the Iceberg cluster uses AWS Glue as metastore, this parameter still defaults to `true`. However, if the Iceberg cluster uses other metastore service such as Hive metastore, this parameter defaults to `false`. [#41826](https://github.com/StarRocks/starrocks/pull/41826) +- The user who can refresh materialized views is changed from the `root` user to the user who creates the materialized views. This change does not affect existing materialized views. [#40670](https://github.com/StarRocks/starrocks/pull/40670) +- By default, when comparing columns of constant and string types, StarRocks compares them as strings. Users can use the session variable `cbo_eq_base_type` to adjust the rule used for the comparison. 
For example, users can set `cbo_eq_base_type` to `decimal`, and StarRocks then compares the columns as numeric values. [#40619](https://github.com/StarRocks/starrocks/pull/40619) + +### Improvements + +- Shared-data StarRocks clusters support the Partitioned Prefix feature for S3-compatible object storage systems. When this feature is enabled, StarRocks stores the data into multiple, uniformly prefixed partitions (sub-paths) under the bucket. This improves the read and write efficiency on data files in S3-compatible object storages. [#41627](https://github.com/StarRocks/starrocks/pull/41627) +- StarRocks supports using the parameter `s3_compatible_fs_list` to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter `fallback_to_hadoop_fs_list` to specify non-S3-compatible object storages that require access via HDFS Schema (this method requires the use of vendor-provided JAR packages). [#41123](https://github.com/StarRocks/starrocks/pull/41123) +- Optimized compatibility with Trino. Supports syntax conversion from the following Trino functions: current_catalog, current_schema, to_char, from_hex, to_date, to_timestamp, and index. [#41217](https://github.com/StarRocks/starrocks/pull/41217) [#41319](https://github.com/StarRocks/starrocks/pull/41319) [#40803](https://github.com/StarRocks/starrocks/pull/40803) +- Optimized the query rewrite logic of materialized views. StarRocks can rewrite queries with materialized views created upon logical views. [#42173](https://github.com/StarRocks/starrocks/pull/42173) +- Improved the efficiency of converting the STRING type to the DATETIME type by 35% to 40%. [#41464](https://github.com/StarRocks/starrocks/pull/41464) +- The `agg_type` of BITMAP-type columns in an Aggregate table can be set to `replace_if_not_null` in order to support updates only to a few columns of the table. 
[#42034](https://github.com/StarRocks/starrocks/pull/42034) +- Improved the Broker Load performance when loading small ORC files. [#41765](https://github.com/StarRocks/starrocks/pull/41765) +- The tables with hybrid row-column storage support Schema Change. [#40851](https://github.com/StarRocks/starrocks/pull/40851) +- The tables with hybrid row-column storage support complex types including BITMAP, HLL, JSON, ARRAY, MAP, and STRUCT. [#41476](https://github.com/StarRocks/starrocks/pull/41476) +- A new internal SQL log file is added to record log data related to statistics and materialized views. [#40453](https://github.com/StarRocks/starrocks/pull/40453) + +### Bug Fixes + +Fixed the following issues: + +- "Analyze Error" is thrown if inconsistent letter cases are assigned to the names or aliases of tables or views queried in the creation of a Hive view. [#40921](https://github.com/StarRocks/starrocks/pull/40921) +- I/O usage reaches the upper limit if persistent indexes are created on Primary Key tables. [#39959](https://github.com/StarRocks/starrocks/pull/39959) +- In shared-data clusters, primary key index directories are deleted every 5 hours. [#40745](https://github.com/StarRocks/starrocks/pull/40745) +- After users execute ALTER TABLE COMPACT by hand, the memory usage statistics for compaction operations are abnormal. [#41150](https://github.com/StarRocks/starrocks/pull/41150) +- Retries of the Publish phase may hang for Primary Key tables. [#39890](https://github.com/StarRocks/starrocks/pull/39890) + +## 3.2.3 + +Release date: February 8, 2024 + +### New Features + +- [Preview] Supports hybrid row-column storage for tables. It allows better performance for high-concurrency, low-latency point lookups against Primary Key tables and partial data updates. Currently, this feature does not support modification via ALTER TABLE, changing Sort Key, and partial updates in column mode. +- Supports backing up and restoring asynchronous materialized views. 
+- Broker Load supports loading JSON-type data. +- Supports query rewrite using asynchronous materialized views created upon views. Queries against a view can be rewritten based on materialized views that are created upon that view. +- Supports CREATE OR REPLACE PIPE. [#37658](https://github.com/StarRocks/starrocks/pull/37658) + +### Behavior Changes + +- Added the session variable `enable_strict_order_by`. When this variable is set to the default value `TRUE`, an error is reported for such a query pattern: Duplicate alias is used in different expressions of the query and this alias is also a sorting field in ORDER BY, for example, `select distinct t1.* from tbl1 t1 order by t1.k1;`. The logic is the same as that in v2.3 and earlier. When this variable is set to `FALSE`, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. [#37910](https://github.com/StarRocks/starrocks/pull/37910) +- Added the session variable `enable_materialized_view_for_insert`, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is `false`. [#37505](https://github.com/StarRocks/starrocks/pull/37505) +- When a single query is executed within the Pipeline framework, its memory limit is now constrained by the variable `query_mem_limit` instead of `exec_mem_limit`. Setting the value of `query_mem_limit` to `0` indicates no limit. [#34120](https://github.com/StarRocks/starrocks/pull/34120) + +### Parameter Changes + +- Added the FE configuration item `http_worker_threads_num`, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is `0`. If the value for this parameter is set to a negative value or `0`, the actual thread number is twice the number of CPU cores. 
[#37530](https://github.com/StarRocks/starrocks/pull/37530) +- Added the BE configuration item `lake_pk_compaction_max_input_rowsets`, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. [#39611](https://github.com/StarRocks/starrocks/pull/39611) +- Added the session variable `connector_sink_compression_codec`, which specifies the compression algorithm used for writing data into Hive tables or Iceberg tables, or exporting data with Files(). Valid algorithms include GZIP, BROTLI, ZSTD, and LZ4. [#37912](https://github.com/StarRocks/starrocks/pull/37912) +- Added the FE configuration item `routine_load_unstable_threshold_second`. [#36222](https://github.com/StarRocks/starrocks/pull/36222) +- Added the BE configuration item `pindex_major_compaction_limit_per_disk` to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction, which can cause excessively high I/O for certain disks. The default value is `1`. [#36681](https://github.com/StarRocks/starrocks/pull/36681) +- Added the BE configuration item `enable_lazy_delta_column_compaction`. The default value is `true`, indicating that StarRocks does not perform frequent compaction operations on delta columns. [#36654](https://github.com/StarRocks/starrocks/pull/36654) +- Added the FE configuration item `default_mv_refresh_immediate`, which specifies whether to refresh a materialized view immediately after it is created. The default value is `true`. [#37093](https://github.com/StarRocks/starrocks/pull/37093) +- Changed the default value of the FE configuration item `default_mv_refresh_partition_num` to `1`. This indicates that when multiple partitions need to be updated during a materialized view refresh, the task will be split into batches, refreshing only one partition at a time. 
This helps reduce resource consumption during each refresh. [#36560](https://github.com/StarRocks/starrocks/pull/36560) +- Changed the default value of the BE/CN configuration item `starlet_use_star_cache` to `true`. This indicates that Data Cache is enabled by default in shared-data clusters. If, before upgrading, you have manually configured the BE/CN configuration item `starlet_cache_evict_high_water` to `X`, you must configure the BE/CN configuration item `starlet_star_cache_disk_size_percent` to `(1.0 - X) * 100`. For example, if you have set `starlet_cache_evict_high_water` to `0.3` before upgrading, you must set `starlet_star_cache_disk_size_percent` to `70`. This ensures that both file data cache and Data Cache will not exceed the disk capacity limit. [#38200](https://github.com/StarRocks/starrocks/pull/38200) + +### Improvements + +- Added date formats `yyyy-MM-ddTHH:mm` and `yyyy-MM-dd HH:mm` to support TIMESTAMP partition fields in Apache Iceberg tables. [#39986](https://github.com/StarRocks/starrocks/pull/39986) +- Added Data Cache-related metrics to the monitoring API. [#40375](https://github.com/StarRocks/starrocks/pull/40375) +- Optimized BE log printing to prevent too many irrelevant logs. [#22820](https://github.com/StarRocks/starrocks/pull/22820) [#36187](https://github.com/StarRocks/starrocks/pull/36187) +- Added the field `storage_medium` to the view `information_schema.be_tablets`. [#37070](https://github.com/StarRocks/starrocks/pull/37070) +- Supports `SET_VAR` in multiple sub-queries. [#36871](https://github.com/StarRocks/starrocks/pull/36871) +- A new field `LatestSourcePosition` is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. 
[#38298](https://github.com/StarRocks/starrocks/pull/38298) +- When the string on the right side of the LIKE operator within the WHERE clause does not include `%` or `_`, the LIKE operator is converted into the `=` operator. [#37515](https://github.com/StarRocks/starrocks/pull/37515) +- The default retention period of trash files is changed to 1 day from the original 3 days. [#37113](https://github.com/StarRocks/starrocks/pull/37113) +- Supports collecting statistics from Iceberg tables with Partition Transform. [#39907](https://github.com/StarRocks/starrocks/pull/39907) +- The scheduling policy for Routine Load is optimized, so that slow tasks do not block the execution of the other normal tasks. [#37638](https://github.com/StarRocks/starrocks/pull/37638) + +### Bug Fixes + +Fixed the following issues: + +- The execution of ANALYZE TABLE gets stuck occasionally. [#36836](https://github.com/StarRocks/starrocks/pull/36836) +- The memory consumption by PageCache exceeds the threshold specified by the BE dynamic parameter `storage_page_cache_limit` in certain circumstances. [#37740](https://github.com/StarRocks/starrocks/pull/37740) +- Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. [#37549](https://github.com/StarRocks/starrocks/pull/37549) +- In some cases, `bitmap_to_string` may return incorrect results due to data type overflow. [#37405](https://github.com/StarRocks/starrocks/pull/37405) +- When `SELECT ... FROM ... INTO OUTFILE` is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. [#38045](https://github.com/StarRocks/starrocks/pull/38045) +- In some cases, querying semi-structured data in tables may cause BEs to crash. 
[#40208](https://github.com/StarRocks/starrocks/pull/40208) + +## 3.2.2 + +Release date: December 30, 2023 + +### Bug Fixes + +Fixed the following issue: + +- When StarRocks is upgraded from v3.1.2 or earlier to v3.2, FEs may fail to restart. [#38172](https://github.com/StarRocks/starrocks/pull/38172) + +## 3.2.1 + +Release date: December 21, 2023 + +### New Features + +#### Data Lake Analytics + +- Supports reading [Hive Catalog](https://docs.starrocks.io/docs/3.2/data_source/catalog/hive_catalog/) tables and file external tables in Avro, SequenceFile, and RCFile formats through Java Native Interface (JNI). + +#### Materialized View + +- Added a view `object_dependencies` to the database `sys`. It contains the lineage information of asynchronous materialized views. [#35060](https://github.com/StarRocks/starrocks/pull/35060) +- Supports creating synchronous materialized views with the WHERE clause. +- Supports partition-level incremental refresh for asynchronous materialized views created upon Iceberg catalogs. +- [Preview] Supports creating asynchronous materialized views based on tables in a Paimon catalog with partition-level refresh. + +#### Query and SQL functions + +- Supports the prepared statement. It allows better performance for processing high-concurrency point lookup queries. It also prevents SQL injection effectively. +- Supports the following Bitmap functions: [subdivide_bitmap](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/bitmap-functions/subdivide_bitmap/), [bitmap_from_binary](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/bitmap-functions/bitmap_from_binary/), and [bitmap_to_binary](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/bitmap-functions/bitmap_to_binary/). +- Supports the Array function [array_unique_agg](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/array-functions/array_unique_agg/). 
+ +#### Monitoring and alerts + +- Added a new metric `max_tablet_rowset_num` for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". [#36539](https://github.com/StarRocks/starrocks/pull/36539) + +### Parameter change + +- A new BE configuration item `enable_stream_load_verbose_log` is added. The default value is `false`. With this parameter set to `true`, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. [#36113](https://github.com/StarRocks/starrocks/pull/36113) + +### Improvements + +- Upgraded the default GC algorithm in JDK8 to G1. [#37268](https://github.com/StarRocks/starrocks/pull/37268) +- A new value option `GROUP_CONCAT_LEGACY` is added to the session variable [sql_mode](https://docs.starrocks.io/docs/3.2/reference/System_variable/#sql_mode) to provide compatibility with the implementation logic of the [group_concat](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/string-functions/group_concat/) function in versions earlier than v2.5. [#36150](https://github.com/StarRocks/starrocks/pull/36150) +- The authentication information `aws.s3.access_key` and `aws.s3.access_secret` for [AWS S3 in Broker Load jobs](https://docs.starrocks.io/docs/3.2/loading/s3/) are hidden in audit logs. [#36571](https://github.com/StarRocks/starrocks/pull/36571) +- The `be_tablets` view in the `information_schema` database provides a new field `INDEX_DISK`, which records the disk usage (measured in bytes) of persistent indexes. [#35615](https://github.com/StarRocks/starrocks/pull/35615) +- The result returned by the [SHOW ROUTINE LOAD](https://docs.starrocks.io/docs/3.2/sql-reference/sql-statements/data-manipulation/SHOW_ROUTINE_LOAD/) statement provides a new field `OtherMsg`, which shows information about the last failed task. 
[#35806](https://github.com/StarRocks/starrocks/pull/35806) + +### Bug Fixes + +Fixed the following issues: + +- The BEs crash if users create persistent indexes in the event of data corruption. [#30841](https://github.com/StarRocks/starrocks/pull/30841) +- The [array_distinct](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/array-functions/array_distinct/) function occasionally causes the BEs to crash. [#36377](https://github.com/StarRocks/starrocks/pull/36377) +- After the DISTINCT window operator pushdown feature is enabled, errors are reported if SELECT DISTINCT operations are performed on the complex expressions of the columns computed by window functions. [#36357](https://github.com/StarRocks/starrocks/pull/36357) +- Some S3-compatible object storage returns duplicate files, causing the BEs to crash. [#36103](https://github.com/StarRocks/starrocks/pull/36103) + +## 3.2.0 + +Release date: December 1, 2023 + +### New Features + +#### Shared-data cluster + +- Supports persisting indexes of [Primary Key tables](https://docs.starrocks.io/docs/3.2/table_design/table_types/primary_key_table/) to local disks. +- Supports even distribution of Data Cache among multiple local disks. + +#### Materialized View + +**Asynchronous materialized view** + +- The Query Dump file can include information about asynchronous materialized views. +- The Spill to Disk feature is enabled by default for the refresh tasks of asynchronous materialized views, reducing memory consumption. + +#### Data Lake Analytics + +- Supports creating and dropping databases and managed tables in [Hive catalogs](https://docs.starrocks.io/docs/3.2/data_source/catalog/hive_catalog/), and supports exporting data to Hive's managed tables using INSERT or INSERT OVERWRITE. 
+- Supports [Unified Catalog](https://docs.starrocks.io/docs/3.2/data_source/catalog/unified_catalog/), with which users can access different table formats (Hive, Iceberg, Hudi, and Delta Lake) that share a common metastore like Hive metastore or AWS Glue. +- Supports collecting statistics of Hive and Iceberg tables using ANALYZE TABLE, and storing the statistics in StarRocks, thus facilitating optimization of query plans and accelerating subsequent queries. +- Supports Information Schema for external tables, providing additional convenience for interactions between external systems (such as BI tools) and StarRocks. + +#### Storage engine, data ingestion, and export + +- Added the following features of loading with the table function [FILES()](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/table-functions/files/): + - Loading Parquet and ORC format data from Azure or GCP. + - Extracting the value of a key/value pair from the file path as the value of a column using the parameter `columns_from_path`. + - Loading complex data types including ARRAY, JSON, MAP, and STRUCT. +- Supports unloading data from StarRocks to Parquet-formatted files stored in AWS S3 or HDFS by using INSERT INTO FILES. For detailed instructions, see [Unload data using INSERT INTO FILES](https://docs.starrocks.io/docs/3.2/unloading/unload_using_insert_into_files/). +- Supports [manual optimization of table structure and data distribution strategy](https://docs.starrocks.io/docs/3.2/table_design/Data_distribution#optimize-data-distribution-after-table-creation-since-32) used in an existing table to optimize the query and loading performance. You can set a new bucket key, bucket number, or sort key for a table. You can also set a different bucket number for specific partitions. +- Supports continuous data loading from [AWS S3](https://docs.starrocks.io/docs/3.2/loading/s3/#use-pipe) or [HDFS](https://docs.starrocks.io/docs/3.2/loading/hdfs_load/#use-pipe) using the PIPE method. 
+ - When PIPE detects new or modified files in a remote storage directory, it can automatically load the new or modified data into the destination table in StarRocks. While loading data, PIPE automatically splits a large loading task into smaller, serialized tasks, enhancing stability in large-scale data ingestion scenarios and reducing the cost of error retries. + +#### Query + +- Supports [HTTP SQL API](https://docs.starrocks.io/docs/3.2/reference/HTTP_API/SQL/), enabling users to access StarRocks data via HTTP and execute SELECT, SHOW, EXPLAIN, or KILL operations. +- Supports Runtime Profile and text-based Profile analysis commands (SHOW PROFILELIST, ANALYZE PROFILE, EXPLAIN ANALYZE) to allow users to directly analyze profiles via MySQL clients, facilitating bottleneck identification and discovery of optimization opportunities. + +#### SQL reference + +Added the following functions: + +- String functions: substring_index, url_extract_parameter, url_encode, url_decode, and translate +- Date functions: dayofweek_iso, week_iso, quarters_add, quarters_sub, milliseconds_add, milliseconds_sub, date_diff, jodatime_format, str_to_jodatime, to_iso8601, to_tera_date, and to_tera_timestamp +- Pattern matching function: regexp_extract_all +- Hash function: xx_hash3_64 +- Aggregate function: approx_top_k +- Window functions: cume_dist, percent_rank, and session_number +- Utility functions: get_query_profile and is_role_in_session + +#### Privileges and security + +StarRocks supports access control through [Apache Ranger](https://docs.starrocks.io/docs/3.2/administration/ranger_plugin/), providing a higher level of data security and allowing the reuse of existing services of external data sources. 
After integrating with Apache Ranger, StarRocks enables the following access control methods: + +- When accessing internal tables, external tables, or other objects in StarRocks, access control can be enforced based on the access policies configured for the StarRocks Service in Ranger. +- When accessing an external catalog, access control can also leverage the corresponding Ranger service of the original data source (such as Hive Service) to control access (currently, access control for exporting data to Hive is not yet supported). + +For more information, see [Manage permissions with Apache Ranger](https://docs.starrocks.io/docs/3.2/administration/ranger_plugin/). + +### Improvements + +#### Data Lake Analytics + +- Optimized ORC Reader: + - Optimized the ORC Column Reader, resulting in nearly a two-fold performance improvement for VARCHAR and CHAR data reading. + - Optimized the decompression performance of ORC files in Zlib compression format. +- Optimized Parquet Reader: + - Supports adaptive I/O merging, allowing adaptive merging of columns with and without predicates based on filtering effects, thus reducing I/O. + - Optimized Dict Filter for faster predicate rewriting. Supports STRUCT sub-columns, and on-demand dictionary column decoding. + - Optimized Dict Decode performance. + - Optimized late materialization performance. + - Supports caching file footers to avoid repeated computation overhead. + - Supports decompression of Parquet files in lzo compression format. +- Optimized CSV Reader: + - Optimized the Reader performance. + - Supports decompression of CSV files in Snappy and lzo compression formats. +- Optimized the performance of the count calculation. +- Optimized Iceberg Catalog capabilities: + - Supports collecting column statistics from Manifest files to accelerate queries. + - Supports collecting NDV (number of distinct values) from Puffin files to accelerate queries. + - Supports partition pruning. 
+ - Reduced Iceberg metadata memory consumption to enhance stability in scenarios with large metadata volume or high query concurrency. + +#### Materialized View + +**Asynchronous materialized view** + +- Supports automatic refresh for an asynchronous materialized view created upon views or materialized views when schema changes occur on the views, materialized views, or their base tables. +- Data consistency: + - Added the property `query_rewrite_consistency` for asynchronous materialized view creation. This property defines the query rewrite rules based on the consistency check. + - Added the property `force_external_table_query_rewrite` for external catalog-based asynchronous materialized view creation. This property defines whether to force query rewrite for asynchronous materialized views created upon external catalogs. + - For detailed information, see [CREATE MATERIALIZED VIEW](https://docs.starrocks.io/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_MATERIALIZED_VIEW/). +- Added a consistency check for materialized views' partitioning key. + - When users create an asynchronous materialized view with window functions that include a PARTITION BY expression, the partitioning column of the window function must match that of the materialized view. + +#### Storage engine, data ingestion, and export + +- Optimized the persistent index for Primary Key tables by improving memory usage logic while reducing I/O read and write amplification. [#24875](https://github.com/StarRocks/starrocks/pull/24875) [#27577](https://github.com/StarRocks/starrocks/pull/27577) [#28769](https://github.com/StarRocks/starrocks/pull/28769) +- Supports data re-distribution across local disks for Primary Key tables. +- Partitioned tables support automatic cooldown based on the partition time range and cooldown time. Compared to the original cooldown logic, this makes it more convenient to manage hot and cold data at the partition level. 
For more information, see [Specify initial storage medium, automatic storage cooldown time, replica number](https://docs.starrocks.io/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_TABLE#specify-initial-storage-medium-automatic-storage-cooldown-time-replica-number). +- The Publish phase of a load job that writes data into a Primary Key table is changed from asynchronous mode to synchronous mode. As such, the data loaded can be queried immediately after the load job finishes. For more information, see [enable_sync_publish](https://docs.starrocks.io/docs/3.2/administration/FE_configuration#enable_sync_publish). +- Supports Fast Schema Evolution, which is controlled by the table property [`fast_schema_evolution`](https://docs.starrocks.io/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_TABLE#set-fast-schema-evolution). After this feature is enabled, the execution efficiency of adding or dropping columns is significantly improved. This mode is disabled by default (the default value is `false`). You cannot modify this property for existing tables using ALTER TABLE. +- [Supports dynamically adjusting the number of tablets to create](https://docs.starrocks.io/docs/3.2/table_design/Data_distribution#set-the-number-of-buckets) according to cluster information and the size of the data for **Duplicate Key** tables created with the Random Bucketing strategy. + +#### Query + +- Optimized StarRocks' compatibility with Metabase and Superset. Supports integrating them with external catalogs. + +#### SQL Reference + +- [array_agg](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/array-functions/array_agg/) supports the keyword DISTINCT. +- INSERT, UPDATE, and DELETE operations now support `SET_VAR`. [#35283](https://github.com/StarRocks/starrocks/pull/35283) + +#### Others + +- Added the session variable `large_decimal_underlying_type = "panic"|"double"|"decimal"` to set the rule for handling DECIMAL type overflow. 
`panic` indicates returning an error immediately, `double` indicates converting the data to DOUBLE type, and `decimal` indicates converting the data to DECIMAL(38,s). + +### Developer tools + +- Supports Trace Query Profile for asynchronous materialized views, which can be used to analyze its transparent rewrite. + +### Behavior Change + +To be updated. + +### Parameter Change + +#### FE Parameters + +- Added the following FE configuration items: + - `catalog_metadata_cache_size` + - `enable_backup_materialized_view` + - `enable_colocate_mv_index` + - `enable_fast_schema_evolution` + - `json_file_size_limit` + - `lake_enable_ingest_slowdown` + - `lake_ingest_slowdown_threshold` + - `lake_ingest_slowdown_ratio` + - `lake_compaction_score_upper_bound` + - `mv_auto_analyze_async` + - `primary_key_disk_schedule_time` + - `statistic_auto_collect_small_table_rows` + - `stream_load_task_keep_max_num` + - `stream_load_task_keep_max_second` +- Removed FE configuration item `enable_pipeline_load`. +- Default value modifications: + - The default value of `enable_sync_publish` is changed from `false` to `true`. + - The default value of `enable_persistent_index_by_default` is changed from `false` to `true`. + +#### BE Parameters + +- Data Cache-related configuration changes. + - Added `datacache_enable` to replace `block_cache_enable`. + - Added `datacache_mem_size` to replace `block_cache_mem_size`. + - Added `datacache_disk_size` to replace `block_cache_disk_size`. + - Added `datacache_disk_path` to replace `block_cache_disk_path`. + - Added `datacache_meta_path` to replace `block_cache_meta_path`. + - Added `datacache_block_size` to replace `block_cache_block_size`. + - Added `datacache_checksum_enable` to replace `block_cache_checksum_enable`. + - Added `datacache_direct_io_enable` to replace `block_cache_direct_io_enable`. + - Added `datacache_max_concurrent_inserts` to replace `block_cache_max_concurrent_inserts`. + - Added `datacache_max_flying_memory_mb`. 
+ - Added `datacache_engine` to replace `block_cache_engine`. + - Removed `block_cache_max_parcel_memory_mb`. + - Removed `block_cache_report_stats`. + - Removed `block_cache_lru_insertion_point`. + + After renaming Block Cache to Data Cache, StarRocks has introduced a new set of BE parameters prefixed with `datacache` to replace the original parameters prefixed with `block_cache`. After upgrade to v3.2, the original parameters will still be effective. Once enabled, the new parameters will override the original ones. The mixed usage of new and original parameters is not supported, as it may result in some configurations not taking effect. In the future, StarRocks plans to deprecate the original parameters with the `block_cache` prefix, so we recommend you use the new parameters with the `datacache` prefix. + +- Added the following BE configuration items: + - `spill_max_dir_bytes_ratio` + - `streaming_agg_limited_memory_size` + - `streaming_agg_chunk_buffer_size` +- Removed the following BE configuration items: + - Dynamic parameter `tc_use_memory_min` + - Dynamic parameter `tc_free_memory_rate` + - Dynamic parameter `tc_gc_period` + - Static parameter `tc_max_total_thread_cache_byte` +- Default value modifications: + - The default value of `disable_column_pool` is changed from `false` to `true`. + - The default value of `thrift_port` is changed from `9060` to `0`. + - The default value of `enable_load_colocate_mv` is changed from `false` to `true`. + - The default value of `enable_pindex_minor_compaction` is changed from `false` to `true`. 
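The Data Cache renames in the BE parameter list above lend themselves to a small pre-upgrade check. The following is only a sketch under stated assumptions, not a tool shipped with StarRocks: the function names are hypothetical, and it assumes the BE configuration file consists of simple `key = value` lines with `#` comments. It maps each legacy `block_cache_*` item to the `datacache_*` name from the list, reports items that were removed, and flags files that mix both prefixes, which the note above describes as unsupported.

```python
# Hypothetical pre-upgrade helper; not part of StarRocks.
# The rename mapping is copied from the BE parameter list in these notes.
LEGACY_TO_NEW = {
    "block_cache_enable": "datacache_enable",
    "block_cache_mem_size": "datacache_mem_size",
    "block_cache_disk_size": "datacache_disk_size",
    "block_cache_disk_path": "datacache_disk_path",
    "block_cache_meta_path": "datacache_meta_path",
    "block_cache_block_size": "datacache_block_size",
    "block_cache_checksum_enable": "datacache_checksum_enable",
    "block_cache_direct_io_enable": "datacache_direct_io_enable",
    "block_cache_max_concurrent_inserts": "datacache_max_concurrent_inserts",
    "block_cache_engine": "datacache_engine",
}

# Items listed as removed in v3.2, with no replacement.
REMOVED = {
    "block_cache_max_parcel_memory_mb",
    "block_cache_report_stats",
    "block_cache_lru_insertion_point",
}


def parse_conf(text):
    """Parse `key = value` lines, skipping blanks and # comments (assumed format)."""
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        items[key.strip()] = value.strip()
    return items


def check_datacache_conf(text):
    """Return (renames, removed, mixed) for the given configuration text.

    renames: legacy keys found, mapped to their datacache_* replacement
    removed: legacy keys found that no longer exist in v3.2
    mixed:   True if both block_cache_* and datacache_* keys are present
    """
    items = parse_conf(text)
    renames = {k: LEGACY_TO_NEW[k] for k in items if k in LEGACY_TO_NEW}
    removed = sorted(k for k in items if k in REMOVED)
    uses_new = any(k.startswith("datacache_") for k in items)
    uses_legacy = any(k.startswith("block_cache_") for k in items)
    return renames, removed, uses_new and uses_legacy
```

Calling `check_datacache_conf(open("be.conf").read())` before an upgrade would list which keys to rename; the file path here is an assumption about the deployment layout.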
+ +#### System Variables + +- Added the following session variables: + - `enable_per_bucket_optimize` + - `enable_write_hive_external_table` + - `hive_temp_staging_dir` + - `spill_revocable_max_bytes` + - `thrift_plan_protocol` +- Removed the following session variables: + - `enable_pipeline_query_statistic` + - `enable_deliver_batch_fragments` +- Renamed the following session variables: + - `enable_scan_block_cache` is renamed as `enable_scan_datacache`. + - `enable_populate_block_cache` is renamed as `enable_populate_datacache`. + +#### Reserved Keywords + +Added reserved keywords `OPTIMIZE` and `PREPARE`. + +### Bug Fixes + +Fixed the following issues: + +- BEs crash when libcurl is invoked. [#31667](https://github.com/StarRocks/starrocks/pull/31667) +- Schema Change may fail if it takes an excessively long period of time, because the specified tablet version is handled by garbage collection. [#31376](https://github.com/StarRocks/starrocks/pull/31376) +- Failed to access the Parquet files in MinIO via file external tables. [#29873](https://github.com/StarRocks/starrocks/pull/29873) +- The ARRAY, MAP, and STRUCT type columns are not correctly displayed in `information_schema.columns`. [#33431](https://github.com/StarRocks/starrocks/pull/33431) +- An error is reported if specific path formats are used during data loading via Broker Load: `msg:Fail to parse columnsFromPath, expected: [rec_dt]`. [#32720](https://github.com/StarRocks/starrocks/pull/32720) +- `DATA_TYPE` and `COLUMN_TYPE` for BINARY or VARBINARY data types are displayed as `unknown` in the `information_schema.columns` view. [#32678](https://github.com/StarRocks/starrocks/pull/32678) +- Complex queries that involve many unions, expressions, and SELECT columns can result in a sudden surge in the bandwidth or CPU usage within an FE node. +- The refresh of asynchronous materialized view may occasionally encounter deadlock. 
[#35736](https://github.com/StarRocks/starrocks/pull/35736) + +### Upgrade Notes + +- Optimization on **Random Bucketing** is disabled by default. To enable it, you need to add the property `bucket_size` when creating tables. This allows the system to dynamically adjust the number of tablets based on cluster information and the size of loaded data. Please note that once this optimization is enabled, if you need to roll back your cluster to v3.1 or earlier, you must delete tables with this optimization enabled and manually execute a metadata checkpoint (by executing `ALTER SYSTEM CREATE IMAGE`). Otherwise, the rollback will fail. +- Starting from v3.2.0, StarRocks has disabled non-Pipeline queries. Therefore, before upgrading your cluster to v3.2, you need to globally enable the Pipeline engine (by adding the configuration `enable_pipeline_engine=true` in the FE configuration file **fe.conf**). Failure to do so will result in errors for non-Pipeline queries. diff --git a/docs/zh/release_notes/release-3.2.md b/docs/zh/release_notes/release-3.2.md new file mode 100644 index 0000000000000..2f1cdfde4e898 --- /dev/null +++ b/docs/zh/release_notes/release-3.2.md @@ -0,0 +1,649 @@ +--- +displayed_sidebar: docs +--- + +# StarRocks version 3.2 + +## 3.2.13 + +Release date: December 13, 2024 + +### Improvements + +- Supports setting a time range within which Base Compaction is forbidden for a specific table. [#50120](https://github.com/StarRocks/starrocks/pull/50120) + +### Bug Fixes + +Fixed the following issues: + +- The `loadRowsRate` field returned `0` after executing SHOW ROUTINE LOAD. [#52151](https://github.com/StarRocks/starrocks/pull/52151) +- The `Files()` function read columns that were not queried. [#52210](https://github.com/StarRocks/starrocks/pull/52210) +- Prometheus failed to parse materialized view metrics with special characters in their names. (Now materialized view metrics support tags.) [#52782](https://github.com/StarRocks/starrocks/pull/52782) +- The `array_map` function caused BE to crash. [#52909](https://github.com/StarRocks/starrocks/pull/52909) +- Metadata Cache issues caused BE to crash. [#52968](https://github.com/StarRocks/starrocks/pull/52968) +- Routine Load 
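tasks were canceled due to expired transactions. (Now tasks are canceled only if the database or table no longer exists.) [#50334](https://github.com/StarRocks/starrocks/pull/50334) +- Stream Load failed when submitted using HTTP 1.0. [#53010](https://github.com/StarRocks/starrocks/pull/53010) [#53008](https://github.com/StarRocks/starrocks/pull/53008) +- Issues related to Glue and S3 integration: [#48433](https://github.com/StarRocks/starrocks/pull/48433) + - Some error messages did not display the root cause. + - An error was reported when writing to a Hive partitioned table with a partition column of type STRING while Glue was used as the metadata service. + - When a Hive table was dropped, the system did not report an error even though the user lacked sufficient permissions. +- The `storage_cooldown_time` property for materialized views did not take effect when set to `maximum`. [#52079](https://github.com/StarRocks/starrocks/pull/52079) + +## 3.2.12 + +Release date: October 23, 2024 + +### Improvements + +- Optimized memory allocation and statistics in BE for certain complex query scenarios to avoid OOM. [#51382](https://github.com/StarRocks/starrocks/pull/51382) +- Optimized memory usage in FE in Schema Change scenarios. [#50855](https://github.com/StarRocks/starrocks/pull/50855) +- Optimized the job status display when querying the system-defined view `information_schema.routine_load_jobs` from Follower FE nodes. [#51763](https://github.com/StarRocks/starrocks/pull/51763) +- Supports Backup and Restore of List partitioned tables. [#51993](https://github.com/StarRocks/starrocks/pull/51993) + +### Bug Fixes + +Fixed the following issues: + +- The error message was lost after writing to Hive failed. [#33167](https://github.com/StarRocks/starrocks/pull/33167) +- The `array_map` function caused a crash when excessive constant parameters were used. [#51244](https://github.com/StarRocks/starrocks/pull/51244) +- Special characters in the partition columns of expression partitioned tables caused FE CheckPoint failures. [#51677](https://github.com/StarRocks/starrocks/pull/51677) +- Accessing the system-defined view `information_schema.fe_locks` caused a crash. [#51742](https://github.com/StarRocks/starrocks/pull/51742) +- Querying generated columns caused an error. [#51755](https://github.com/StarRocks/starrocks/pull/51755) +- Optimize Table failed when the table name contained special characters. [#51755](https://github.com/StarRocks/starrocks/pull/51755) +- Tablets could not be balanced in certain scenarios. [#51828](https://github.com/StarRocks/starrocks/pull/51828) + +### Behavior Changes + +- Supports dynamic modification of Backup and Restore-related parameters. [#52111](https://github.com/StarRocks/starrocks/pull/52111) + +## 3.2.11 + +Release date: September 9, 2024 + +### Improvements + +- Masked sensitive information in Files() and PIPE-related operations. [#47629](https://github.com/StarRocks/starrocks/pull/47629) +- Reading Parquet files through Files() supports automatic inference of the STRUCT 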
type. [#50481](https://github.com/StarRocks/starrocks/pull/50481) + +### Bug Fixes + +Fixed the following issues: + +- Equi-join queries reported an error because the global dictionary was not rewritten. [#50690](https://github.com/StarRocks/starrocks/pull/50690) +- An infinite loop on the FE side during Tablet Clone caused the error "version has been compacted". [#50561](https://github.com/StarRocks/starrocks/pull/50561) +- Repairs of unhealthy replicas were scheduled incorrectly after data replicas were distributed based on labels. [#50331](https://github.com/StarRocks/starrocks/pull/50331) +- The error "Unknown column '%s' in '%s" was reported in the statistics collection log. [#50785](https://github.com/StarRocks/starrocks/pull/50785) +- Files() used an incorrect timezone when reading TIMESTAMP values in complex types from Parquet files. [#50448](https://github.com/StarRocks/starrocks/pull/50448) + +### Behavior Changes + +- When a cluster is downgraded from v3.3.x to v3.2.11, the system ignores any incompatible metadata. [#49636](https://github.com/StarRocks/starrocks/pull/49636) + +## 3.2.10 + +Release date: August 23, 2024 + +### Improvements + +- When Files() reads Parquet files, BYTE_ARRAY data whose `logical_type` is JSON is automatically converted to the JSON type in StarRocks. [#49385](https://github.com/StarRocks/starrocks/pull/49385) +- Optimized the error message of Files() when the Access Key ID and Secret Access Key are missing. [#49090](https://github.com/StarRocks/starrocks/pull/49090) +- `information_schema.columns` supports the `GENERATION_EXPRESSION` field. [#49734](https://github.com/StarRocks/starrocks/pull/49734) + +### Bug Fixes + +Fixed the following issues: + +- Downgrading a v3.3 shared-data cluster to v3.2 after setting the property `"persistent_index_type" = "CLOUD_NATIVE"` on a Primary Key table caused a crash. [#48149](https://github.com/StarRocks/starrocks/pull/48149) +- Exporting data to CSV files with SELECT INTO OUTFILE could cause data inconsistency. [#48052](https://github.com/StarRocks/starrocks/pull/48052) +- Queries failed when executed concurrently. [#48180](https://github.com/StarRocks/starrocks/pull/48180) +- Queries hung because the Plan phase timed out without exiting. [#48405](https://github.com/StarRocks/starrocks/pull/48405) +- After index compression was disabled for Primary Key tables in an earlier version and the cluster was upgraded to v3.1.13 or v3.2.9, accessing the `page_off` information of indexes caused an array out-of-bounds crash. [#48230](https://github.com/StarRocks/starrocks/pull/48230) +- Concurrent ADD/DROP COLUMN operations caused BE to crash. [#49355](https://github.com/StarRocks/starrocks/pull/49355) +- On the aarch64 architecture, negative TINYINT values in ORC files were displayed as None when queried. [#49517](https://github.com/StarRocks/starrocks/pull/49517) +- When a disk write failed, in the persistent primary key index of a Primary Key table, 
`l0` 可能会因为无法捕捉错误导致数据丢失。[#48045](https://github.com/StarRocks/starrocks/pull/48045) +- 主键表部分列更新在大量数据更新的场景下写入失败。[#49054](https://github.com/StarRocks/starrocks/pull/49054) +- v3.3.0 存算分离集群降级到 v3.2.9 后,Fast Schema Evolution 导致 BE Crash。[#42737](https://github.com/StarRocks/starrocks/pull/42737) +- `partition_live_number` 不生效。[#49213](https://github.com/StarRocks/starrocks/pull/49213) +- 主键表索引落盘和 Compaction 并发的冲突可能导致 Clone 失败。[#49341](https://github.com/StarRocks/starrocks/pull/49341) +- 通过 ALTER TABLE 修改 `partition_live_number` 不生效。[#49437](https://github.com/StarRocks/starrocks/pull/49437) +- CTE distinct grouping sets 查询改写生成错误计划。[#48765](https://github.com/StarRocks/starrocks/pull/48765) +- RPC 失败导致线程池污染。[#49619](https://github.com/StarRocks/starrocks/pull/49619) +- 通过 PIPE 导入 AWS S3 中的文件时访问鉴权失败。[#49837](https://github.com/StarRocks/starrocks/pull/49837) + +### 行为变更 + +- FE 启动脚本中增加 `meta` 目录检查,如果不存在则自动创建 `meta` 目录。[#48940](https://github.com/StarRocks/starrocks/pull/48940) +- 增加导入内存限制参数 `load_process_max_memory_hard_limit_ratio`,当导入内存超过使用限制后,后续导入任务将失败。[#48495](https://github.com/StarRocks/starrocks/pull/48495) + +## 3.2.9 + +发布日期:2024 年 7 月 11 日 + +### 新增特性 + +- Paimon 外表支持 DELETE Vector。 [#45866](https://github.com/StarRocks/starrocks/issues/45866) +- 支持通过 Apache Ranger 实现 Column 级别权限控制。[#47702](https://github.com/StarRocks/starrocks/pull/47702) +- Stream Load 支持在导入时将 JSON 字符串自动转换成 STRUCT/MAP/ARRAY 类型数据。[#45406](https://github.com/StarRocks/starrocks/pull/45406) +- JDBC Catalog 支持 Oracle 和 SQL Server。[#35691](https://github.com/StarRocks/starrocks/issues/35691) + +### 功能优化 + +- 优化权限管理,限制 `user_admin` 角色的用户修改 root 密码。[#47801](https://github.com/StarRocks/starrocks/pull/47801) +- Stream Load 支持将 `\t` 和 `\n` 分别作为行列分隔符,无需转成对应的十六进制 ASCII 码。[#47302](https://github.com/StarRocks/starrocks/pull/47302) +- 降低导入时的内存占用。[#47047](https://github.com/StarRocks/starrocks/pull/47047) +- 在审计日志中对 Files() 函数的认证信息进行脱敏处理。[#46893](https://github.com/StarRocks/starrocks/pull/46893) +- 
Hive 外表支持 `skip.header.line.count` 属性。 [#47001](https://github.com/StarRocks/starrocks/pull/47001) +- JDBC Catalog 支持更多的数据类型。[#47618](https://github.com/StarRocks/starrocks/pull/47618) + +### 问题修复 + +修复了如下问题: + +- 存算分离集群从 v3.2.x 升级到 v3.3.0 后回滚,ALTER TABLE ADD COLUMN 导致 BE Crash。[#47826](https://github.com/StarRocks/starrocks/pull/47826) +- 通过 SUBMIT TASK 发起的任务 QueryDetail 接口显示状态一直为 Running。[#47619](https://github.com/StarRocks/starrocks/pull/47619) +- 向 FE Leader 节点转发查询导致空指针。[#47559](https://github.com/StarRocks/starrocks/pull/47559) +- 执行 SHOW MATERIALIZED VIEWS 时带 WHERE 条件导致空指针。[#47811](https://github.com/StarRocks/starrocks/pull/47811) +- 存算一体集群中主键表 Vertical Compaction 失败。[#47192](https://github.com/StarRocks/starrocks/pull/47192) +- 写入 Hive 或 Iceberg 表时没有正确处理 I/O Error。[#46979](https://github.com/StarRocks/starrocks/pull/46979) +- 给表属性赋值时添加空格不生效。[#47119](https://github.com/StarRocks/starrocks/pull/47119) +- 对主键表并发执行迁移操作和 Index Compaction 时导致 BE Crash。[#46675](https://github.com/StarRocks/starrocks/pull/46675) + +### 行为变更 + +- 修改 `JAVA_OPTS` 参数继承顺序,如果使用 JDK_9 或 JDK_11 以外的版本,用户需直接在 `JAVA_OPTS` 中配置。[#47495](https://github.com/StarRocks/starrocks/pull/47495) +- 用户创建非分区表但未设置分桶数时,系统自动设置的分桶数最小值修改为 `16`(原来的规则是 `2 * BE 或 CN 数量`,也即最小会创建 2 个 Tablet)。如果是小数据且想要更小的分桶数,需要手动设置。[#47005](https://github.com/StarRocks/starrocks/pull/47005) +- 用户创建分区表但未设置分桶数时,当分区数量超过 5 个后,系统自动设置分桶数的规则更改为 `max(2 * BE 或 CN 数量, 根据最大历史分区数据量计算得出的分桶数)`。原来的规则是根据最大历史分区数据量计算分桶数。[#47949](https://github.com/StarRocks/starrocks/pull/47949) + +## 3.2.8 + +发布日期:2024 年 6 月 7 日 + +### 新增特性 + +- **[使用标签管理 BE](https://docs.starrocks.io/zh/docs/3.2/administration/management/resource_management/be_label/)**:支持基于 BE 节点所在机架、数据中心等信息,使用标签对 BE 节点进行分组,以保证数据在机架或数据中心等之间均匀分布,应对某些机架断电或数据中心故障情况下的灾备需求。[#38833](https://github.com/StarRocks/starrocks/pull/38833) + +### 问题修复 + +修复了如下问题: + +- 基于 str2date 函数的表达式分区表使用 DELETE 语句删除数据报错。[#45939](https://github.com/StarRocks/starrocks/pull/45939) +- 跨集群迁移工具因获取不到源集群 Schema 信息而导致目标集群 BE 
Crash。[#46068](https://github.com/StarRocks/starrocks/pull/46068) +- 查询使用非确定性函数时报错 `Multiple entries with same key`。[#46602](https://github.com/StarRocks/starrocks/pull/46602) + +## 3.2.7 + +发布日期:2024 年 5 月 24 日 + +### 新增特性 + +- Stream Load 支持在传输过程中对数据进行压缩,减少网络带宽开销。可以通过 `compression` 或 `Content-Encoding` 参数指定不同的压缩方式,支持 GZIP、BZIP2、LZ4_FRAME、ZSTD 压缩算法。[#43732](https://github.com/StarRocks/starrocks/pull/43732) +- 优化了存算分离集群的垃圾回收机制,支持手动对表或分区进行 Compaction 操作,可以更高效的回收对象存储上的数据。[#39532](https://github.com/StarRocks/starrocks/issues/39532) +- 支持从 StarRocks 读取 ARRAY、MAP 和 STRUCT 等复杂类型的数据,并以 Arrow 格式可提供给 Flink connector 读取使用。[#42932](https://github.com/StarRocks/starrocks/pull/42932) [#347](https://github.com/StarRocks/starrocks-connector-for-apache-flink/pull/347) +- 支持查询时异步填充 Data Cache,从而减少缓存填充对首次查询性能影响。[#40489](https://github.com/StarRocks/starrocks/pull/40489) +- 外表 ANALYZE TABLE 命令支持收集直方图统计信息,可以有效应对数据倾斜场景。参见 [CBO 统计信息](https://docs.starrocks.io/zh/docs/3.2/using_starrocks/Cost_based_optimizer/#%E9%87%87%E9%9B%86-hiveiceberghudi-%E8%A1%A8%E7%9A%84%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF)。[#42693](https://github.com/StarRocks/starrocks/pull/42693) +- Lateral Join 结合 [UNNEST](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/array-functions/unnest/) 支持 LEFT JOIN。[#43973](https://github.com/StarRocks/starrocks/pull/43973) +- Query Pool 内存支持通过 BE 静态参数 `query_pool_spill_mem_limit_threshold` 配置 Spill 阈值,如果超过阈值,查询可以通过中间结果落盘的方式降低内存占用减少 OOM。[#44063](https://github.com/StarRocks/starrocks/pull/44063) +- 支持基于 Hive View 创建异步物化视图。[#45085](https://github.com/StarRocks/starrocks/pull/45085) + +### 功能优化 + +- 优化 Broker Load 任务导入 HDFS 数据时对应路径下没有数据时的报错信息。[#43839](https://github.com/StarRocks/starrocks/pull/43839) +- 优化 Files 函数读取 S3 数据场景下没有配置 Access Key 和 Secret Key 的报错信息。[#42450](https://github.com/StarRocks/starrocks/pull/42450) +- 优化 Broker Load 导入时任何分区下均没有数据导入的报错信息。[#44292](https://github.com/StarRocks/starrocks/pull/44292) +- 优化 INSERT INTO SELECT 导入时,目标表与 SELECT 
列数据不匹配的场景下的报错信息。[#44331](https://github.com/StarRocks/starrocks/pull/44331) + +### 问题修复 + +修复了如下问题: + +- BITMAP 类型在并发读写场景下可能会导致 BE Crash。[#44167](https://github.com/StarRocks/starrocks/pull/44167) +- 主键索引可能会导致 BE Crash。[#43793](https://github.com/StarRocks/starrocks/pull/43793) [#43569](https://github.com/StarRocks/starrocks/pull/43569) [#44034](https://github.com/StarRocks/starrocks/pull/44034) +- str_to_map 函数并发场景下可能会导致 BE Crash。[#43901](https://github.com/StarRocks/starrocks/pull/43901) +- Apache Ranger 的 Masking 策略下,在查询中添加表的别名报错。[#44445](https://github.com/StarRocks/starrocks/pull/44445) +- 存算分离模式下执行过程中某个节点异常,无法路由到备用节点。同时针对该问题,优化部分报错信息。[#43489](https://github.com/StarRocks/starrocks/pull/43489) +- 在容器环境下获取内存信息不正确。[#43225](https://github.com/StarRocks/starrocks/issues/43225) +- 取消 INSERT 任务时抛出异常。[#44239](https://github.com/StarRocks/starrocks/pull/44239) +- 无法动态创建基于表达式的动态分区。[#44163](https://github.com/StarRocks/starrocks/pull/44163) +- 创建分区可能导致 FE 死锁。[#44974](https://github.com/StarRocks/starrocks/pull/44974) + +## 3.2.6 + +发布日期:2024 年 4 月 18 日 + +### 问题修复 + +修复了如下问题: + +- 外表权限丢失。[#44030](https://github.com/StarRocks/starrocks/pull/44030) + + +## 3.2.5 (已下线) + +发布日期:2024 年 4 月 12 日 + +:::tip + +此版本因存在 Hive/Iceberg catalog 等外表权限相关问题已经下线。 + +- 问题:查询 Hive/Iceberg catalog 等外表时报错无权限,权限丢失,但用 `SHOW GRANTS` 查询时对应的权限是存在的。 +- 影响范围:对于不涉及 Hive/Iceberg catalog 等外表权限的查询,不受影响。 +- 临时解决方法:在对 Hive/Iceberg catalog 等外表进行重新授权后,查询可以恢复正常。但是 `SHOW GRANTS` 会出现重复的权限条目。后续在升级 3.2.6 后,通过 `REVOKE` 操作删除其中一条即可。 + +::: + +### 新增特性 + +- 支持 [dict_mapping](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/dict-functions/dict_mapping/) 列属性,能够极大地方便构建全局字典中的数据导入过程,用以加速计算精确去重等。 + +### 行为变更 + +- JSON 中的 null 值通过 `IS NULL` 等方式判断时,修改为按照 SQL 的 NULL 值计算。即,`SELECT parse_json('{"a": null}') -> 'a' IS NULL` 会返回 `true`(原来是返回 `false` )。 [#42765](https://github.com/StarRocks/starrocks/pull/42765) + +### 功能优化 + +- 优化 FILES 表函数自动探测文件 Schema 时的列类型合并规则。当不同文件中存在同名但类型不同的列时,FILES 
会尽可能将更大粒度的类型作为最终的探测类型,比如分别为 FLOAT 和 INT 类型的同名列,最终返回 DOUBLE 类型。[#40959](https://github.com/StarRocks/starrocks/pull/40959) +- 主键表支持 Size-tiered Compaction 以减少 I/O 放大问题。[#41130](https://github.com/StarRocks/starrocks/pull/41130) +- 通过 Broker Load 导入 ORC 格式的数据,在 TIMESTAMP 类型的数据转化为 StarRocks 中的 DATETIME 类型的数据时,新增支持保留微秒信息。[#42179](https://github.com/StarRocks/starrocks/pull/42179) +- 优化 Routine Load 报错信息。[#41306](https://github.com/StarRocks/starrocks/pull/41306) +- 优化 FILES 表函数转换数据类型失败时的报错信息。[#42717](https://github.com/StarRocks/starrocks/pull/42717) + +### 问题修复 + +修复了如下问题: + +- 删除系统视图后 FE 启动失败。修复后禁止删除系统视图。[#43552](https://github.com/StarRocks/starrocks/pull/43552) +- 主键表 Sort Key 存在重复列情况下 BE Crash。修复后禁止 Sort Key 存在重复列。[#43206](https://github.com/StarRocks/starrocks/pull/43206) +- 当 JSON 对象为 NULL 时,to_json 函数返回错误。修复后,当 JSON 对象为 NULL 时,该函数返回 NULL 。[#42171](https://github.com/StarRocks/starrocks/pull/42171) +- 对于存算分离中的主键表,本地持久化索引的垃圾回收 (Garbage Collection) 和淘汰线程对 CN 节点没有生效,导致无用数据没有被删除。[#41955](https://github.com/StarRocks/starrocks/pull/41955) +- 存算分离模式下,修改主键表 `enable_persistent_index` 属性报错。[#42890](https://github.com/StarRocks/starrocks/pull/42890) +- 存算分离模式下,主键表部分列更新时未更新列的值被修改为 NULL。[#42355](https://github.com/StarRocks/starrocks/pull/42355) +- 物化视图在基表为逻辑视图情况下改写失败。[#42173](https://github.com/StarRocks/starrocks/pull/42173) +- 跨集群同步工具在迁移主键表到存算分离集群时 CN Crash。[#42260](https://github.com/StarRocks/starrocks/pull/42260) +- 外表物化视图范围分区不连续。[#41957](https://github.com/StarRocks/starrocks/pull/41957) + +## 3.2.4 (已下线) + +发布日期:2024 年 3 月 12 日 + +:::tip + +此版本因存在 Hive/Iceberg catalog 等外表权限相关问题已经下线。 + +- 问题:查询 Hive/Iceberg catalog 等外表时报错无权限,权限丢失,但用 `SHOW GRANTS` 查询时对应的权限是存在的。 +- 影响范围:对于不涉及 Hive/Iceberg catalog 等外表权限的查询,不受影响。 +- 临时解决方法:在对 Hive/Iceberg catalog 等外表进行重新授权后,查询可以恢复正常。但是 `SHOW GRANTS` 会出现重复的权限条目。后续在升级 3.2.6 后,通过 `REVOKE` 操作删除其中一条即可。 + +::: + +### 新增特性 + +- 存算分离集群中的云原生主键表支持 Size-tiered 模式 Compaction,以减轻导入较多小文件时 Compaction 
的写放大问题。[#41034](https://github.com/StarRocks/starrocks/pull/41034) +- Storage Volume 支持 HDFS 的参数化配置,包括 Simple 认证方式支持配置 username,Kerberos 认证,NameNode HA,以及 ViewFS。 +- 新增日期函数 `milliseconds_diff`。[#38171](https://github.com/StarRocks/starrocks/pull/38171) +- 新增 Session 变量 `catalog`,用于指定当前会话所在的 Catalog。[#41329](https://github.com/StarRocks/starrocks/pull/41329) +- Hint 中支持设置[用户自定义变量](https://docs.starrocks.io/zh/docs/3.2/administration/Query_planning/#%E7%94%A8%E6%88%B7%E8%87%AA%E5%AE%9A%E4%B9%89%E5%8F%98%E9%87%8F-hint)。[#40746](https://github.com/StarRocks/starrocks/pull/40746) +- Hive Catalog 支持 CREATE TABLE LIKE。[#37685](https://github.com/StarRocks/starrocks/pull/37685) +- 新增 `information_schema.partitions_meta` 视图,提供丰富的 PARTITION 元信息。[#39265](https://github.com/StarRocks/starrocks/pull/39265) +- 新增 `sys.fe_memory_usage` 视图,提供 StarRocks 的内存使用信息。[#40464](https://github.com/StarRocks/starrocks/pull/40464) + +### 行为变更 + +- `cbo_decimal_cast_string_strict` 用于优化器控制 DECIMAL 类型转为 STRING 类型的行为。默认值是 `true`,即执行严格转换(按 Scale 截断补 `0`)。在历史版本中没有严格按照 DECIMAL 类型进行补齐,从而在 DECIMAL 与 STRING 类型进行比等时会产生不同效果。[#40619](https://github.com/StarRocks/starrocks/pull/40619) +- Iceberg Catalog 的参数 `enable_iceberg_metadata_cache` 默认值改为 `false`。在 3.2.1 到 3.2.3 版本,该参数默认值统一为 `true`。自 3.2.4 版本起,如果 Iceberg 集群的元数据服务为 AWS Glue,该参数默认值仍为 `true`,如果 Iceberg 集群的元数据服务为 Hive Metastore(简称 HMS)或其他,则该参数默认值变更为 `false`。[#41826](https://github.com/StarRocks/starrocks/pull/41826) +- 修改能发起物化视图刷新任务的用户,从原本的 `root` 用户变成创建物化视图的用户,已有的物化视图不受影响。[#40670](https://github.com/StarRocks/starrocks/pull/40670) +- 常量和字符串类型的列进行比较时,默认按字符串进行比较,用户可以通过设置变量 `cbo_eq_base_type` 来调整默认行为。将 `cbo_eq_base_type` 设置为 `decimal` 可以改为按数值进行比较。[#40619](https://github.com/StarRocks/starrocks/pull/40619) + +### 功能优化 + +- 存算分离架构中,支持将数据分区存储于兼容 S3 的存储桶中的不同分区(子路径)中,分区路径使用统一前缀。此举可以提升 StarRocks 对 S3 文件的读写访问效率。[#41627](https://github.com/StarRocks/starrocks/pull/41627) +- 支持通过 `s3_compatible_fs_list` 参数设置可以使用 AWS SDK 接入的 S3 兼容对象存储。同时支持通过 
`fallback_to_hadoop_fs_list` 参数配置需要通过 HDFS 的 Schema 接入的非 S3 兼容对象存储(该方法需要使用厂商提供的 JAR 包)。[#41123](https://github.com/StarRocks/starrocks/pull/41123) +- 优化 Trino 语法兼容性,支持 Trino 的 `current_catalog`、`current_schema`、`to_char`、`from_hex`、`to_date`、`to_timestamp` 以及 `index` 函数的语法转换。[#41217](https://github.com/StarRocks/starrocks/pull/41217) [#41319](https://github.com/StarRocks/starrocks/pull/41319) [#40803](https://github.com/StarRocks/starrocks/pull/40803) +- 优化物化视图改写,支持基于逻辑视图创建的物化视图的改写。[#42173](https://github.com/StarRocks/starrocks/pull/42173) +- 优化 STRING 向 DATETIME 类型转换的效率,性能约提升 35%~40%。[#41464](https://github.com/StarRocks/starrocks/pull/41464) +- 聚合表中 BITMAP 类型的列支持指定聚合类型为 `replace_if_not_null`,从而支持部分列更新。[#42034](https://github.com/StarRocks/starrocks/pull/42034) +- 优化 Broker Load 导入 ORC 小文件时的性能。[#41765](https://github.com/StarRocks/starrocks/pull/41765) +- 行列混存表支持 Schema Change。[#40851](https://github.com/StarRocks/starrocks/pull/40851) +- 行列混存表支持 BITMAP、HLL、JSON、ARRAY、MAP 和 STRUCT 等复杂类型。[#41476](https://github.com/StarRocks/starrocks/pull/41476) +- 新增内部 SQL 日志,其中包含统计信息和物化视图等相关的日志信息。[#40453](https://github.com/StarRocks/starrocks/pull/40453) + +### 问题修复 + +修复了如下问题: + +- 当创建 Hive 视图的查询语句中存在同一个表或视图的名称或别名大小写不一致的情况时,会出现 "Analyze Error" 的问题。[#40921](https://github.com/StarRocks/starrocks/pull/40921) +- 主键表使用持久化索引会导致磁盘 I/O 打满。[#39959](https://github.com/StarRocks/starrocks/pull/39959) +- 存算分离集群中,主键索引目录每 5 小时会被错误删除。 [#40745](https://github.com/StarRocks/starrocks/pull/40745) +- 手动执行 ALTER TABLE COMPACT 后,Compaction 内存统计有异常。[#41150](https://github.com/StarRocks/starrocks/pull/41150) +- 主键表 Publish 重试时可能会卡住。[#39890](https://github.com/StarRocks/starrocks/pull/39890) + +## 3.2.3 + +发布日期:2024 年 2 月 8 日 + +### 新增特性 + +- 【公测中】支持行列混存的表存储格式,对于基于主键的高并发、低延时点查,以及数据部分列更新等场景有更好的性能。但目前还不支持 ALTER,Sort Key 和列模式部分列更新。 +- 支持异步物化视图的备份(BACKUP)和恢复(RESTORE)。 +- Broker Load 支持 JSON 格式的数据的导入。 +- 支持基于视图创建的物化视图的查询改写。例如,直接基于视图创建了物化视图,后续基于该视图的查询可以被改写到物化视图上。 +- 支持 CREATE OR REPLACE PIPE。 
[#37658](https://github.com/StarRocks/starrocks/pull/37658) + +### 行为变更 + +- 新增 Session 变量 `enable_strict_order_by`。当取值为默认值 `TRUE` 时,如果查询中的输出列存在不同的表达式使用重复别名的情况,且按照该别名进行排序,查询会报错,例如 `select distinct t1.* from tbl1 t1 order by t1.k1;`。该行为和 2.3 及之前版本的逻辑一致。如果取值为 `FALSE`,采用宽松的去重机制,把这类查询作为有效 SQL 处理。[#37910](https://github.com/StarRocks/starrocks/pull/37910) +- 新增 Session 变量 `enable_materialized_view_for_insert`,默认值为 `FALSE`,即物化视图默认不改写 INSERT INTO SELECT 语句中的查询。[#37505](https://github.com/StarRocks/starrocks/pull/37505) +- 单个查询在 Pipeline 框架中执行时所使用的内存限制不再受 `exec_mem_limit` 限制,仅由 `query_mem_limit` 限制。取值为 `0` 表示没有限制。 [#34120](https://github.com/StarRocks/starrocks/pull/34120) + +### 参数变更 + +- 新增 FE 配置项 `http_worker_threads_num`,HTTP Server 用于处理 HTTP 请求的线程数。默认取值为 0。如果配置为负数或 0 ,线程数将设置为 CPU 核数的 2 倍。[#37530](https://github.com/StarRocks/starrocks/pull/37530) +- 新增 BE 配置项 `lake_pk_compaction_max_input_rowsets`,用于控制存算分离集群下主键表 Compaction 任务中允许的最大输入 Rowset 数量,优化 Compaction 时资源的使用。[#39611](https://github.com/StarRocks/starrocks/pull/39611) +- 新增 Session 变量 `connector_sink_compression_codec`,用于指定写入 Hive 表或 Iceberg 表时以及使用 Files() 导出数据时的压缩算法,可选算法包括 GZIP、BROTLI、ZSTD 以及 LZ4。 [#37912](https://github.com/StarRocks/starrocks/pull/37912) +- 新增 FE 配置项 `routine_load_unstable_threshold_second`。[#36222](https://github.com/StarRocks/starrocks/pull/36222) +- 新增 BE 配置项 `pindex_major_compaction_limit_per_disk`,配置每块盘 Compaction 的最大并发数,用于解决 Compaction 在磁盘之间不均衡导致个别磁盘 I/O 过高的问题,默认取值为 `1`。[#36681](https://github.com/StarRocks/starrocks/pull/36681) +- 新增 BE 配置项 `enable_lazy_delta_column_compaction`,默认取值是 `true`,表示不启用频繁的进行 Delta Column 的 Compaction。[#36654](https://github.com/StarRocks/starrocks/pull/36654) +- 新增 FE 配置项 `default_mv_refresh_immediate`,用于控制物化视图创建完成后是否立刻进行刷新,默认值为 `true`,表示立刻刷新,`false` 表示延迟刷新。 [#37093](https://github.com/StarRocks/starrocks/pull/37093) +- 调整 FE 配置项 `default_mv_refresh_partition_num` 默认值为 `1`,即单次物化视图刷新需更新多个分区时,任务将分批执行,一次只刷新一个分区。此举可以减少每次刷新占用的资源。 
[#36560](https://github.com/StarRocks/starrocks/pull/36560) +- 调整 BE/CN 配置项 `starlet_use_star_cache` 默认值为 `true`,即在存算分离模式下默认开启 Data Cache。如果您在升级前将 BE/CN 参数 `starlet_cache_evict_high_water` 配置为 `X`,则需要将 BE/CN 参数 `starlet_star_cache_disk_size_percent` 配置为 `(1.0 - X) * 100`。例如,如果您将 `starlet_cache_evict_high_water` 设置为 0.3,则需要设置 `starlet_star_cache_disk_size_percent` 为 70。此举可以确保 file data cache 和 Data Cache 不会超过磁盘容量上限。[#38200](https://github.com/StarRocks/starrocks/pull/38200) + +### 功能优化 + +- 对于分区字段为 TIMESTAMP 类型的 Iceberg 表,新增 `yyyy-MM-ddTHH:mm` 和 `yyyy-MM-dd HH:mm` 两种数据格式的支持。[#39986](https://github.com/StarRocks/starrocks/pull/39986) +- 监控 API 增加 Data Cache 相关指标。 [#40375](https://github.com/StarRocks/starrocks/pull/40375) +- 优化 BE 的日志打印,避免日志过多。 [#22820](https://github.com/StarRocks/starrocks/pull/22820) [#36187](https://github.com/StarRocks/starrocks/pull/36187) +- 视图 `information_schema.be_tablets` 中增加 `storage_medium` 字段。 [#37070](https://github.com/StarRocks/starrocks/pull/37070) +- 支持在多个子查询中使用 `SET_VAR`。 [#36871](https://github.com/StarRocks/starrocks/pull/36871) +- SHOW ROUTINE LOAD 返回结果中增加 `LatestSourcePosition`,记录数据源 Kafka 中 Topic 内各个分区的最新消息位点,便于检查导入延迟情况。[#38298](https://github.com/StarRocks/starrocks/pull/38298) +- WHERE 子句中 LIKE 运算符右侧字符串中不包括 `%` 或者 `_` 时,LIKE 运算符会转换成 `=` 运算符。[#37515](https://github.com/StarRocks/starrocks/pull/37515) +- 调整 Trash 文件的默认过期时间为 1 天(原来是 3 天)。[#37113](https://github.com/StarRocks/starrocks/pull/37113) +- 支持收集带 Partition Transform 的 Iceberg 表的统计信息。 [#39907](https://github.com/StarRocks/starrocks/pull/39907) +- 优化 Routine Load 的调度策略,慢任务不阻塞其他正常任务的执行。[#37638](https://github.com/StarRocks/starrocks/pull/37638) + +### 问题修复 + +修复了如下问题: + +- ANALYZE TABLE 偶尔会卡住。 [#36836](https://github.com/StarRocks/starrocks/pull/36836) +- PageCache 内存占用在有些情况下会超过 BE 动态参数 `storage_page_cache_limit` 设定的阈值。[#37740](https://github.com/StarRocks/starrocks/pull/37740) +- Hive Catalog 的元数据在 Hive 
表新增字段后不会自动刷新。[#37549](https://github.com/StarRocks/starrocks/pull/37549) +- 某些情况下 `bitmap_to_string` 会因为转换时数据类型溢出导致查询结果错误。[#37405](https://github.com/StarRocks/starrocks/pull/37405) +- `SELECT ... FROM ... INTO OUTFILE` 导出至 CSV 时,如果 FROM 子句中包含多个常量,执行时会报错:"Unmatched number of columns"。[#38045](https://github.com/StarRocks/starrocks/pull/38045) +- 查询表中半结构化数据时,某些情况下会导致 BE Crash。 [#40208](https://github.com/StarRocks/starrocks/pull/40208) + +## 3.2.2 + +发布日期:2023 年 12 月 30 日 + +### 问题修复 + +修复了如下问题: + +- 从 v3.1.2 及之前版本升级至 v3.2 后,FE 可能启动失败。 [#38172](https://github.com/StarRocks/starrocks/pull/38172) + +## 3.2.1 + +发布日期:2023 年 12 月 21 日 + +### 新增特性 + +#### 数据湖分析 + +- 支持通过 Java Native Interface(JNI)读取 Avro、SequenceFile 以及 RCFile 格式的 [Hive Catalog](https://docs.starrocks.io/zh/docs/3.2/data_source/catalog/hive_catalog/) 表和文件外部表。 + +#### 物化视图 + +- `sys` 数据库新增 `object_dependencies` 视图,可用于查询异步物化视图血缘关系。 [#35060](https://github.com/StarRocks/starrocks/pull/35060) +- 支持创建带有 WHERE 子句的同步物化视图。 +- Iceberg 异步物化视图支持分区级别的增量刷新。 +- [Preview] 支持基于 Paimon Catalog 外表创建异步物化视图,支持分区级别刷新。 + +#### 查询和函数 + +- 支持预处理语句(Prepared Statement)。预处理语句可以提高处理高并发点查查询的性能,同时有效地防止 SQL 注入。 +- 新增如下 Bitmap 函数:[subdivide_bitmap](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/bitmap-functions/subdivide_bitmap/)、[bitmap_from_binary](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/bitmap-functions/bitmap_from_binary/)、[bitmap_to_binary](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/bitmap-functions/bitmap_to_binary/)。 +- 新增如下 Array 函数:[array_unique_agg](https://docs.starrocks.io/docs/3.2/sql-reference/sql-functions/array-functions/array_unique_agg/)。 + +#### 监控指标 + +- 新增了监控指标 `max_tablet_rowset_num`(用于设置 Rowset 的最大数量),可以协助提前发现 Compaction 是否会出问题并及时干预,减少报错信息“too many versions”的出现。[#36539](https://github.com/StarRocks/starrocks/pull/36539) + +### 参数变更 + +- 新增 BE 配置项 `enable_stream_load_verbose_log`,默认取值是 `false`,打开后日志中可以记录 Stream Load 的 HTTP 
请求和响应信息,方便出现问题后的定位调试。[#36113](https://github.com/StarRocks/starrocks/pull/36113) + +### 功能优化 + +- 使用 JDK8 时,默认 GC 算法采用 G1。 [#37268](https://github.com/StarRocks/starrocks/pull/37268) +- 系统变量 [sql_mode](https://docs.starrocks.io/zh/docs/3.2/reference/System_variable/#sql_mode) 增加 `GROUP_CONCAT_LEGACY` 选项,用以兼容 [group_concat](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/string-functions/group_concat/) 函数在 2.5(不含)版本之前的实现逻辑。[#36150](https://github.com/StarRocks/starrocks/pull/36150) +- 隐藏了审计日志(Audit Log)中 [Broker Load 作业里 AWS S3](https://docs.starrocks.io/zh/docs/3.2/loading/s3/) 的鉴权信息 `aws.s3.access_key` 和 `aws.s3.access_secret`。[#36571](https://github.com/StarRocks/starrocks/pull/36571) +- 在 `be_tablets` 表中增加 `INDEX_DISK` 记录持久化索引的磁盘使用量,单位是 Bytes。[#35615](https://github.com/StarRocks/starrocks/pull/35615) +- [SHOW ROUTINE LOAD](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-statements/data-manipulation/SHOW_ROUTINE_LOAD/) 返回结果中增加 `OtherMsg`,展示最后一个失败的任务的相关信息。[#35806](https://github.com/StarRocks/starrocks/pull/35806) + +### 问题修复 + +修复了如下问题: + +- 数据损坏情况下,建立持久化索引会引起 BE Crash。[#30841](https://github.com/StarRocks/starrocks/pull/30841) +- ARRAY_DISTINCT 函数偶发 BE Crash。[#36377](https://github.com/StarRocks/starrocks/pull/36377) +- 启用 DISTINCT 下推窗口算子功能时,对窗口函数的输出列的复杂表达式进行 SELECT DISTINCT 操作会报错。[#36357](https://github.com/StarRocks/starrocks/pull/36357) +- 某些兼容 S3 协议的对象存储会返回重复的文件,导致 BE Crash。[#36103](https://github.com/StarRocks/starrocks/pull/36103) + +## 3.2.0 + +发布日期:2023 年 12 月 1 日 + +### 新增特性 + +#### 存算分离 + +- 支持[主键表](https://docs.starrocks.io/zh/docs/3.2/table_design/table_types/primary_key_table/)的索引在本地磁盘的持久化。 +- 支持 Data Cache 在多磁盘间均匀分布。 + +#### 物化视图 + +**异步物化视图** + +- 物化视图支持 Query Dump。 +- 物化视图的刷新默认开启中间结果落盘,降低刷新的内存消耗。 + +#### 数据湖分析 + +- 支持在 [Hive Catalog](https://docs.starrocks.io/zh/docs/3.2/data_source/catalog/hive_catalog/) 中创建、删除数据库以及 Managed Table,支持使用 INSERT 或 INSERT OVERWRITE 导出数据到 Hive 的 Managed Table。 +- 支持 [Unified 
Catalog](https://docs.starrocks.io/zh/docs/3.2/data_source/catalog/unified_catalog/)。如果同一个 Hive Metastore 或 AWS Glue 元数据服务包含多种表格式(Hive、Iceberg、Hudi、Delta Lake 等),则可以通过 Unified Catalog 进行统一访问。 +- 支持通过 ANALYZE TABLE 收集 Hive 和 Iceberg 表的统计信息,并存储在 StarRocks 内部,方便优化加速后续查询。 +- 支持外表的 Information Schema,为外部系统(如 BI)与 StarRocks 的交互提供更多便利。 + +#### 导入、导出和存储 + +- 使用表函数 [FILES()](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/table-functions/files/) 进行数据导入新增以下功能: + - 支持导入 Azure 和 GCP 中的 Parquet 或 ORC 格式文件的数据。 + - 支持 `columns_from_path` 参数,能够从文件路径中提取字段信息。 + - 支持导入复杂类型(JSON、ARRAY、MAP 及 STRUCT)的数据。 +- 支持使用 INSERT INTO FILES() 语句将数据导出至 AWS S3 或 HDFS 中的 Parquet 格式的文件。有关详细说明,请参见[使用 INSERT INTO FILES 导出数据](https://docs.starrocks.io/zh/docs/3.2/unloading/unload_using_insert_into_files/)。 +- 通过增强 ALTER TABLE 命令提供了 [optimize table 功能](https://docs.starrocks.io/zh/docs/3.2/table_design/Data_distribution#建表后优化数据分布自-32),可以调整表结构并重组数据,以优化查询和导入的性能。支持的调整项包括:分桶方式和分桶数、排序键,以及可以只调整部分分区的分桶数。 +- 支持使用 PIPE 导入方式从[云存储 S3](https://docs.starrocks.io/zh/docs/3.2/loading/s3/#通过-pipe-导入) 或 [HDFS](https://docs.starrocks.io/zh/docs/3.2/loading/hdfs_load/#通过-pipe-导入) 中导入大规模数据和持续导入数据。在导入大规模数据时,PIPE 命令会自动根据导入数据大小和导入文件数量将一个大导入任务拆分成很多个小导入任务串行运行,降低任务出错重试的代价、减少导入中对系统资源的占用,提升数据导入的稳定性。同时,PIPE 也能不断监听云存储目录中的新增文件或文件内容修改,并自动将变化的数据文件数据拆分成一个个小的导入任务,持续地将新数据导入到目标表中。 + +#### 查询 + +- 支持 [HTTP SQL API](https://docs.starrocks.io/zh/docs/3.2/reference/HTTP_API/SQL/)。用户可以通过 HTTP 方式访问 StarRocks 数据,执行 SELECT、SHOW、EXPLAIN 或 KILL 操作。 +- 新增 Runtime Profile,以及基于文本的 Profile 分析指令(SHOW PROFILELIST,ANALYZE PROFILE,EXPLAIN ANALYZE),用户可以通过 MySQL 客户端直接进行 Profile 的分析,方便定位瓶颈点并发现优化机会。 + +#### SQL 语句和函数 + +新增如下函数: + +- 字符串函数:substring_index、url_extract_parameter、url_encode、url_decode、translate +- 日期函数:dayofweek_iso、week_iso、quarters_add、quarters_sub、milliseconds_add、milliseconds_sub、date_diff、jodatime_format、str_to_jodatime、to_iso8601、to_tera_date、to_tera_timestamp +- 模糊/正则匹配函数:regexp_extract_all +- hash 函数:xx_hash3_64 +- 
聚合函数:approx_top_k +- 窗口函数:cume_dist、percent_rank、session_number +- 工具函数:get_query_profile、is_role_in_session + +#### 权限 + +支持通过 [Apache Ranger](https://docs.starrocks.io/zh/docs/3.2/administration/ranger_plugin/) 实现访问控制,提供更高层次的数据安全保障,并且允许复用原有的外部数据源 Service。StarRocks 集成 Apache Ranger 后可以实现以下权限控制方式: + +- 访问 StarRocks 内表、外表或其他对象时,可根据在 Ranger 中创建的 StarRocks Service 配置的访问策略来进行访问控制。 +- 访问 External Catalog 时,也可以复用对应数据源原有的 Ranger service(如 Hive Service)来进行访问控制(当前暂未支持导出数据到 Hive 操作的权限控制)。 + +### 功能优化 + +#### 数据湖分析 + +- 优化了 ORC Reader: + - 优化 ORC Column Reader,VARCHAR 和 CHAR 数据读取性能有接近两倍提升。 + - 优化 ORC 文件 Zlib 压缩格式的解压性能。 +- 优化了 Parquet Reader: + - 支持自适应 I/O 合并,可根据过滤效果自适应是否合并带谓词的列和不带谓词的列,从而减少 I/O。 + - 优化 Dict Filter。针对字典编码类型文件,支持更快的谓词改写、Dict Filter 支持 STRUCT 子列、按需进行字典列译码。 + - 优化 Dict Decode 性能。 + - 优化延迟物化性能。 + - 支持缓存文件 Footer,从而避免反复计算开销。 + - 支持读取 lzo 压缩格式。 +- 优化了 CSV Reader: + - 优化了读取性能。 + - 支持读取 Snappy 和 lzo 压缩格式。 +- 优化了 Count 操作的性能。 +- 优化了 Iceberg Catalog 能力: + - 支持收集 Manifest 文件中的列统计信息为查询加速。 + - 支持收集 Puffin 文件中的 NDV(number of distinct values)为查询加速。 + - 支持分区裁剪。 + - 优化 Iceberg 元数据内存占用,提升在元数据量过大或查询并发较高时的稳定性。 + +#### 物化视图 + +**异步物化视图** + +- 异步物化视图自动刷新:当创建物化视图涉及的表、视图及视图内涉及的表、物化视图发生 Schema Change 或 Swap 操作后,物化视图可以进行自动刷新。 +- 数据一致性: + - 创建物化视图时,添加了 `query_rewrite_consistency` 属性。该属性允许用户基于一致性检查结果定义查询改写规则。 + - 创建物化视图时,添加了 `force_external_table_query_rewrite` 属性。该属性用于定义是否为外表物化视图强制开启查询重写。 + - 有关详细信息,请参见 [CREATE MATERIALIZED VIEW](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_MATERIALIZED_VIEW/)。 +- 增加分区列一致性检查:当创建分区物化视图时,如物化视图的查询中涉及带分区的窗口函数,则窗口函数的分区列需要与物化视图的分区列一致。 + +#### 导入、导出和存储 + +- 优化主键表(Primary Key)持久化索引功能,优化内存使用逻辑,同时降低 I/O 的读写放大。 +- 主键表(Primary Key)支持本地多块磁盘间数据均衡。 +- 分区中数据可以随着时间推移自动进行降冷操作(List 分区方式暂不支持)。相对原来的设置,更方便进行分区冷热管理。有关详细信息,请参见[设置数据的初始存储介质、自动降冷时间](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_TABLE#设置数据的初始存储介质自动降冷时间和副本数)。 +- 主键表数据写入时的 Publish 过程由异步改为同步,导入作业成功返回后数据立即可见。有关详细信息,请参见 
[enable_sync_publish](https://docs.starrocks.io/zh/docs/3.2/administration/FE_configuration#enable_sync_publish)。 +- 支持 Fast Schema Evolution 模式,由表属性 [`fast_schema_evolution`](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-statements/data-definition/CREATE_TABLE#设置-fast-schema-evolution) 控制。启用该模式可以在进行加减列变更时提高执行速度并降低资源使用。该属性默认值是 `false`(即关闭)。不支持建表后通过 ALTER TABLE 修改该表属性。 +- 对于采用随机分桶的**明细表**,系统进行了优化,会根据集群信息及导入中的数据量大小[按需动态调整 Tablet 数量](https://docs.starrocks.io/zh/docs/3.2/table_design/Data_distribution#设置分桶数量)。 + +#### 查询 + +- Metabase 和 Superset 兼容性提升,支持集成 External Catalog。 + +#### SQL 语句和函数 + +- [array_agg](https://docs.starrocks.io/zh/docs/3.2/sql-reference/sql-functions/array-functions/array_agg/) 支持使用 DISTINCT 关键词。 +- INSERT、UPDATE 以及 DELETE 支持使用 `SET_VAR`。 [#35283](https://github.com/StarRocks/starrocks/pull/35283) + +#### 其他优化 + +- 新增会话变量 `large_decimal_underlying_type = "panic"|"double"|"decimal"`,用以设置超出范围的 DECIMAL 类型数据的转换规则。其中 `panic` 表示直接报错,`double` 表示转换为 DOUBLE 类型,`decimal` 表示转换为 DECIMAL(38,s)。 + +### 开发者工具 + +- 异步物化视图支持 Trace Query Profile,用于分析物化视图透明改写的场景。 + +### 行为变更 + +待更新。 + +### 参数变更 + +#### FE 配置项 + +- 新增以下 FE 配置项: + - `catalog_metadata_cache_size` + - `enable_backup_materialized_view` + - `enable_colocate_mv_index` + - `enable_fast_schema_evolution` + - `json_file_size_limit` + - `lake_enable_ingest_slowdown` + - `lake_ingest_slowdown_threshold` + - `lake_ingest_slowdown_ratio` + - `lake_compaction_score_upper_bound` + - `mv_auto_analyze_async` + - `primary_key_disk_schedule_time` + - `statistic_auto_collect_small_table_rows` + - `stream_load_task_keep_max_num` + - `stream_load_task_keep_max_second` +- 删除 FE 配置项 `enable_pipeline_load`。 +- 默认值修改: + - `enable_sync_publish` 默认值从 `false` 变为 `true`。 + - `enable_persistent_index_by_default` 默认值从 `false` 变为 `true`。 + +#### BE 配置项 + +- Data Cache 相关配置项变更。 + - 新增 `datacache_enable` 以取代 `block_cache_enable`。 + - 新增 `datacache_mem_size` 以取代 `block_cache_mem_size`。 + - 新增 `datacache_disk_size` 以取代 
`block_cache_disk_size`。 + - 新增 `datacache_disk_path` 以取代 `block_cache_disk_path`。 + - 新增 `datacache_meta_path` 以取代 `block_cache_meta_path`。 + - 新增 `datacache_block_size` 以取代 `block_cache_block_size`。 + - 新增 `datacache_checksum_enable` 以取代 `block_cache_checksum_enable`。 + - 新增 `datacache_direct_io_enable` 以取代 `block_cache_direct_io_enable`。 + - 新增 `datacache_max_concurrent_inserts` 以取代 `block_cache_max_concurrent_inserts`。 + - 新增 `datacache_max_flying_memory_mb`。 + - 新增 `datacache_engine` 以取代 `block_cache_engine`。 + - 删除 `block_cache_max_parcel_memory_mb`。 + - 删除 `block_cache_report_stats`。 + - 删除 `block_cache_lru_insertion_point`。 + + Block Cache 更名为 Data Cache 后,StarRocks 引入一套新的以 `datacache` 为前缀的 BE 参数以取代原有以 `block_cache` 为前缀的参数。升级后,原有参数仍然生效,新参数在启用后将覆盖原有参数。但不支持新老参数混用,否则可能会导致部分配置不生效。未来,StarRocks 计划弃用原有以 `block_cache` 为前缀的参数,所以建议用户使用新的以 `datacache` 为前缀的参数。 + +- 新增以下 BE 配置项: + - `spill_max_dir_bytes_ratio` + - `streaming_agg_limited_memory_size` + - `streaming_agg_chunk_buffer_size` +- 删除以下 BE 配置项: + - 动态参数 `tc_use_memory_min` + - 动态参数 `tc_free_memory_rate` + - 动态参数 `tc_gc_period` + - 静态参数 `tc_max_total_thread_cache_bytes` +- 默认值修改: + - `disable_column_pool` 默认值从 `false` 变为 `true`。 + - `thrift_port` 默认值从 `9060` 变为 `0`。 + - `enable_load_colocate_mv` 默认值从 `false` 变为 `true`。 + - `enable_pindex_minor_compaction` 默认值从 `false` 变为 `true`。 + +#### 系统变量 + +- 新增以下会话变量: + - `enable_per_bucket_optimize` + - `enable_write_hive_external_table` + - `hive_temp_staging_dir` + - `spill_revocable_max_bytes` + - `thrift_plan_protocol` +- 删除以下会话变量: + - `enable_pipeline_query_statistic` + - `enable_deliver_batch_fragments` +- 变量更名: + - `enable_scan_block_cache` 更名为 `enable_scan_datacache`。 + - `enable_populate_block_cache` 更名为 `enable_populate_datacache`。 + +#### 保留关键字 + +新增保留关键字 `OPTIMIZE` 和 `PREPARE`。 + +### 问题修复 + +修复了如下问题: + +- 调用 libcurl 会引起 BE Crash。[#31667](https://github.com/StarRocks/starrocks/pull/31667) +- 如果 Schema Change 执行时间过长,会因为 Tablet 
版本被垃圾回收而失败。[#31376](https://github.com/StarRocks/starrocks/pull/31376) +- 通过文件外部表无法读取存储在 MinIO 上的 Parquet 文件。[#29873](https://github.com/StarRocks/starrocks/pull/29873) +- `information_schema.columns` 视图中无法正确显示 ARRAY、MAP、STRUCT 类型的字段。[#33431](https://github.com/StarRocks/starrocks/pull/33431) +- Broker Load 导入数据时某些路径形式下会报错 `msg:Fail to parse columnsFromPath, expected: [rec_dt]`。[#32720](https://github.com/StarRocks/starrocks/pull/32720) +- BINARY 或 VARBINARY 类型在 `information_schema.columns` 视图里面的 `DATA_TYPE` 和 `COLUMN_TYPE` 显示为 `unknown`。[#32678](https://github.com/StarRocks/starrocks/pull/32678) +- 包含大量 Union 以及表达式且查询列很多的复杂查询,容易导致单个 FE 节点带宽或者 CPU 短时间内占用较高。[#29888](https://github.com/StarRocks/starrocks/pull/29888) [#29719](https://github.com/StarRocks/starrocks/pull/29719) +- 某些情况下,物化视图刷新可能会出现死锁问题。[#35736](https://github.com/StarRocks/starrocks/pull/35736) + +### 升级注意事项 + +- 系统默认不开启**随机分桶**优化。如需启用该优化,需要在建表时新增 PROPERTIES `bucket_size`,从而允许系统根据集群信息及导入中的数据量大小按需动态调整 Tablet 数量。但需要注意的是,一旦开启该优化,如需回滚到 v3.1 版本,必须删除开启该优化的表并手动执行元数据 Checkpoint(`ALTER SYSTEM CREATE IMAGE`)成功后才能回滚。 +- 从 v3.2.0 开始,StarRocks 禁用了非 Pipeline 查询。因此,在从低版本升级到 v3.2 版本之前,需要先全局打开 Pipeline 引擎(即在 FE 配置文件 **fe.conf** 中添加设置项 `enable_pipeline_engine=true`),否则非 Pipeline 查询会报错。