Apache Doris 3.0.3 just released

Release Notes
2024/12/02
Apache Doris

Dear community members, Apache Doris 3.0.3 was officially released on December 2, 2024. This version further improves the performance and stability of the system.

Behavioral Changes

  • Prohibited column updates on merge-on-write (MOW) tables with synchronous materialized views. #40190
  • Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
  • When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
  • Adjusted the default memory limit of Segment cache to 5%. #42308 #42436

New Features

  • Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677

  • Added table$partition syntax for querying partition information of Hive tables. #40774

  • Supported creation of Hive tables in Text format. #41860 #42175

Asynchronous Materialized Views

  • Introduced new materialized view attribute use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

  • Supported correlated non-aggregate subqueries. #42236
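
For readers unfamiliar with the term, the sketch below shows the shape of a correlated non-aggregate subquery. SQLite (from the Python standard library) is used only as a stand-in engine so the example runs anywhere; the tables and columns are hypothetical, and how Doris handles cases such as a subquery returning several rows is not covered here.

```python
import sqlite3

# SQLite is only a stand-in engine to show the query shape;
# the tables and columns below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# The inner SELECT refers to o.customer_id from the outer query (correlated)
# and returns a plain column instead of SUM/COUNT/MAX (non-aggregate).
rows = conn.execute("""
    SELECT o.id,
           (SELECT c.name FROM customers c WHERE c.id = o.customer_id) AS customer_name
    FROM orders o
    ORDER BY o.id
""").fetchall()

print(rows)  # [(10, 'alice'), (11, 'bob'), (12, 'alice')]
```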

Query Execution

  • Added functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
  • The aes_encrypt and aes_decrypt functions support GCM mode. #40004
  • Profile outputs the changed session variable values. #41016 #41318
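
As a rough illustration of what three of the new functions compute, here is a minimal Python sketch. The argument order of normal_cdf (mean, standard deviation, value) and the exact string formats are assumptions made for this example; the Doris functions may accept more input formats and format their output differently.

```python
import math
from datetime import date, datetime


def normal_cdf(mean: float, sd: float, x: float) -> float:
    """P(X <= x) for X ~ Normal(mean, sd^2), computed with the error function.

    The (mean, sd, x) argument order is an assumption for this sketch.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def to_iso8601(value: datetime) -> str:
    """Render a datetime as an ISO 8601 string."""
    return value.isoformat()


def from_iso8601_date(text: str) -> date:
    """Parse a calendar date written as YYYY-MM-DD."""
    return date.fromisoformat(text)


print(normal_cdf(0.0, 1.0, 0.0))                      # 0.5
print(to_iso8601(datetime(2024, 12, 2, 10, 30, 0)))   # 2024-12-02T10:30:00
print(from_iso8601_date("2024-12-02"))                # 2024-12-02
```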

Semi-structured Data Management

  • Added array functions array_match_all and array_match_any. #40605 #43514
  • The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
  • Added approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082
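
The intent of array_match_all and array_match_any can be sketched in Python as "every element satisfies the predicate" and "at least one element satisfies it". This is only a conceptual model: NULL elements and the result for empty arrays in Doris are not modeled, and the signatures below are assumptions for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def array_match_all(pred: Callable[[T], bool], arr: Sequence[T]) -> bool:
    """True when every element of arr satisfies pred."""
    return all(pred(x) for x in arr)


def array_match_any(pred: Callable[[T], bool], arr: Sequence[T]) -> bool:
    """True when at least one element of arr satisfies pred."""
    return any(pred(x) for x in arr)


values = [1, 2, 3]
print(array_match_all(lambda x: x > 0, values))  # True
print(array_match_all(lambda x: x > 1, values))  # False
print(array_match_any(lambda x: x > 2, values))  # True
print(array_match_any(lambda x: x > 3, values))  # False
```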

Improvements

Storage

  • Supported bitmap_empty as the default value. #40364
  • Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
  • Improved some error message prompts. #41048 #39631
  • Improved the priority scheduling of replica repair. #41076
  • Enhanced the robustness of timezone handling when creating tables. #41926 #42389
  • Checked the validity of partition expressions when creating tables. #40158
  • Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

  • Supported ARM architecture deployment in compute-storage decoupled mode. #42467 #43377
  • Optimized the eviction strategy and lock competition of file cache, improving hit rate and high concurrency point query performance. #42451 #43201 #41818 #43401
  • S3 storage vault supported use_path_style, solving the problem of using custom domain names for object storage. #43060 #43343 #43330
  • Optimized compute-storage decoupled configuration and deployment, preventing misoperations in different modes. #43381 #43522 #43434 #40764 #43891
  • Optimized observability and provided an interface for deleting specified segment file cache. #38489 #42896 #41037 #43412
  • Optimized Meta-service operation and maintenance interface: RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
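
To make the use_path_style item above more concrete, the sketch below contrasts the two common S3 addressing styles. With virtual-hosted style the bucket becomes a subdomain of the endpoint, which may not resolve for a custom domain; path style keeps the bucket in the URL path. The helper object_url and the endpoint are hypothetical and only illustrate the idea, not how Doris builds requests.

```python
def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool) -> str:
    """Build an object URL in path style or virtual-hosted style.

    Hypothetical helper for illustration only.
    """
    if use_path_style:
        # Bucket is part of the path: https://<endpoint>/<bucket>/<key>
        return f"https://{endpoint}/{bucket}/{key}"
    # Bucket is a subdomain: https://<bucket>.<endpoint>/<key>
    return f"https://{bucket}.{endpoint}/{key}"


print(object_url("storage.example.com", "doris", "data/seg_0.dat", True))
# https://storage.example.com/doris/data/seg_0.dat
print(object_url("storage.example.com", "doris", "data/seg_0.dat", False))
# https://doris.storage.example.com/data/seg_0.dat
```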

Lakehouse

  • Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585

  • Supported reading of Hive tables in OpenCSV format. #42257 #42942

  • Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962

  • Used the new MaxCompute open storage API to access MaxCompute data sources. #41614

  • Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310

  • Optimized the read performance of small ORC files. #42004 #43467

  • Supported reading of Parquet files compressed with Brotli. #42177

  • Added the file_cache_statistics table under the information_schema database to view metadata cache statistics. #42160

Query Optimizer

Query Execution

  • Optimized the memory usage of the sort operator. #39306
  • Optimized the performance of computations on ARM. #38888 #38759
  • Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
  • Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
  • Supported automatic creation of new partitions during insert overwrite. #38628 #42645
  • Added the status of each PipelineTask in Profile. #42981
  • IP type supported runtime filter. #39985

Semi-structured Data Management

  • Output the real SQL of prepared statements in audit logs. #43321
  • The filebeat doris output plugin supports fault tolerance and progress reporting. #36355
  • Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
  • The array function arrays_overlap supports acceleration using inverted indexes. #41571
  • The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
  • Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
  • Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
  • Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
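
For context on the is_ip_address_in_range item above, the check it performs is a CIDR containment test, which can be sketched with Python's standard ipaddress module. This only models the function's result; it says nothing about how the inverted index accelerates it, and the (address, CIDR) argument order is an assumption.

```python
import ipaddress


def is_ip_address_in_range(addr: str, cidr: str) -> bool:
    """Return True when addr falls inside the CIDR block.

    The (addr, cidr) argument order is an assumption for this sketch.
    """
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(cidr, strict=False)
    # An IPv4 address is never inside an IPv6 network, and vice versa.
    return ip.version == net.version and ip in net


print(is_ip_address_in_range("192.168.1.7", "192.168.1.0/24"))  # True
print(is_ip_address_in_range("192.168.2.7", "192.168.1.0/24"))  # False
print(is_ip_address_in_range("2001:db8::1", "2001:db8::/32"))   # True
```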

Permissions

  • Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

  • Supported displaying connection count information by user in FE monitoring items. #39200

Bug Fixes

Storage

  • Fixed the issue with using IPv6 hostnames. #40074
  • Fixed the inaccurate display of broker/s3 load progress. #43535
  • Fixed the issue where queries might hang from FE. #41303 #42382
  • Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
  • Fixed occasional NPE issues with group commit. #43635
  • Fixed the inaccurate calculation of auto bucket. #41675 #41835
  • Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290

Compute-Storage Decoupled

  • Fixed the issue that MOW primary key tables with large delete bitmaps might cause coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
  • Fixed the issue that segment files whose size is an exact multiple of 5 MB would fail to upload to object storage. #43254
  • Fixed the issue that the default retry policy of aws sdk did not take effect. #43575 #43648
  • Fixed the issue that altering storage vault could continue execution even when the wrong type was specified. #43489 #43352 #43495
  • Fixed the issue that tablet_id might be 0 during the delayed commit process of large transactions. #42043 #42905
  • Fixed the issue that constant folding RPC and FE forwarding SQL might not be executed in the expected computation group. #43110 #41819 #41846
  • Fixed the issue that meta-service did not strictly check instance_id upon receiving RPC. #43253 #43832
  • Fixed the issue that FE follower information_schema version did not update in time. #43496
  • Fixed the issue of atomicity in file cache rename and inaccurate metrics. #42869 #43504 #43220

Lakehouse

  • Prohibited implicit conversion predicates from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
  • Fixed some read issues with high-version Hive transactional tables. #42226
  • Fixed the issue that the Export command might cause deadlocks. #43083 #43402
  • Fixed the issue of being unable to query Hive views created by Spark. #43552
  • Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
  • Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

  • Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762

Query Optimizer

  • Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
  • Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576

Query Execution

  • Fixed the issue that hash join with array types larger than 4G could cause a BE coredump. #43861
  • Fixed the issue that IS NULL predicate operations might yield incorrect results in some scenarios. #43619
  • Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
  • Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
  • Fixed some issues with JSON type parsing. #39937
  • Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
  • Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
  • Fixed the issue that Arrow Flight reported "Reach limit of connections" errors upon connection. #39127
  • Fixed the issue of incorrect memory usage statistics for BE in k8s environments. #41123

Semi-structured Data Management

  • Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
  • logstash now supports group_commit. #40450
  • Fixed the issue of coredump when building index. #43246 #43298
  • Fixed issues with variant index. #43375 #43773
  • Fixed potential fd and memory leaks under abnormal compaction circumstances. #42374
  • Inverted index match null now correctly returns null instead of false. #41786
  • Fixed the issue of coredump when ngram bloomfilter index bf_size is set to 65536. #43645
  • Fixed the issue of potential coredump during complex data type JOINs. #40398
  • Fixed the issue of coredump with TVF JSON data. #43187
  • Fixed the precision issue of bloom filter calculations for dates and times. #43612
  • Fixed the issue of coredump with IPv6 type storage. #43251
  • Fixed the issue of coredump when using VARIANT type with light_schema_change disabled. #40908
  • Improved cache performance for high-concurrency point queries. #44077
  • Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
  • Fixed instability issues with es catalog under special circumstances such as mixed array and scalar data. #40314 #40385 #43399 #40614
  • Fixed coredump issues caused by abnormal regular expression pattern matching. #43394

Permissions

Other