Comparisons

Apache Doris vs ClickHouse

Apache Doris and ClickHouse are both leading real-time data warehouses that offer columnar storage and fast queries. Doris offers higher concurrency, more efficient joins, easier maintenance, and MySQL-compatible SQL syntax, which makes it simpler to use and deploy.

Featured Migration Cases

Youzan

“Apache Doris has faster query response times than ClickHouse in the vast majority of scenarios, especially in complex join scenarios, where its performance is significantly superior to that of ClickHouse.”

Core business queries 2-3x faster

Complex join queries 2-10x faster

Runs all queries that ran out of memory (OOM) in ClickHouse

Kwai

“By replacing ClickHouse with Doris, Kwai successfully upgraded to a lakehouse architecture, simplifying the data pipeline and eliminating the need for data import, as Doris can directly access data lake data.”

Direct querying of data lake data

Improved query performance

Flexible data governance with materialized views

Tencent Music

“Tencent Music's data platform has migrated from ClickHouse to Apache Doris, improving data timeliness and reducing maintenance costs. Doris' flexible ingestion methods and robust consistency protocol ensure high availability and reliability.”

Massive boost in multi-table join performance

Easy scaling and maintenance

Efficient data processing and real-time updates

Why Choose Apache Doris

Apache Doris

  • Architecture & SQL
    Based on MPP architecture
    Standard SQL, MySQL-compatible
  • Queries
    Distributed joins
    Cost-Based Optimization (CBO)
    Query rewrites and multi-table materialized views
    Higher concurrent performance
  • Real-time Updates
    Features a strongly consistent primary key storage model, supporting synchronous data updates and deletions
  • Data API
    Offers high-throughput read APIs based on Arrow-flight, facilitating integration with other engines such as data science/AI tools
  • Building Open Lakehouse
    Serves as a Lakehouse SQL engine, supporting queries on Hive, Hudi, Iceberg, and Parquet data lake formats
  • Operations & Maintenance
    Supports automatic scaling in, scaling out, and replica balancing
  • Performance
    In the wide-table benchmark ClickBench, Doris ranked first or second in October 2022 and October 2024, outperforming ClickHouse
    In large TPC-H and TPC-DS tests, Doris achieved leading performance
  • Deployment

    Doris is backed by a commercial company, VeloDB, which provides cloud-native services on AWS, Azure, and GCP, including mature SaaS and Bring Your Own Cloud (BYOC) offerings

    VeloDB also offers an on-premises enterprise version

ClickHouse

  • Architecture & SQL
    Uses Scatter-Gather architecture
    SQL-like capabilities but with non-standard SQL
  • Queries
    Poor join implementation
    Lacks a Cost-Based Optimizer (CBO)
    Only supports single-table materialized views
    Lower concurrency performance
  • Real-time Updates
    Only supports asynchronous updates, so stale values can still be read after an update
  • Data API
    Data can only be read through the less efficient JDBC API
  • Building Open Lakehouse
    Limited Lakehouse integration capabilities
  • Operations & Maintenance
    Requires manual rebalancing during scaling operations
  • Performance
    On ClickBench, ClickHouse and Doris have alternated in the lead
    Experiences many OOM (Out of Memory) queries in large TPC-H and TPC-DS tests
  • Deployment
    Currently, ClickHouse's BYOC version is still in invite-only beta testing
    No on-premises enterprise version is available for ClickHouse

Performance Comparison

ClickBench Benchmark

ClickBench is a benchmarking tool created and maintained by the ClickHouse team to evaluate the performance of analytical databases.

It focuses on testing the performance of large, flat tables rather than complex multi-table joins. It uses real-world data from a major web analytics platform, covering typical scenarios such as clickstream analysis and structured logs.

The benchmark consists of a set of queries that test aggregation operations and single-table performance, without involving complex joins. This makes it especially useful for evaluating databases optimized for real-time analytics and large-scale data processing.

[Figure: ClickBench benchmark results]

SSB-Flat SF100 Benchmark

SSB-Flat SF100 is a benchmark designed to test the performance of analytical databases in handling large, wide tables.

It is derived from the Star Schema Benchmark (SSB) but flattens the star schema into a single wide table to focus on the performance of single-table queries.

SF100 denotes a scale factor of 100: the dataset is 100 times the base size, which makes it a meaningful test of query performance and system scalability.
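The flattening step described above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in engine, and the schema is a hypothetical, heavily simplified version of SSB (two dimensions, a handful of rows); the table name `lineorder_flat` and the sample values are assumptions for illustration, not the benchmark's actual definition.

```python
import sqlite3

# Hypothetical, heavily simplified SSB-style schema: one fact table and two
# dimension tables. The real SSB has more tables and many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_region TEXT);
    CREATE TABLE part     (p_partkey INTEGER PRIMARY KEY, p_brand  TEXT);
    CREATE TABLE lineorder (
        lo_orderkey INTEGER, lo_custkey INTEGER,
        lo_partkey  INTEGER, lo_revenue INTEGER
    );
    INSERT INTO customer VALUES (1, 'ASIA'), (2, 'EUROPE');
    INSERT INTO part     VALUES (10, 'MFGR#12'), (20, 'MFGR#22');
    INSERT INTO lineorder VALUES
        (100, 1, 10, 500), (101, 1, 20, 300), (102, 2, 10, 200);

    -- Flatten once: pre-join every dimension onto the fact rows.
    CREATE TABLE lineorder_flat AS
    SELECT lo.lo_orderkey, lo.lo_revenue, c.c_region, p.p_brand
    FROM lineorder AS lo
    JOIN customer  AS c ON c.c_custkey = lo.lo_custkey
    JOIN part      AS p ON p.p_partkey = lo.lo_partkey;
""")

# Star-schema form: the join happens at query time.
star_sql = """
    SELECT c.c_region, SUM(lo.lo_revenue)
    FROM lineorder AS lo
    JOIN customer AS c ON c.c_custkey = lo.lo_custkey
    GROUP BY c.c_region
    ORDER BY c.c_region
"""

# Flat form: the same question answered with a single-table scan.
flat_sql = """
    SELECT c_region, SUM(lo_revenue)
    FROM lineorder_flat
    GROUP BY c_region
    ORDER BY c_region
"""

star_result = conn.execute(star_sql).fetchall()
flat_result = conn.execute(flat_sql).fetchall()
print(star_result == flat_result, flat_result)
```

Both queries return the same rows; the flat form simply moves the join cost to load time, which is why a flattened benchmark measures scan and aggregation speed rather than join performance.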

[Figure: SSB-Flat SF100 benchmark results]

TPC-H SF100 Benchmark

The TPC-H benchmark with a scale factor of 100 (SF100) is a widely used standard for evaluating database performance. It includes a set of complex SQL queries designed to simulate real-world business intelligence workloads.

SF100 denotes a scale factor of 100: the dataset is 100 times the base size, making it a large-scale test of query performance and system scalability.

Note: ClickHouse failed to execute 7 of the 22 queries, so the total execution time covers all 22 queries for Doris but only the 15 that completed for ClickHouse.
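Because the two totals cover different query sets, a reader who has per-query timings may prefer a like-for-like comparison over the queries both systems completed. The sketch below shows one way to do that; it is not how the totals on this page were produced, the function name `totals_on_common_queries` is hypothetical, and the timing values are placeholders rather than measured results.

```python
from typing import Dict, List, Optional, Tuple

# Query name -> elapsed seconds, or None if the query failed.
Timings = Dict[str, Optional[float]]


def totals_on_common_queries(a: Timings, b: Timings) -> Tuple[List[str], float, float]:
    """Sum each system's time over only the queries that both completed."""
    common = sorted(q for q in a if a[q] is not None and b.get(q) is not None)
    return common, sum(a[q] for q in common), sum(b[q] for q in common)


# Placeholder numbers for illustration only; these are NOT benchmark results.
system_a: Timings = {"q1": 1.0, "q2": 2.0, "q3": 4.0}
system_b: Timings = {"q1": 2.0, "q2": None, "q3": 3.0}

common, total_a, total_b = totals_on_common_queries(system_a, system_b)

# Raw totals sum whatever each system finished, so they cover different sets.
raw_total_a = sum(t for t in system_a.values() if t is not None)
raw_total_b = sum(t for t in system_b.values() if t is not None)

print(common, total_a, total_b, raw_total_a, raw_total_b)
```

In this toy input the raw totals make the system that completed every query look slower, while the totals over the common subset are equal, which is the distortion the note above warns about.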

[Figure: TPC-H SF100 benchmark results]

TPC-DS 1TB Benchmark

TPC-DS 1TB is a widely recognized benchmark for evaluating the performance of data warehouses and analytical databases. It involves a dataset of approximately 1TB in size, containing around 6.35 billion records spread across 24 tables.

The benchmark includes 99 complex queries designed to test various aspects of database performance, such as joins, aggregations, and subqueries.

The TPC-DS schema is a snowflake schema that represents real-world scenarios such as web, catalog, and store sales. The 1TB scale is considered moderate for a data warehouse, but it remains challenging because of the complexity of the queries and the large number of records.

Note: TPC-DS makes heavy use of correlated subqueries, which ClickHouse did not support at the time of testing (September 2024). As a result, about 50% of the benchmark queries fail with errors.
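For readers unfamiliar with the term, a correlated subquery is an inner query that refers to the current row of the outer query. The sketch below shows one on a toy table, together with an equivalent join-based rewrite. It uses Python's built-in `sqlite3` module as a stand-in engine; the table borrows a few column names from the TPC-DS `store_sales` table, but the data and the query are illustrative assumptions, not an actual benchmark query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy table with a few TPC-DS-style column names; the rows are made up.
    CREATE TABLE store_sales (
        ss_store_sk INTEGER, ss_item_sk INTEGER, ss_net_paid INTEGER
    );
    INSERT INTO store_sales VALUES
        (1, 10, 100), (1, 11, 300),
        (2, 10, 50),  (2, 12, 70), (2, 13, 30);
""")

# Correlated form: the inner query is evaluated per outer row, because it
# references s.ss_store_sk from the enclosing query.
correlated_sql = """
    SELECT s.ss_store_sk, s.ss_item_sk
    FROM store_sales AS s
    WHERE s.ss_net_paid > (
        SELECT AVG(i.ss_net_paid)
        FROM store_sales AS i
        WHERE i.ss_store_sk = s.ss_store_sk
    )
    ORDER BY s.ss_store_sk, s.ss_item_sk
"""

# Decorrelated form: compute the per-store average once, then join to it.
decorrelated_sql = """
    SELECT s.ss_store_sk, s.ss_item_sk
    FROM store_sales AS s
    JOIN (
        SELECT ss_store_sk, AVG(ss_net_paid) AS avg_paid
        FROM store_sales
        GROUP BY ss_store_sk
    ) AS a ON a.ss_store_sk = s.ss_store_sk
    WHERE s.ss_net_paid > a.avg_paid
    ORDER BY s.ss_store_sk, s.ss_item_sk
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
decorrelated_rows = conn.execute(decorrelated_sql).fetchall()
print(correlated_rows, decorrelated_rows)
```

An engine without correlated subquery support rejects the first form outright, so each affected query would need a manual rewrite along the lines of the second form before it could run.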

[Figure: TPC-DS 1TB benchmark results]