Comparisons

Apache Doris vs Trino/Presto

Apache Doris and Trino/Presto are both popular data lakehouse query engines, but Doris delivers higher query performance. While Trino/Presto are primarily query engines, Doris can also function as a standalone data warehouse. This lets enterprises consolidate their data warehouse and Lakehouse query engine into a single system with Doris, simplifying their data architecture.

Featured Migration Cases

Unified

Doris unifies the data warehouse and the Lakehouse query engine, simplifying the tech stack

10x

Doris native tables boost query performance by up to 10x compared to Presto/Trino

2-3x

Doris as a Lakehouse engine is 2-3x faster than Presto/Trino

Cisco WebEx’s early data platform used Trino, Pinot, Iceberg, and Kyuubi, but faced complexity, redundancy, and poor performance. By replacing them with Apache Doris, WebEx unified its data lakehouse and query engine, boosting performance and reducing costs by 30%.


After switching from Presto to Doris, query performance improved significantly, with query times dropping from 20-40 seconds to 1-2 seconds. By designing 2-3 materialized views based on common data dimensions, the team let Doris automatically match each query to the optimal view, further improving performance.
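To make "automatically match the optimal view" more concrete, here is a minimal, hypothetical sketch of how an optimizer might pick among a few aggregate materialized views. It is not Doris's actual rewrite logic: the class names, the `pick_view` function, the example views, and the "smallest estimated row count wins" rule are all illustrative assumptions. A view qualifies if its group-by dimensions cover the query's dimensions (so it can be rolled up further) and it stores every requested measure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MaterializedView:
    name: str
    group_by: FrozenSet[str]  # dimensions the view is aggregated on
    measures: FrozenSet[str]  # pre-aggregated measures, e.g. "sum(revenue)"
    est_rows: int             # estimated row count (hypothetical statistic)


@dataclass(frozen=True)
class AggQuery:
    group_by: FrozenSet[str]
    measures: FrozenSet[str]


def can_answer(mv: MaterializedView, q: AggQuery) -> bool:
    # Assumes re-aggregable measures (sum/min/max): a view grouped on a
    # superset of the query's dimensions can be rolled up to answer it.
    return q.group_by <= mv.group_by and q.measures <= mv.measures


def pick_view(views: List[MaterializedView], q: AggQuery) -> Optional[MaterializedView]:
    candidates = [mv for mv in views if can_answer(mv, q)]
    if not candidates:
        return None  # no match: the query falls back to the base table
    # Hypothetical cost rule: scan the smallest qualifying view.
    return min(candidates, key=lambda mv: mv.est_rows)


# Illustrative views built on "common data dimensions" (all names invented).
views = [
    MaterializedView("mv_day_region", frozenset({"day", "region"}),
                     frozenset({"sum(revenue)"}), 50_000),
    MaterializedView("mv_day", frozenset({"day"}),
                     frozenset({"sum(revenue)"}), 365),
    MaterializedView("mv_day_region_product", frozenset({"day", "region", "product"}),
                     frozenset({"sum(revenue)", "sum(qty)"}), 2_000_000),
]

if __name__ == "__main__":
    q = AggQuery(frozenset({"region"}), frozenset({"sum(revenue)"}))
    chosen = pick_view(views, q)
    print(chosen.name if chosen else "base table")
```

For a revenue-by-region query, `mv_day` is skipped because it lacks the `region` dimension, and `mv_day_region` wins over the larger three-dimension view. Choosing by row count is only one plausible heuristic; a real cost-based optimizer weighs more factors.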

NetEase Games

With Trino and SparkSQL, query latency was at the minute level and performance was poor. After switching to Doris, performance doubled. Doris also unified the tech stack, simplifying the management of real-time and interactive analytics tools.

ZTO

Why Choose Apache Doris

Apache Doris

  • Architecture

    Unified Architecture: Combines the capabilities of a data warehouse and a Lakehouse query engine

  • Execution Engine
    Fully vectorized execution engine implemented in C++, for high-performance data processing
  • Query Optimizer
    Advanced query optimizer with cost-based optimization for complex SQL operations like joins, aggregations, and sorting
  • Caching Mechanisms

    Metadata Caching: In-memory metadata caching with TTL, auto-refresh, and incremental synchronization

    Data Caching: Hot data caching on local SSDs for reduced network I/O

    Query Caching: SQL Cache and Partition Cache for query result caching

  • Materialized Views

    Incremental Refresh: Supports incremental refresh and multiple update strategies

    Transparent Acceleration: Query optimizer automatically routes queries to the most suitable materialized views

  • Use Cases
    High-concurrency real-time analytics
    Interactive analytics

Trino/Presto

  • Architecture

    Federated Querying: Excels at querying across multiple heterogeneous data sources without data movement, but lacks built-in storage

  • Execution Engine
    Implemented primarily in Java, with vectorization currently in development as part of the Hummingbird project
  • Query Optimizer
    Supports cost-based optimization, but statistics collection is less advanced and relies on manual, full collection
  • Caching Mechanisms

    Data Caching: Relies on external caching solutions like Alluxio

  • Materialized Views

    Manual Refresh: Limited to manual, full refresh with less advanced features

  • Use Cases
    Interactive analytics only
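The Metadata Caching item in the Doris list above mentions in-memory caching with a TTL plus incremental synchronization. As a generic illustration of that idea only (not Doris's implementation), the hypothetical `TtlMetadataCache` below serves catalog metadata from memory until its TTL expires, then reloads it through a caller-supplied loader. The `invalidate` method stands in for an incremental-sync event. The loader, the key format, and the injectable clock are all assumptions made so the sketch is self-contained and testable; background auto-refresh is not modeled.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlMetadataCache:
    """Hypothetical TTL cache for external catalog metadata (e.g. table schemas)."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # fetches metadata from the remote catalog (assumed)
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested deterministically
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh hit: no remote round trip
        value = self._loader(key)  # miss or expired entry: reload from the source
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Stand-in for an incremental-sync event reporting that a table changed.
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = TtlMetadataCache(lambda k: f"schema-of-{k}", ttl_seconds=60)
    print(cache.get("sales_db.orders"))
```

The design trade-off is the usual one for TTL caches: a longer TTL means fewer remote catalog calls but a longer window in which metadata can be stale, which is why event-driven invalidation complements the timer.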

Performance Comparison

TPC-DS 1TB Benchmark

The TPC-DS 1TB Benchmark evaluates data warehouse performance using a 1TB dataset with 6.35 billion records across 24 tables. Based on a snowflake schema that simulates real-world sales scenarios, it includes 99 complex queries that test joins, aggregations, and subqueries. At the 1TB scale, the complexity of these queries makes the benchmark particularly demanding.

The test environment consists of:

  • 1 FE/Coordinator node and 5 BE/Worker nodes.
  • Each node has 64 cores, 1.5TB of memory, and SSD storage.
  • HDFS is co-located on these nodes, and Hive tables are created.

In this test, using the same dataset and equal computing resources, the results show that:

  • When data is imported into Doris' internal tables and queried using Doris, it achieves the shortest execution time.
  • When Doris and Trino are each used to query data directly from Hive tables, Doris delivers better query acceleration on the data lake.
Figure: TPC-DS 1TB Benchmark results