Briefing
Rayner Chen, Apache Doris PMC Chair, explains how to build a fast data lake analytics solution based on Apache Iceberg and Apache Doris at the Singapore Apache Iceberg Community Meetup in December. The event is an opportunity to exchange ideas and spark innovation within the Apache Iceberg ecosystem.
What to expect:
Apache Doris can serve as an OLAP query engine or a data processing engine, accessing data sources such as Hive, Iceberg, Hudi, and Delta Lake, as well as relational databases via JDBC.
Apache Iceberg is an excellent open table format for data analytics. It supports data insertion, update, and deletion, as well as schema evolution, partition evolution, and time travel. This makes it one of the best big data storage formats and allows it to act as a single source of truth.
Upstream data can be loaded into Iceberg tables using data pipeline tools such as Spark and Flink. On top of Iceberg, users can then use Doris to query and manage those tables.
The slides walk through a detailed quick start guide to building a Doris + Iceberg solution, covering everything from table creation and data insertion to querying and time travel.
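To make the idea of time travel more concrete, here is a minimal toy model in Python. It is not Iceberg's or Doris's API, and the class and method names (ToyTable, as_of_time, as_of_snapshot) are hypothetical. It only captures the core idea: every insert or delete commits a new immutable snapshot, and a query "as of" a point in time reads the latest snapshot committed at or before that time. It assumes commit timestamps never go backwards.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    rows: frozenset


@dataclass
class ToyTable:
    """Hypothetical toy table: each commit appends an immutable snapshot."""
    snapshots: list = field(default_factory=list)

    def _commit(self, rows, timestamp_ms):
        # Assumption: commit timestamps are non-decreasing.
        if self.snapshots and timestamp_ms < self.snapshots[-1].timestamp_ms:
            raise ValueError("commit timestamps must not go backwards")
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, frozenset(rows))
        self.snapshots.append(snap)
        return snap

    def current_rows(self):
        return self.snapshots[-1].rows if self.snapshots else frozenset()

    def insert(self, new_rows, timestamp_ms):
        # A new snapshot holds the previous rows plus the inserted ones;
        # older snapshots are never modified.
        return self._commit(self.current_rows() | set(new_rows), timestamp_ms)

    def delete(self, old_rows, timestamp_ms):
        return self._commit(self.current_rows() - set(old_rows), timestamp_ms)

    def as_of_time(self, timestamp_ms):
        """Rows of the latest snapshot committed at or before timestamp_ms."""
        times = [s.timestamp_ms for s in self.snapshots]
        idx = bisect_right(times, timestamp_ms)
        if idx == 0:
            raise ValueError("no snapshot exists at or before that time")
        return self.snapshots[idx - 1].rows

    def as_of_snapshot(self, snapshot_id):
        """Rows of the snapshot with the given id."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.rows
        raise KeyError(snapshot_id)


if __name__ == "__main__":
    t = ToyTable()
    t.insert({(1, "a"), (2, "b")}, 1000)
    t.insert({(3, "c")}, 2000)
    t.delete({(1, "a")}, 3000)
    print(sorted(t.as_of_time(2500)))
    print(sorted(t.as_of_snapshot(1)))
```

Because old snapshots are kept rather than overwritten, reading historical data is just a matter of choosing which snapshot to read. A real Iceberg table tracks snapshots in its metadata files instead of holding rows in memory.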