Meetup

Building Fast Data Lake Analysis on Top of Apache Iceberg & Apache Doris

Date: December 18, 2024 | 5:00 PM - 8:00 PM GMT+8
Address: 16 Collyer Quay, Level 12, Singapore

Briefing

At the December Singapore Apache Iceberg Community Meetup, Rayner Chen, Apache Doris PMC Chair, presents how to build a fast data lake analysis solution on Apache Iceberg and Apache Doris. The event is an opportunity to exchange ideas and spark innovation within the Apache Iceberg ecosystem.

What to expect:

Apache Doris can act as an OLAP query engine or a data processing engine. It can access data sources such as Hive, Iceberg, Hudi, and Delta Lake, as well as relational databases through JDBC.

Apache Iceberg is an excellent open table format for data analytics. It supports inserts, updates, and deletes, along with schema evolution, partition evolution, and time travel. These capabilities make it one of the strongest big data storage formats and allow it to serve as a single source of truth.
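
To make time travel concrete, here is a simplified toy model in Python. It is not Iceberg's API or metadata layout: it only illustrates the idea that every commit appends an immutable snapshot, and a time-travel read returns the latest snapshot committed at or before the requested time. Real Iceberg snapshots reference data and delete files through manifests rather than holding rows directly; the class and method names here are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int  # commit time, e.g. epoch milliseconds
    rows: tuple        # toy simplification: full table contents in this snapshot


@dataclass
class ToyTable:
    """Hypothetical stand-in for an Iceberg table: each commit appends an
    immutable snapshot instead of modifying existing data in place."""
    snapshots: list = field(default_factory=list)

    def commit(self, snapshot_id, committed_at, rows):
        # Snapshots must be appended in commit-time order
        if self.snapshots and committed_at < self.snapshots[-1].committed_at:
            raise ValueError("commits must be in time order")
        self.snapshots.append(Snapshot(snapshot_id, committed_at, tuple(rows)))

    def current(self):
        # A normal read sees the most recent snapshot
        if not self.snapshots:
            raise LookupError("table has no snapshots")
        return self.snapshots[-1].rows

    def as_of_version(self, snapshot_id):
        # Time travel by snapshot id
        for s in self.snapshots:
            if s.snapshot_id == snapshot_id:
                return s.rows
        raise KeyError(snapshot_id)

    def as_of_time(self, ts):
        # Time travel by timestamp: latest snapshot committed at or before ts
        times = [s.committed_at for s in self.snapshots]
        i = bisect_right(times, ts)
        if i == 0:
            raise LookupError("no snapshot at or before the given time")
        return self.snapshots[i - 1].rows


if __name__ == "__main__":
    t = ToyTable()
    t.commit(1, 1000, [(1, "a")])            # insert row 1
    t.commit(2, 2000, [(1, "a"), (2, "b")])  # insert row 2
    t.commit(3, 3000, [(2, "b")])            # delete row 1
    print(t.current())         # -> ((2, 'b'),)
    print(t.as_of_time(2500))  # -> ((1, 'a'), (2, 'b'))
    print(t.as_of_version(1))  # -> ((1, 'a'),)
```

Because old snapshots are never overwritten, a delete in snapshot 3 does not prevent reading the row from snapshot 2.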

Upstream data can be loaded into Iceberg tables with data pipeline tools such as Spark and Flink. On top of Iceberg, users can then use Doris to query and manage those tables.

The slides walk through a detailed quick-start example of building a Doris + Iceberg solution, covering everything from table creation and data insertion to querying and time travel.
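
The quick-start flow can be sketched as the sequence of SQL statements a user sends to Doris, which is MySQL-protocol compatible, so any MySQL client can submit them. The Python below only assembles those statements as strings. The catalog, database, and table names, the REST catalog and MinIO endpoints, and the credentials are hypothetical placeholders, and catalog properties and table DDL options can differ between Doris versions and Iceberg catalog types, so treat this as an outline to check against the Doris Iceberg catalog documentation rather than a verified script.

```python
"""Hypothetical helpers that assemble Doris SQL for a Doris + Iceberg quick start.
All names, endpoints, and credentials are placeholders. No quoting or escaping
is done, so do not feed untrusted input into these functions."""


def create_iceberg_catalog_sql(name: str, props: dict) -> str:
    # Doris registers an external Iceberg catalog with CREATE CATALOG ... PROPERTIES
    body = ",\n".join(f"    '{k}' = '{v}'" for k, v in props.items())
    return f"CREATE CATALOG {name} PROPERTIES (\n{body}\n);"


def time_travel_sql(table: str, *, snapshot_id=None, as_of=None) -> str:
    # Read an older Iceberg snapshot by id (FOR VERSION AS OF)
    # or by timestamp string (FOR TIME AS OF)
    if (snapshot_id is None) == (as_of is None):
        raise ValueError("pass exactly one of snapshot_id or as_of")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)};"
    return f"SELECT * FROM {table} FOR TIME AS OF '{as_of}';"


def quick_start_statements(catalog="iceberg", db="demo", table="orders"):
    fq = f"{catalog}.{db}.{table}"
    props = {
        "type": "iceberg",
        "iceberg.catalog.type": "rest",      # assumption: a REST catalog
        "uri": "http://rest:8181",           # placeholder catalog endpoint
        "warehouse": "s3://warehouse/",      # placeholder warehouse path
        "s3.endpoint": "http://minio:9000",  # placeholder object store
        "s3.access_key": "admin",            # placeholder credentials
        "s3.secret_key": "password",
        "s3.region": "us-east-1",
    }
    return [
        create_iceberg_catalog_sql(catalog, props),
        f"SWITCH {catalog};",
        f"CREATE DATABASE IF NOT EXISTS {db};",
        f"CREATE TABLE IF NOT EXISTS {fq} (id BIGINT, amount DOUBLE);",
        f"INSERT INTO {fq} VALUES (1, 9.5), (2, 20.0);",
        f"SELECT * FROM {fq};",
        # List the table's snapshots to find ids usable for time travel
        f'SELECT * FROM iceberg_meta("table" = "{fq}", "query_type" = "snapshots");',
        time_travel_sql(fq, as_of="2024-12-18 17:00:00"),
    ]


if __name__ == "__main__":
    for stmt in quick_start_statements():
        print(stmt)
```

Keeping statement construction separate from execution makes it easy to review the generated SQL before sending it to a Doris frontend.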

Speakers

Rayner (Mingyu) Chen

Apache Doris PMC Chair

Rayner (Mingyu) Chen is the Apache Doris PMC Chair and Vice President of Technology at VeloDB. Throughout his years of contributions to Apache Doris, he has nurtured the project's flourishing development and community growth while helping to advance the technical innovations of Apache Doris.