Why have I written this series? I have worked on six projects using Hadoop (Hortonworks/Cloudera) and AWS technology. In those projects the data ingestion layer wrote to HDFS and S3 storage, but neither supports features such as ACID transactions, incremental data loading, or handling of duplicate data. We therefore used HBase, DynamoDB, and some custom scripts to achieve that functionality, which took a good amount of development effort and brought some bugs with it. We also faced server performance issues in HBase (which is good for random read/write operations), for example around row keys and deletes, and we had to provision and maintain separate infrastructure for those frameworks. My suggestion is to try Apache Hudi or Delta Lake in your project, and to consider Apache Iceberg if there are heavy read operations.

A Quick Comparison

See the table below for a comparison of all four frameworks.

Apache Hive

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.

Apache Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table.

User experience

· Iceberg avoids unpleasant surprises. Schema evolution works and won’t inadvertently un-delete data. …

Apache Hudi

Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development by providing record-level insert, update, upsert, and delete capabilities. Upsert refers to the ability to insert records into an existing dataset if they do not already exist or to update them if…
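To make the idea of an upsert concrete, here is a minimal sketch in plain Python. It is not Hudi's API: the `upsert` function, the `id` record key, and the in-memory dictionary standing in for a table are all hypothetical names chosen for illustration. It only shows the general semantics: a record whose key is new gets inserted, and a record whose key already exists gets updated.

```python
from typing import Dict, Hashable, Iterable

Record = Dict[str, object]


def upsert(table: Dict[Hashable, Record],
           batch: Iterable[Record],
           key: str = "id") -> Dict[str, int]:
    """Apply a batch to an in-memory table keyed by `key` (hypothetical helper)."""
    counts = {"inserted": 0, "updated": 0}
    for record in batch:
        record_key = record[key]
        if record_key in table:
            # Key already present: merge the new fields over the old record.
            table[record_key] = {**table[record_key], **record}
            counts["updated"] += 1
        else:
            # Key not present yet: insert a copy of the record.
            table[record_key] = dict(record)
            counts["inserted"] += 1
    return counts


# Illustrative data only.
table: Dict[Hashable, Record] = {}
first = upsert(table, [{"id": "u1", "city": "Chennai"},
                       {"id": "u2", "city": "Pune"}])
second = upsert(table, [{"id": "u2", "city": "Mumbai"},
                        {"id": "u3", "city": "Delhi"}])
```

After the second batch the table holds three records, and `u2` carries the updated city. A real table format does the same thing against files in distributed storage and has to make the change atomic, which is where the development effort goes when it is built by hand.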

Delta Lake Framework

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Specifically, Delta Lake…

In the RDBMS world, ACID support is not an issue, but in the Cloud/Hadoop world, implementing ACID has been one of the major issues.

I decided to compare different but similar open-source projects: Delta Lake, Hudi, Iceberg, and Hive. The idea is simple: prepare an environment for all four technologies and…

Selvam Rangasamy-Big Data Engineer & Solution Arch

I am a Big Data Engineer & Solution Architect with experience in various cloud and big data distribution systems, primarily Hadoop and AWS cloud services.
