What Is ACID and How Does It Affect Data Lake Storage Environments? Part 5

Apache Hive

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets that reside in distributed storage, using SQL syntax. Hive provides the following features:

  • Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
  • A mechanism to impose structure on a variety of data formats.
  • HCatalog, a table and storage management layer for Hadoop that enables users with different data processing tools, including Pig and MapReduce, to more easily read and write data on the grid.
  • WebHCat, a service that you can use to run Hadoop MapReduce (or YARN), Pig, and Hive jobs. You can also use it to perform Hive metadata operations through an HTTP (REST-style) interface.

Insert Data

Let’s perform some of the row-level transactions available since Hive 0.14. Before creating a Hive table that supports transactions, the transaction features in Hive need to be turned on, because they are turned off by default.
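
The original commands are not shown here, so the following is only a minimal sketch of what this step can look like. The `employee` table, its columns, the bucket count, and the sample rows are hypothetical; the property names are standard Hive transaction settings, but the right values depend on your Hive version and cluster setup.

```sql
-- Session settings commonly used to enable ACID transactions.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Hive 0.14 through 1.x only; omit this line on Hive 2.0 and later.
SET hive.enforce.bucketing = true;
-- Compaction settings; these normally belong to the metastore service configuration.
SET hive.compactor.initiator.on = true;
SET hive.compactor.worker.threads = 1;

-- Hypothetical transactional table: ORC format, bucketed, flagged as transactional.
CREATE TABLE employee (
  id     INT,
  name   STRING,
  salary INT
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level insert using INSERT ... VALUES (available since Hive 0.14).
INSERT INTO TABLE employee VALUES
  (1, 'Asha', 50000),
  (2, 'Ravi', 60000);
```

In early Hive releases, a transactional table must be stored as ORC and be bucketed, which is why the sketch includes both clauses.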

Update Data

The UPDATE command changes the values of existing rows in a transactional Hive table.
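
As an illustration only, an update against a hypothetical transactional table named `employee` (columns `id`, `name`, `salary`) might look like this:

```sql
-- Update a single row, selected by its id.
-- Partition columns and bucketing columns cannot be updated, so only salary is changed here.
UPDATE employee
SET salary = 65000
WHERE id = 2;
```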

Delete Data

The DELETE command removes the rows that match its WHERE clause from a transactional Hive table. With a condition on a unique key, it deletes a single row.
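
A minimal sketch, again assuming a hypothetical transactional table named `employee` with a unique `id` column:

```sql
-- Delete the single row whose id is 1.
DELETE FROM employee
WHERE id = 1;
```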

Merge Data

Use the SQL MERGE command to insert, update, or delete rows in a target table using data from a source such as a table, view, or sub-query, based on rules specified in the matching condition of the merge statement.
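
The sketch below shows one possible shape of such a statement. MERGE is available starting with Hive 2.2 and requires a transactional target table. The table names `employee` and `employee_updates`, their columns, and the `op` flag that marks rows for deletion are all hypothetical.

```sql
-- Apply a batch of changes from a staging table to the target table.
MERGE INTO employee AS t
USING employee_updates AS s
ON t.id = s.id
-- Matching rows flagged 'D' in the source are deleted.
WHEN MATCHED AND s.op = 'D' THEN DELETE
-- All other matching rows are updated in place.
WHEN MATCHED THEN UPDATE SET name = s.name, salary = s.salary
-- Source rows with no match in the target are inserted.
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name, s.salary);
```

When both an UPDATE and a DELETE clause are present, the first of the two must carry an extra AND condition, which is why the DELETE clause here tests `s.op`.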

Conclusion

Hive is a data warehouse system used for querying and analyzing large datasets stored in HDFS. Hive uses a query language called HiveQL, which is similar to SQL. Hadoop uses MapReduce for processing data, and MapReduce required users to write lengthy code. Not all users were well versed in Java and other programming languages, which put them at a disadvantage. Hive was developed to bring the familiar SQL concepts of tables and columns to Hadoop. Apache Hive does not offer real-time queries, and the latency of Hive queries is generally very high.

Selvam Rangasamy-Big Data Engineer & Solution Arch

I am a Big Data Engineer & Solution Architect with experience in various cloud and big data distribution systems, primarily Hadoop and AWS cloud services.