NOLOCK in Hive and Impala

748 Views Asked by At

I want to understand why we don't use the NOLOCK keyword at the end SQL queries that run in Impala, Hive or MySQL. For example, I use both Hive and Impala through the Cloudera workbench at work. Cloudera workbench does not recognise the NOLOCK keyword. Similarly, I have used MySQL before as well. Even there, NOLOCK wouldn't work. Why don't we need to use NOLOCK with Hive, Impala, MySQL or Big Query?

Disclaimer: I want to clarify that I have a decent understanding of the NOLOCK feature and its use cases. This question is related to SQL flavours that do not require NOLOCK.

2

There are 2 best solutions below

3
leftjoin On BEST ANSWER

First of all, let's clarify what is NOLOCK. NOLOCK table hint allows to override default transaction isolation level(READ COMMITTED). NOLOCK = READ UNCOMMITTED transaction isolation level, that allows the query to see the data changes before committing the transaction which is changing the data.

Hive does not support such isolation level (READ UNCOMMITTED) even in transaction mode. Oracle also does not support READ UNCOMMITTED, so no need in NOLOCK hint in these databases.

In Hive you can turn off concurrency.

0
Greg Frair On

Further to @leftjoin:

At this time only snapshot level isolation is supported. When a given query starts it will be provided with a consistent snapshot of the data. There is no support for dirty read, read committed, repeatable read, or serializable. With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query. Other isolation levels may be added depending on user requests.

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration