Decimals stored in scientific format in Hive table while loading it from Apache Spark

I am facing a problem with a Hive table in which a decimal value such as 0.00000000000 is stored as 0E-11. Both represent the same value, 0, but I do not understand why it is stored in scientific notation. This is one of the percentage fields used in numeric calculations, so the decimal needs a high scale. The scientific notation does not affect our calculations in any way, and numeric operations work as expected, but the representation may confuse people who use this table. The issue occurs only when the percentage is 0. For nonzero percentages such as 0.123456789, the value is stored as is, without an exponent.

Can you please explain why 0.00000000000 is represented in scientific notation? I would also like to know how to store the decimal as is, without the exponent, i.e. as 0.00000000000. The solution needs to be in Hive Query Language (HQL) only, because we have a framework that takes an HQL file and writes the result of that file to the Hive table.
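The 0E-11 output is consistent with how `java.math.BigDecimal.toString()` renders values: it switches to scientific notation when the adjusted exponent is below -6, which is the case for a zero with scale 11 but not for 0.12345678900. The following is a minimal Java sketch of that rule, assuming (not confirmed here) that Spark 2.3 renders decimals through `BigDecimal.toString()`. The class and method names are hypothetical and used only for illustration.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of Spark or Hive.
class DecimalNotationDemo {
    // Canonical string form: uses scientific notation when the
    // adjusted exponent (digits - 1 - scale) is below -6.
    static String canonical(String literal) {
        return new BigDecimal(literal).toString();
    }

    // Plain string form: never uses an exponent.
    static String plain(String literal) {
        return new BigDecimal(literal).toPlainString();
    }

    public static void main(String[] args) {
        // Zero with scale 11: one digit, adjusted exponent is -11.
        System.out.println(canonical("0.00000000000")); // 0E-11
        System.out.println(plain("0.00000000000"));     // 0.00000000000
        // Eleven digits with scale 11: adjusted exponent is -1.
        System.out.println(canonical("0.12345678900")); // 0.12345678900
    }
}
```

If that assumption holds, the stored decimal value is the same in both cases and only its text rendering differs, which would explain why the calculations are unaffected.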

To demonstrate the issue, I followed the steps below.

  1. Created a temp table with a decimal column and a string column.
  2. Used Parquet as the file format.
  3. Inserted 0.00000000000 both as a string and as decimal(12,11).
  4. Displayed both columns; both are shown in scientific notation.
  5. Inspected the file contents with parquet-tools; even in the Parquet file, the value is stored in scientific notation.
  6. Repeated the test with plain text format; the behavior is the same.

Spark-shell screenshot

parquet-tools screenshot

I am using Spark 2.3 for this run. I looked at various Stack Overflow threads such as this, this, and this, but they use the Spark DataFrame API to preserve the plain number format, whereas I need a solution in HQL.

Please let me know if there are any questions.

There are 1 best solutions below

I reckon the format_number function should do the trick for you.

Please have a look at the post below:

How to show decimal point in hive?

Thanks to user https://stackoverflow.com/users/4681341/vk-217?tab=profile

I checked it and it is working.

select format_number(0.00000000000,11);
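Two things are worth knowing before relying on this: format_number returns a string rather than a decimal, and it inserts thousands separators for larger values. The Java sketch below approximates that behaviour with `java.text.DecimalFormat`. It is only an illustration, not the actual Hive or Spark implementation; the helper name `formatNumber` and the pattern are assumptions.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch; approximates format_number(x, d), not the real UDF.
class FormatNumberSketch {
    static String formatNumber(BigDecimal value, int decimals) {
        // Grouped integer part, then a fixed number of decimal places.
        StringBuilder pattern = new StringBuilder("#,###,###,###,###,###,##0");
        if (decimals > 0) {
            pattern.append('.');
            for (int i = 0; i < decimals; i++) {
                pattern.append('0');
            }
        }
        // Fix the locale so the separators do not depend on the machine.
        DecimalFormat fmt = new DecimalFormat(
                pattern.toString(), DecimalFormatSymbols.getInstance(Locale.US));
        return fmt.format(value);
    }

    public static void main(String[] args) {
        // Zero keeps all 11 decimal places and has no exponent.
        System.out.println(formatNumber(new BigDecimal("0.00000000000"), 11));
        // Values of 1000 or more gain a grouping separator.
        System.out.println(formatNumber(new BigDecimal("1234.5"), 2));
    }
}
```

Because the result is a string, writing it into a decimal column would presumably convert it back to a decimal, so this approach is likely to suit a string column or a display-only query best.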

Note: I don't have enough reputation to comment, so I am adding this as an answer here.