On renaming a column in hive table, it removed all values of that column for its previous data prior to deployment

52 Views Asked by Toxicboy At 04 October 2023 at 02:55

We just went ahead with a deployment for one of our Hive based table. We renamed our column risk_old to risk_new (renamed). The table is period partitioned. However post deployment, we saw a strange behaviour that all data for that renamed column for all prior periods have been deleted/removed by null. Only newer period data is having values. How is this possible? How can we get back the data.

Sample data prior deployment:

id	risk_old	period
1234	high	20230927
2345	low	20230927
1234	high	20230928
2345	low	20230928

Steps for deployment:

Killing old schedule
Deployment of new code (with changes in code sourcing from new source).
ALTER statement for renaming column from risk_old to risk_new
Creating schedule for the new job

Post deployment data:

id	risk_new	period
1234		20230927
2345		20230927
1234		20230928
2345		20230928
1234	high	20230929
2345	low	20230929

We are looking on how the new column risk_new can also populate all periods data? We cannot revert back changes since all downstream apps have also changed their code renaming columns to fetch data from our table. One catch in UAT, if we try altering back table name from risk_new back to risk_old, it is showing older data but not the newer one.

Any leads in how to retain the name column name with all periods of data? Thanks in advance

Original Q&A

There are 1 best solutions below

Koushik Roy On 04 October 2023 at 05:20

Renaming columns in hive with tables stored as parquet doesnt work properly. Because the schema info is stored in a file and the schema info changes when you fire alter table statement but underlying data doesnt change. This shows odd/old values for that column.
Clean solution would be -

Create a table with new definition from old table and rename columns while selecting
create table newtable as select id, risk_old as risk_new, period from oldtable ;
Drop old table drop table oldtable.
Alter new table and rename to oldtable. alter table newtable rename to oldtable;

On renaming a column in hive table, it removed all values of that column for its previous data prior to deployment

There are 1 best solutions below

Related Questions in HIVE

Related Questions in BIGDATA

Related Questions in OOZIE

Related Questions in OOZIE-COORDINATOR

Related Questions in OOZIE-WORKFLOW

Trending Questions

Popular # Hahtags

Popular Questions