Replace Digits Using SubString and lpad/rpad

434 Views Asked by At

is there any way to replace digits using substring in Hive.

scenario:

1. I need to check 0's from right to left and as soon as i find 0 before any digit then i need to replace all the trailing 0's by 9.
2. In output at-least 3  digit should be there before 9.
3. In Input if 2 or less digits are available in input then I need to skip some 0's and make sure that at-least 3 digits are there before 9.
4. If more than 3 digits are available before trailing 0's then only need to replace 0.No need to replace digits.

see the below table

input     output

123000    123999
120000    120999
123400    123499
101010    101019

I have tried using below query, and it is working as expected.(Hive Join with CTE)

with mytable as (
select '123000' as input 
union all 
select '120000' as input 
union all 
select '123400' as input 
union all 
select '101010' as input
) 
select input,lpad(concat(splitted[0], translate(splitted[1],'0','9')),6,0) as output 
from (
select input, split(regexp_replace(input,'(\\d{3,}?)(0+)$','$1|$2'),'\\|') splitted from mytable )s;

but In my actual query which is more than 500+ lines,it is very difficult to adjust this logic (with CTE) for the sigle column. so wondering if is there any way to achieve the same using only lpad/rpad and substring/length and can achieve by adding the functions without using CTE queries.

so let say if length of digits before trailing 0's is less than 6 then can skip the substring 
from (input,1,6) and will replace the remaining 0's and if  length of digits before trailing 0's is  6 
or more then 6 then just keep digits as it is and replace remaining trailing 0's by 9.

Kindly Suggest.

My Actual Query Looks like.

with mytable as
(
select lpad(input,13,9) as output from mytable where code='00'
union
select output  from mytable where code='01'
)
select t1.*,m1.output from table1 t1 , mytable m1 where  
(t1.card='00' and substr(t1.low,1,13)<=m1.low and m1.output <= substr(t1.output,1,13) and m1.card='00' )
or
(t1.card='01' and substr(t1.low,1,16)<=m1.low and m1.output <=  substr(t1.output,1,16) and m1.card='01' )

I want to Replace above logic for 2nd output where code=01 in union query.

2

There are 2 best solutions below

0
On BEST ANSWER

It Worked Now.I modified My query as below.

with mytable as
(
select lpad(input,13,9) as output from mytable where code='00'
union
select lpad(concat(split(regexp_replace('(\\d{6,}?)(0+)$','$1|$2'),'\\|') [0], 
translate(split(regexp_replace(input,'(\\d{6,}?)(0+)$','$1|$2'),'\\|')[1],'0','9')),16,0 )
output  from mytable where code='01'
)
select t1.*,m1.output from table1 t1 , mytable m1 where  
(t1.card='00' and substr(t1.low,1,13)<=m1.low 
 and m1.output <= substr(t1.output,1,13) and m1.card='00' )
 or
(t1.card='01' and substr(t1.low,1,16)<=m1.low 
and output <=  substr(output,1,16) and m1.card='01')
6
On

Instead of UNION you can do the same using single query:

Instead of this

select lpad(output,13,9) as output from mytable where code='00'
union
select output  from mytable where code='01'

use this

select distinct
       case code when '00' then lpad(output,13,9)
                 when '01' then output  
        end output  
  from mytable 

If you need to filter codes, add where code in ('00','01')