How to get a distinct count on 13 billion records in Ab Initio

I have 13 billion records stored as a multifile (MFS) in Ab Initio. I need to count distinct IMSIs grouped by date, city and district. I tried the two approaches I could think of, but both are very slow. How can I count distinct values faster?

1) A Rollup with keys {date; city; district} that computes length_of(vector_sort_dedup_first(accumulation(in.imsi_4g)))

2) Partition by Key (PBK) on {date; city; district; imsi_4g}, followed by Dedup Sorted with keys {date_id; city_name; district_name; imsi_max_4g}
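Both approaches should produce the same counts. Where they differ is memory. Below is a minimal Python sketch, not Ab Initio DML, that contrasts the two. It assumes each record is a (date, city, district, imsi) tuple, and both function names are hypothetical.

- Approach 1 builds a vector of every IMSI for a group before deduplicating it.
- Approach 2 sorts on the full key so that duplicates become adjacent. After that, one pass that keeps only a counter per group is enough.

```python
from itertools import groupby


def distinct_counts_vector(records):
    """Approach 1 (hypothetical name): collect every IMSI per group, then dedup.

    Every IMSI of a group is held in memory at once, like a per-key vector
    built with accumulation() in a Rollup.
    """
    groups = {}
    for date, city, district, imsi in records:
        groups.setdefault((date, city, district), []).append(imsi)
    return {key: len(set(imsis)) for key, imsis in groups.items()}


def distinct_counts_sorted(records):
    """Approach 2 (hypothetical name): sort on the full key, dedup, then count.

    sorted() stands in for a Sort component; the dedup/count loop itself
    only looks at the current run of identical rows plus a counter per group.
    """
    counts = {}
    # groupby() with no key collapses runs of identical tuples, i.e. exact duplicates.
    for (date, city, district, _imsi), _dups in groupby(sorted(records)):
        key = (date, city, district)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

On made-up rows where one group repeats the same IMSI twice, both functions count that IMSI once. With sorted input, approach 2 needs no per-group vector, which matters when a single city/district holds millions of IMSIs.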

Answer by X3R0:

Do the processing in parallel: with 26 partitions, each partition processes about five hundred million records (13 billion / 26).

let distinct_count = length_of(in.imsi_max_4g) in rollup keys {date, city, district} parallel 26;
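The line above is not reproduced here. Instead, this is a minimal Python sketch of the general partition-then-merge idea behind "do it in parallel". It is not Ab Initio code, and the function names and the hash-based partitioner are hypothetical.

The sketch makes two assumptions:

- Records are (date, city, district, imsi) tuples.
- Partitioning is on the full key, including the IMSI, so every copy of a given (group, IMSI) pair lands in the same partition.

Each partition then deduplicates and counts on its own. Because no pair is split across partitions, the per-partition counts for a group can simply be summed.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 26  # same degree of parallelism as the answer; an assumption


def partition_of(record, n):
    """Hypothetical partitioner: hash the full (date, city, district, imsi) key."""
    return zlib.crc32(repr(record).encode("utf-8")) % n


def local_distinct_counts(partition):
    """Dedup within one partition, then count IMSIs per (date, city, district)."""
    counts = Counter()
    for date, city, district, _imsi in set(partition):
        counts[(date, city, district)] += 1
    return counts


def parallel_distinct_counts(records, n=NUM_PARTITIONS):
    """Partition by full key, count distinct per partition, then sum the partials."""
    partitions = [[] for _ in range(n)]
    for rec in records:
        partitions[partition_of(rec, n)].append(rec)
    total = Counter()
    # Threads stand in for independent partitions here; in Ab Initio each
    # partition of a multifile is processed by its own component instance.
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(local_distinct_counts, partitions):
            # Summing is safe: each (group, imsi) pair was counted in exactly one partition.
            total.update(partial)
    return dict(total)
```

Partitioning only on {date; city; district} would make each group's distinct count fully local, so no final sum would be needed. The cost is that a single very large city or district would end up on one partition and skew the load. Including the IMSI in the partition key spreads the work evenly, at the price of a small final aggregation.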