How can I improve runtime in SAS proc SQL with multiple where statements for a very large data set? (billions of records)

Question

How can I improve runtime in SAS proc SQL with multiple where statements for a very large data set? (billions of records)

188 Views Asked by Toni13 At 21 April 2023 at 12:02

I am strugling to query a claims database due to computational efficency and runtime. The server I'm using runs out of resources and does not execute the code bellow as a result. I'm essentially trying to look within a large claims database (2-3 billion rows) and find anyone with hypertension defined by the codes and columns listed bellow. I was hoping someone could provide a suggestion as how to achieve the same goal but improve runtime and not have the server crash.

proc sql;
create table hyper as 
select var1, var2
from claim_db       
WHERE SUBSTR(col1, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col2, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col3, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col4, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col5, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col6, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col7, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col8, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col9, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col10, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col11, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col12, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405') or
 SUBSTR(col13, 1, 3) IN ('I10', 'I11', 'I12', 'I13', 'I15', 'I16', '401','402','403','404','405');
quit;

I've even tried to do just one column at a time in hopes of just having 13 tables and then appending the tables later with the same logic as the code bellow, but the server still ran out of resources after just trying the first where statement. Does anyone have any suggestions how to achieve the same thing in this code with maximum efficency in mind?

Original Q&A

There are 1 best solutions below

**Quentin** · Answer 1 · 2023-04-21T12:53:24.530000

For billions of records, you probably need to work with the DBA who owns the database. You'd want to understand what database it is (Oracle? SQL server?), if there is an index on the columns, etc. Then try to figure out how to write the query efficiently in the native SQL. Then change to use an explicit pass-through query, to let the database do all the work before passing to SAS.

Your current code is probably requiring that the billions of records all be transferred from the database to SAS, so that SAS can subset them. You'll definitely want to change to explicit pass-through, so that the database runs the query and subsets the data, then only returns the selected records to SAS.

How can I improve runtime in SAS proc SQL with multiple where statements for a very large data set? (billions of records)

There are 1 best solutions below

Related Questions in SQL

Related Questions in PERFORMANCE

Related Questions in SAS

Related Questions in WHERE-IN

Trending Questions

Popular # Hahtags

Popular Questions