I want to divide a table into two half such that the number of rows and sum of amount column is similar

133 Views Asked by At

For example please find the below data:

enter image description here

account balance
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119

The output should be in such a way that there are 5 rows in Table_A and 5 Rows in Table_B and sum of balance column should almost be similar.

Want the output in SAS or PROC SQL.

I tried many ways in proc sql but not able to generate an output

2

There are 2 best solutions below

0
On

Here is a base SAS solution.

You can sort by balance then output odd rows to one data set and even rows to the other. This should ensure that they have roughly equal number of observations, and the sum of balance should be as equal as possible.

* create the input data;
data have;
    input account $ balance;
    datalines;
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119
;
run;

* sort by balance;
proc sort data=have;
    by balance;
run;

* output odd rows to split1, even rows to split2;
data split1 split2;
    set have;
    if mod(_n_,2) then output split1;
    else output split2;
run;
0
On

Here's a pretty cool way to do it with MILP if you have OPTMODEL! This program:

  • Adds each account to a bucket with a 1/0 binary flag. These are denoted by b1 and b2
  • Minimizes the absolute difference between balances
  • Linearizes the absolute value function
  • Constrains the difference between the two sizes to be within 3 observations

Data:

data have;
    input account balance;
    datalines;
9999 110
9998 111
9997 112
9996 113
9995 114
9994 115
9993 116
9992 117
9991 118
9990 119
;
run;

Optimization code:

proc optmodel;
    set ACCOUNTS;

    num balance{ACCOUNTS};
    
    read data have into ACCOUNTS=[account] balance;

    var b1{ACCOUNTS} binary,
        b2{ACCOUNTS} binary,
        z integer /* To linearize absolute value in the objective function */
    ;

    /* n of buckets 1 and 2 */
    impvar n1 = sum{a in ACCOUNTS} (1*b1[a]);
    impvar n2 = sum{a in ACCOUNTS} (1*b2[a]);

    /* Total balance of each bucket */
    impvar total_balance1 = sum{a in ACCOUNTS} (b1[a]*balance[a]);
    impvar total_balance2 = sum{a in ACCOUNTS} (b2[a]*balance[a]);

    /* Difference of balances between buckets */
    impvar dif = total_balance1 - total_balance2;

    /* Linearize absolute value: abs(n1 - n2) must be <= 3 */
    con n1 - n2    <= 3;
    con -(n1-n2)   <= 3;

    /* Linearize absolute value in objective function */
    con dif  <= z;
    con -dif <= z;

    /* Must use all observations */
    con n1 + n2 = sum{a in ACCOUNTS} (1);

    /* Both cannot be 1 */
    con use_once {a in ACCOUNTS}: (b1[a] + b2[a]) <= 1;

    min total = z;

    solve;

    create data want
        from [account] = ACCOUNTS
             balance[account]
             b1[account]
             b2[account]
    ;

    print balance b1 b2 n1 n2 dif total_balance1 total_balance2;
quit;

Then use a data step to separate them out:

data want1
     want2
;
    set want;

    if(b1) then output want1;
        else output want2;

    drop b1 b2;
run;

Output Want1:

account balance
9998    111
9996    113
9994    115
9993    116
9991    118

Output Want2:

account balance
9999    110
9997    112
9995    114
9992    117
9990    119

enter image description here

Totally overkill, but a fun exercise.