Syncsort Sum Fields=None not removing duplicates

4.4k Views Asked by At

I'm trying to run a SYNCSORT job that will remove duplicate entries and when I run it, I'm still getting duplicates. The following is the SYNCSORT code I'm using:

INCLUDE COND=(((61,1,CH,EQ,C'P'),OR, 
              (61,1,CH,EQ,C'V')),AND,
              (8,2,CH,EQ,C'FL'))     
OUTREC FIELDS=(1:12,20,              
               30:36,20,             
               55:61,1)              
SORT FIELDS=(30,20,CH,A,             
             01,20,CH,A)             
SUM FIELDS=NONE                      

The input is as follows:

----+----1----+----2----+----3----+----4----+----5----+----6
      FL                           AMELIA CITY              
32034 FL  NASSAU                  FERNANDINA BEACH         P
32034 FL  NASSAU                  AMELIA CITY              V
32034 FL  NASSAU                  AMELIA ISLAND            S
32034 FL  NASSAU                  FERNANDINA               S

I'm getting most of the expected output, except that I'm still getting duplicates. The output that I have is as follows:

----+----1----+----2----+----3----+----4----+----5----+
MANATEE                      BRADENTON                P
MANATEE                      BRADENTON                P
MANATEE                      BRADENTON                P
MANATEE                      BRADENTON                P
MANATEE                      BRADENTON                P
MANATEE                      BRADINGTON               V
POLK                         BRADLEY                  P
HILLSBOROUGH                 BRANDON                  P
SUWANNEE                     BRANFORD                 P
MIAMI-DADE                   BRICKELL                 V

Any help would be appreciated as I'm not able to find my error.

2

There are 2 best solutions below

1
On BEST ANSWER

This is what you are sort summing on:

< ------------ Sort Field ----------------------->  
----+----1----+----2----+----3----+----4----+----5----+----6
      FL                           AMELIA CITY              
32034 FL  NASSAU                  FERNANDINA BEACH         P
32034 FL  NASSAU                  AMELIA CITY              V
32034 FL  NASSAU                  AMELIA ISLAND            S
32034 FL  NASSAU                  FERNANDINA               S

the Duplicate records will be different in the first 11 bytes which you can not see. Try removing the outrec to check.

Possible changes -

  • Change the outrec to an inrec
  • re-code the sort with fields associated with the output, see the following:

The following sort sorts based on the output records:

INCLUDE COND=(((61,1,CH,EQ,C'P'),OR, 
               (61,1,CH,EQ,C'V')),AND,
              (8,2,CH,EQ,C'FL'))     
OUTREC FIELDS=(1:12,20,              
              30:36,20,             
              55:61,1)              
SORT FIELDS=(42,20,CH,A,             
             12,20,CH,A)             
SUM FIELDS=NONE
0
On

It does not matter what order you code the different stages of a "sort", they will be executed in the order that SORT wants.

In your case this will be INCLUDE, then SORT, then SUM, then OUTREC. You can check that this is the case by entirely inverting the control cards, you will get identical output.

If you want to do something before SORT you use INREC, not just try to locate OUTREC before the SORT statement. Here, since you are SORTing, you only want to include the data you need. You do not want to include the spacing for formatting. Why would you want to load up your file to SORT with extra identical data on each record?

On INREC and OUTREC please don't use FIELDS. On OUTFIL please don't use OUTREC. It should be obvious that FIELDS is "overloaded" (see how many times you used FIELDS, and see how many are "the same") and OUTREC is "overloaded". More than 10 years ago BUILD was introduced to allow things to be much clearer - it describes what it is doing, and every time you see BUILD it only only means BUILD.

 INCLUDE COND=(((61,1,CH,EQ,C'P'), 
                OR, 
                (61,1,CH,EQ,C'V')),
               AND, 
                (8,2,CH,EQ,C'FL')) 
 INREC BUILD=(36,20, 
              12,20, 
              61,1) 
 SORT FIELDS=(1,40,CH,A) 
 OUTREC BUILD=(21,10, 
               10X, 
               1,20, 
               5X, 
               41,1) 

The INREC selects only the data you want, and in an order where you need specify only one SORT key.

The OUTREC then formats the data how you want it. For each record in the SORT 15 bytes were saved (the blanks). 10X is 10 blanks, 5X is five blanks.

Note that it is much easier, to code and understand, and more maintainable therefore, if you include "explicit" blanks rather than implicit ones using column numbers. Imaging 10 columns of a report, and the spacing between columns one and two are incorrect. Do you want to change all the column references, just to add one extra space, or would you prefer to change 7X to 8X and the rest works itself out? Even if you enjoy tedious changes, remember your colleagues :-)

If your data is already in order don't use SUM FIELDS=NONE. Use OUTFIL reporting features, REMOVECC, NODETAIL and SECTIONS with TRAILER3. NEVER SORT data just to allow you to remove duplicates with SUM FIELDS=NONE.