The input file have records as: 8712351,8712353,8712353,8712354,8712356,8712352,8712355 8712352,8712355
Using COBOL, I need to remove duplicates from the above file and write to an output file. I wrote simple logic to read records and write to an output file.
Where do I need to put the logic of removing duplicates (say, 8712353, 8712352) from the above file?
Here is the program logic:
IDENTIFICATION DIVISION.
PROGRAM-ID.RemoveDup.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
ORGANIZATION IS LINE SEQUENTIAL.
SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD INPUTFILEDUP.
01 INPUTFILEDUPREC.
88 EOFINPUTFILEDUP VALUE HIGH-VALUES.
02 INPUTFILEID PIC 9(07).
FD OUTFILEDUP.
01 OUTFILEDUPREC PIC 9(07).
WORKING-STORAGE SECTION.
77 WS-VARIABLE PIC 9(09).
77 REC-NOT-MATCH PIC 9(01).
77 CUR-VARIABLE PIC 9(09).
PROCEDURE DIVISION.
BEGIN.
OPEN INPUT INPUTFILEDUP
OPEN OUTPUT OUTFILEDUP
READ INPUTFILEDUP
AT END SET EOFINPUTFILEDUP TO TRUE
END-READ
PERFORM UNTIL (EOFINPUTFILEDUP)
WRITE OUTFILEDUPREC FROM INPUTFILEID
READ INPUTFILEDUP
AT END SET EOFINPUTFILEDUP TO TRUE
PERFORM UNTIL (EOFINPUTFILEDUP)
END-READ
END-PERFORM
CLOSE INPUTFILEDUP
CLOSE OUTFILEDUP
STOP RUN.
I sorted the tnput file in ascending order as:
8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355
And it worked, and below is the modified code:
But suppose if my file is not in either ascending or descending order the where I need to write the sort logic before removing duplicates. How can update the below code for this? As I tried, but I was not successful in doing this if the input file structure is like:
8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355
IDENTIFICATION DIVISION.
PROGRAM-ID.RemoveDup2.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
ORGANIZATION IS LINE SEQUENTIAL.
SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD INPUTFILEDUP.
01 INPUTFILEDUPREC.
88 EOFINPUTFILEDUP VALUE HIGH-VALUES.
02 INPUTFILEID PIC 9(07).
FD OUTFILEDUP.
01 OUTFILEDUPREC PIC 9(07).
WORKING-STORAGE SECTION.
77 WS-VARIABLE PIC 9(09) VALUE ZERO.
77 REC-NOT-MATCH PIC 9(01).
77 CUR-VARIABLE PIC 9(7) VALUE ZERO.
PROCEDURE DIVISION.
BEGIN.
OPEN INPUT INPUTFILEDUP
OPEN OUTPUT OUTFILEDUP
READ INPUTFILEDUP
AT END SET EOFINPUTFILEDUP TO TRUE
END-READ
PERFORM UNTIL (EOFINPUTFILEDUP)
IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
MOVE INPUTFILEID TO WS-VARIABLE
WRITE OUTFILEDUPREC FROM INPUTFILEID
READ INPUTFILEDUP
AT END SET EOFINPUTFILEDUP TO TRUE
PERFORM UNTIL (EOFINPUTFILEDUP)
ELSE
DISPLAY "dUPLICATE FOUND" INPUTFILEID
READ INPUTFILEDUP
AT END SET EOFINPUTFILEDUP TO TRUE
END-READ
END-PERFORM
CLOSE INPUTFILEDUP
CLOSE OUTFILEDUP
STOP RUN.
When
OrganizationisSequential, the record deleted is the last record read. TheDeletestatement is valid only when the last operation against the file is a successfulReadstatement. If not, theDeletereturns aFile Statusvalue of 43. Because aDeletecannot returnFile Statusvalues beginning with a 2 when the file isOpenwithSequentialAccess, codingInvalid Keyon such aDeleteis not allowed.When
DynamicorRandomaccess is selected for the file, theDeletestatment, like theRewrite, becomes a little less restrictive. The record being deleted need not have bene previously read. Simply fill in the primaryKeyinformation in the record description for the fle and issue theDeletestatement. If the record does not exist, aFile Statusof 23 is returned and anInvalid Keycondition exists.From page 274 of
Sams Teach Yourself COBOL in 24 Hours
page 274 (which I have just dusted down from off my bookshelf). So in your case you'll presumably set up your records to be sorted by
INPUTFILEID, make a record as you go through of occurences of a givenINPUTFILEIDpast its first occurence, andDeleteaccordingly (after you have written it to your output file).