PowerShell: Code to extract data length and precision from Cobol Copybooks


=== Problem Statement ===

Code to extract the data length in digits (not in bytes) and the precision from COBOL copybooks, considering all the different data types of the PIC format.

Will this code be sufficient to handle all cases of PIC data? Please let me know if there are other formats.

=== List of the cases that the code is handling ===

PIC X(n): Extracts the length from format "PIC X(n)" where n is the number of characters.
PIC 9(n): Extracts the length and integer length from format "PIC 9(n)" where n is the number of digits.
PIC 9(n)V9(m): Extracts the length, integer length, and precision from format "PIC 9(n)V9(m)" where n is the number of integer digits and m is the number of decimal digits.
PIC -9(n).99: Extracts the length, integer length, and precision from format "PIC -9(n).99" where n is the number of integer digits.
PIC 9(n).99: Extracts the length, integer length, and precision from format "PIC 9(n).99" where n is the number of integer digits.
PIC -9(n).9(m): Extracts the length, integer length, and precision from format "PIC -9(n).9(m)" where n is the number of integer digits and m is the number of decimal digits.
PIC 9(n).9(m): Extracts the length, integer length, and precision from format "PIC 9(n).9(m)" where n is the number of integer digits and m is the number of decimal digits.
PIC 9(n) OCCURS m TIMES: Extracts the length and integer length from format "PIC 9(n) OCCURS m TIMES" where n is the number of digits and m is the number of times the field occurs.
Simple data types: PIC A, PIC 9: Sets the length to 1 for simple data types.

=== PowerShell Code ===

$copybookDataType = "PIC 9(09)V99."
$totalLength = ""
$integerLength = "N/A"
$precision = "N/A"

# Every pattern ends with \.$ so that only the terminating period matches.
# Captured lengths are cast to [int] so that "+" adds numbers instead of joining strings.
# For edited formats the total counts the sign and the actual decimal point as one position each.

# 1. Handling format: PIC X(n).
if ($copybookDataType -match "^PIC\s+X\((\d+)\)\.$") {
    $totalLength = [int]$Matches[1]
}
# 2. Handling format: PIC 9(n).
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)\.$") {
    $integerLength = [int]$Matches[1]
    $totalLength = $integerLength
}
# 3. Handling format: PIC 9(n)V99.
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)V(9+)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = $Matches[2].Length              # one decimal digit per "9"
    $totalLength = $integerLength + $precision   # V is an implied decimal point and takes no position
}
# 4. Handling format: PIC -9(n).
elseif ($copybookDataType -match "^PIC\s+-9\((\d+)\)\.$") {
    $integerLength = [int]$Matches[1]
    $totalLength = $integerLength + 1            # + sign
}
# 5. Handling format: PIC 9(n)V9(m).
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)V9\((\d+)\)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = [int]$Matches[2]
    $totalLength = $integerLength + $precision   # V takes no position
}
# 6. Handling format: PIC -9(n).99.
elseif ($copybookDataType -match "^PIC\s+-9\((\d+)\)\.(9+)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = $Matches[2].Length
    $totalLength = $integerLength + $precision + 2   # + sign + decimal point
}
# 7. Handling format: PIC 9(n).99.
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)\.(9+)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = $Matches[2].Length
    $totalLength = $integerLength + $precision + 1   # + decimal point
}
# 8. Handling format: PIC -9(n).9(m).
elseif ($copybookDataType -match "^PIC\s+-9\((\d+)\)\.9\((\d+)\)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = [int]$Matches[2]
    $totalLength = $integerLength + $precision + 2   # + sign + decimal point
}
# 9. Handling format: PIC 9(n).9(m).
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)\.9\((\d+)\)\.$") {
    $integerLength = [int]$Matches[1]
    $precision = [int]$Matches[2]
    $totalLength = $integerLength + $precision + 1   # + decimal point
}
# 10. Handling format: PIC 9(n) OCCURS m TIMES.
elseif ($copybookDataType -match "^PIC\s+9\((\d+)\)\s+OCCURS\s+(\d+)\s+TIMES\.$") {
    $dataTypeLength = [int]$Matches[1]
    $occursTimes = [int]$Matches[2]
    $totalLength = $dataTypeLength * $occursTimes
    $integerLength = $dataTypeLength
}
# 11. Handling simple data types: PIC A, PIC X, PIC 9 (a single symbol only).
elseif ($copybookDataType -match "^PIC\s+[AX9]\.$") {
    $totalLength = 1
}
# 12. Handling format: PIC .9(n).
elseif ($copybookDataType -match "^PIC\s+\.9\((\d+)\)\.$") {
    $precision = [int]$Matches[1]
    $totalLength = $precision + 1                # + decimal point
}
# 13. Handling format: PIC -.9(n).
elseif ($copybookDataType -match "^PIC\s+-\.9\((\d+)\)\.$") {
    $precision = [int]$Matches[1]
    $totalLength = $precision + 2                # + sign + decimal point
}

Write-Host "Total Length: $totalLength"
Write-Host "Integer Length: $integerLength"
Write-Host "Precision: $precision"
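
Instead of one regex per layout, the picture string can first be expanded and then counted. The following is only a minimal sketch under stated assumptions: `Get-PicDigits` is a hypothetical helper name (not from the original post), the input is the bare picture string without the `PIC` keyword and without the terminating period, and only `9` symbols are counted as digits. Other symbols that also stand for digit positions (for example `Z`, `*`, floating signs) and scaling with `P` are deliberately not handled.

```powershell
# Hypothetical helper: expands repetition factors such as 9(09) and then
# counts the "9" symbols on each side of the decimal point.
function Get-PicDigits {
    param([Parameter(Mandatory)][string]$Picture)   # e.g. "9(09)V99"

    $pic = $Picture.ToUpper()

    # Expand "c(n)" into n copies of c, e.g. "9(3)" -> "999".
    $expanded = [regex]::Replace($pic, '(.)\((\d+)\)', {
        param($m) $m.Groups[1].Value * [int]$m.Groups[2].Value
    })

    # Split once at the implied (V) or actual (.) decimal point.
    $parts = @($expanded -split '[V.]', 2)
    $integerDigits = ([regex]::Matches($parts[0], '9')).Count
    $fractionDigits = 0
    if ($parts.Count -gt 1) {
        $fractionDigits = ([regex]::Matches($parts[1], '9')).Count
    }

    [pscustomobject]@{
        Expanded      = $expanded
        IntegerDigits = $integerDigits
        Precision     = $fractionDigits
        TotalDigits   = $integerDigits + $fractionDigits
    }
}

# Usage: IntegerDigits 9, Precision 2, TotalDigits 11
$result = Get-PicDigits -Picture '9(09)V99'
Write-Host "Total digits: $($result.TotalDigits), precision: $($result.Precision)"
```

Because the repetition factor is expanded first, `9(3)V9(2)`, `999V99` and `9(3)V99` all produce the same counts, which removes the need for separate branches for each spelling.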

=== Answer ===


Question: Will this code be sufficient to handle all cases of PIC data? Answer: No. For some of the reasons (mostly about formats), see the comments.

The main issue is that a match like this does not follow COBOL rules. At a minimum you need to handle records and to allow for line breaks and comments anywhere. In theory you also need to handle word continuation (VERY seldom used), conditional compilation (used more often), COPY REPLACING / REPLACE, and constants that may be defined outside of the copybook or even through compile options. Also, for some USAGEs different compilers use different sizes, and SYNCHRONIZED may change the size of the records because of padding, which again differs per compiler, environment and options.
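
To illustrate just the "line breaks and comments" part, here is a rough sketch that joins fixed-format source lines into complete entries before any PIC matching is attempted. It assumes the traditional fixed reference format (columns 1-6 sequence area, column 7 indicator, program text up to column 72); `Get-CopybookEntries` is a hypothetical helper name. It does not handle continuation lines, debugging lines, free-format source, COPY/REPLACE, conditional compilation, or periods inside literals, so it covers only the "simple" inputs mentioned here.

```powershell
# Hypothetical helper: turns fixed-format copybook lines into complete entries.
function Get-CopybookEntries {
    param([string[]]$Lines)

    $text = foreach ($line in $Lines) {
        if ($line.Length -lt 8) { continue }                    # nothing in the program-text area
        if ($line.Substring(6, 1) -in '*', '/') { continue }    # column 7 marks a comment line
        $end = [Math]::Min($line.Length, 72)                    # text after column 72 is not program text
        $line.Substring(7, $end - 7)
    }

    # An entry ends with a period followed by whitespace or end of input,
    # so the decimal point inside "9(3).99" does not end the entry.
    ($text -join ' ') -split '\.(?=\s|$)' |
        ForEach-Object { ($_ -replace '\s+', ' ').Trim() } |
        Where-Object { $_ }
}

# Usage with a PIC clause that sits on its own line and a comment line in between.
$lines = @(
    '000100 01  CUSTOMER-REC.'
    '000200* balance in local currency'
    '000300     05  BALANCE'
    '000400         PIC S9(7)V99.'
    '000500     05  RATE PIC 9(3).99.'
)
Get-CopybookEntries -Lines $lines | ForEach-Object { Write-Host $_ }
```

Each returned entry is a single line of text without its terminating period, so a PIC clause that was written on a separate line ends up next to its data name.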

Conclusion: you can parse something in PowerShell, but it likely should not be a COBOL copybook. Either the task gets much more complicated, because you need to "parse COBOL" (the whole compilation unit, not only a single copybook) and be compiler/environment specific, or the result will be usable only for "simple" inputs.

The most reasonable approach is to pass the compilation unit to a COBOL compiler that generates a symbol listing (most compilers have an option for that and can also run a syntax check only), then parse that symbol listing, which includes the actual lengths used, with PowerShell. This can also be done using pipes, so no temporary files are needed.

As an example, with GnuCOBOL that would be something like:

cobc $yourflags_like_dialect_and_copybook_directories -fsyntax-only -frelax-syntax -w -t sym.lst --tlines=0 -fno-tsource -ftsymbols source.cob

(use - as output name instead of sym.lst if you want the result to be on stdout).