IBM 1600/6250 Mainframe EBCDIC and ASCII format data import to R

122 Views Asked by At

Has anyone successfully imported TX RRC commission data into R? Here

If so, how is it done? The dataset I downloaded is one of the ASCII format sets.

I got a sample of 1000 records from the UIC dataset into R. The first two bytes are the record type. Each record is reportedly 622 characters, but I find this is not the case in the ASCII files. Each record type (KEY) has unique fields.

It looks like the start of a series of records, comprising a well file, begins with "01", followed by several rows of information in various fixed width formats, and the well file ends with the start of the next one beginning with "01".

FYI, I am finding LaF package to be the most efficient on this one.

1

There are 1 best solutions below

3
Rui Barradas On

Those ASCII files have no column headers and the fields are separated by a "}" character. At least the two I have checked.

The following example is a file found in section Oil & Gas Field Data, row Gas Annual Report Field Table. I have chosen file "gsf384b" because it is the smallest at 8.03 KB.

Note that base read.table throws an error, the data is read in with readr::read_delim.

d1 <- "~/so_temp"
d2 <- "documents_20230401"
path <- file.path(d1, d2)
fl <- list.files(path, full.names = TRUE)

df1 <- read.table(fl, sep = "}")
#> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 71 did not have 31 elements

df1 <- readr::read_delim(fl, delim = "}", col_names = FALSE)
#> Rows: 91 Columns: 31
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "}"
#> chr  (9): X2, X3, X4, X5, X6, X7, X8, X16, X17
#> dbl  (3): X1, X9, X10
#> num  (3): X12, X13, X15
#> lgl (16): X11, X14, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
readr::spec(df1)
#> cols(
#>   X1 = col_double(),
#>   X2 = col_character(),
#>   X3 = col_character(),
#>   X4 = col_character(),
#>   X5 = col_character(),
#>   X6 = col_character(),
#>   X7 = col_character(),
#>   X8 = col_character(),
#>   X9 = col_double(),
#>   X10 = col_double(),
#>   X11 = col_logical(),
#>   X12 = col_number(),
#>   X13 = col_number(),
#>   X14 = col_logical(),
#>   X15 = col_number(),
#>   X16 = col_character(),
#>   X17 = col_character(),
#>   X18 = col_logical(),
#>   X19 = col_logical(),
#>   X20 = col_logical(),
#>   X21 = col_logical(),
#>   X22 = col_logical(),
#>   X23 = col_logical(),
#>   X24 = col_logical(),
#>   X25 = col_logical(),
#>   X26 = col_logical(),
#>   X27 = col_logical(),
#>   X28 = col_logical(),
#>   X29 = col_logical(),
#>   X30 = col_logical(),
#>   X31 = col_logical()
#> )

# columns with all NA
i_na <- which(sapply(df1, \(x) all(is.na(x))))
# remove those columns and print the data
df1[-i_na]
#> # A tibble: 91 × 15
#>       X1 X2    X3    X4          X5    X6    X7    X8       X9   X10   X12   X13
#>    <dbl> <chr> <chr> <chr>       <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#>  1  2022 8A    <NA>  WILDCAT     N/A   M     N/A   N/A       5     0     0     0
#>  2  2022 8A    <NA>  ACKERLY (D… MART… <NA>  11-2… 8419      1     0     0     0
#>  3  2022 8A    <NA>  ALAN JOHNS… COTT… <NA>  2-13… 3684      1     0     0     0
#>  4  2022 8A    <NA>  ANNE TANDY… KING  <NA>  10-0… 5106      0     0     0     0
#>  5  2022 8A    <NA>  ARICK (YAT… FLOYD <NA>  9-04… 1345      0     0     0     0
#>  6  2022 8A    <NA>  ARMSTRONG … COTT… <NA>  6-25… 6248      0     0     0     0
#>  7  2022 8A    <NA>  BALE (YATE… GAIN… <NA>  2-13… 3422      0     0     0     0
#>  8  2022 8A    <NA>  BECKER (YA… TERRY <NA>  2-03… 3128      3     0     0     0
#>  9  2022 8A    <NA>  BIRNIE (CO… MOTL… <NA>  8-26… 8457      2     0     0     0
#> 10  2022 8A    <NA>  BIRNIE (ST… MOTL… <NA>  11-0… 8527      0     0     0     0
#> # … with 81 more rows, and 3 more variables: X15 <dbl>, X16 <chr>, X17 <chr>

Created on 2023-04-01 with reprex v2.0.2