readLines() and writeLines() with read.delim

I have text files that contain behavioral data from a task. However, the first 18 lines in each file are descriptive information (date, time, ID numbers, etc.) all in a big block of text. The actual column names/data begin on the 19th line. Not an ideal format, but one I have to keep.

While researching the readLines() and writeLines() functions, it seems they are what I need to read a text file into R, reorganize the data, and then write it back out as a text file with the same block of text in the first 18 rows. I'm not sure how this would actually work. Do I need to combine readLines() and read.delim() somehow, or will readLines() also read in all my data below the 18th line, as read.delim(location, skip=18) would?
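To make the difference between the two functions concrete, here is a minimal sketch on a made-up miniature file (the file content, `path`, and the two-line header are assumptions for illustration, not the real data). readLines() returns every line as plain text, while read.delim() with skip parses only the table part into a data frame.

```r
# Hypothetical miniature file: 2 descriptive lines, then a tab-delimited table
path <- tempfile(fileext = ".txt")
writeLines(c("# date: 10-Sep-2021",
             "# id: 200",
             "item\tecode\tonset",
             "1\t13\t9.998",
             "2\t4\t10.999"), path)

# readLines() returns one string per line; nothing is split into columns
all_lines <- readLines(path)
header <- all_lines[1:2]   # same result as readLines(path, n = 2)

# read.delim() skips the descriptive lines and parses the rest as a table
dat <- read.delim(path, skip = 2)

str(all_lines)
str(dat)
```

So the two calls are complementary: one keeps the descriptive block verbatim, the other gives a data frame to work on.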

For reference, here is an example of what a text file I am working with looks like:


 # Non-editable header begin --------------------------------------------------------------------------------

#  data format...............: continuous
#  setname...................: 200ICAready
#  filename..................: none_specified
#  filepath..................: none_specified
#  nchan.....................: 29
#  pnts......................: 666445
#  srate.....................: 500
#  nevents...................: 1792
#  generated by (bdf)........: 
#  generated by (set)........: 200ICAready
#  reported in ..............: 
#  prog Version..............: 7.0.0
#  creation date.............: 10-Sep-2021 16:21:24
#  user Account..............: 
# 
#  Non-editable header end --------------------------------------------------------------------------------




# item   bepoch   ecode             label         onset           diff       dura   b_flags    a_flags    enable        bin
#                                                 (sec)           (msec)     (msec)    (binary)   (binary)


1       0            13               ""          9.9980          0.00      0.0     00000000     00000000      1    [       ]
2       0             4               ""         10.9990       1001.00      0.0     00000000     00000000      1    [       ]
3       0            10               ""         11.1990        200.00      0.0     00000000     00000000      1    [       ]
4       0            14               ""         11.3990        200.00      0.0     00000000     00000000      1    [       ]
5       0            13               ""         12.7320       1333.00      0.0     00000000     00000000      1    [       ]
6       0             1               ""         13.7320       1000.00      0.0     00000000     00000000      1    [       ]
7       0             7               ""         13.9320        200.00      0.0     00000000     00000000      1    [       ]

And here is what the result will look like:


 # Non-editable header begin --------------------------------------------------------------------------------

#  data format...............: continuous
#  setname...................: 200ICAready
#  filename..................: none_specified
#  filepath..................: none_specified
#  nchan.....................: 29
#  pnts......................: 666445
#  srate.....................: 500
#  nevents...................: 1792
#  generated by (bdf)........: 
#  generated by (set)........: 200ICAready
#  reported in ..............: 
#  prog Version..............: 7.0.0
#  creation date.............: 10-Sep-2021 16:21:24
#  user Account..............: 
# 
#  Non-editable header end --------------------------------------------------------------------------------




# item   bepoch   ecode             label         onset           diff       dura   b_flags    a_flags    enable        bin
#                                                 (sec)           (msec)     (msec)    (binary)   (binary)


1       0            13               ""          9.9980          0.00      0.0     00000000     00000000      1    [       ]
2       0             4               ""         10.9990       1001.00      0.0     00000000     00000000      1    [       ]
3       0            10               ""         11.1990        200.00      0.0     00000000     00000000      1    [       ]
4       0            15               ""         11.2500       200.00       0.0     00000000     00000000      1    [       ]
5       0            14               ""         11.3990        200.00      0.0     00000000     00000000      1    [       ]
6       0            13               ""         12.7320       1333.00      0.0     00000000     00000000      1    [       ]
7       0             1               ""         13.7320       1000.00      0.0     00000000     00000000      1    [       ]
8       0             19              ""         13.9320        200.00      0.0     00000000     00000000      1    [       ]

So, I need R to temporarily store the non-editable header section while I work with the data, then write it out as a text file with the header included.

Edit: I have the header and the data file read in separately and am now trying to find a way to merge them correctly. c(header, datafile) and merge(header, datafile) did not work.
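One likely reason both attempts fail: the header is a character vector and the data is a data frame, so c() returns a list rather than lines of text, and merge() is a database-style join. A minimal sketch of writing the two pieces one after the other through a single file connection follows. The values of `header` and `datafile` here are invented stand-ins, and the tab separator is an assumption.

```r
# Hypothetical stand-ins for the two objects already read in
header <- c("# Non-editable header begin",
            "#  srate.....................: 500",
            "#  Non-editable header end")
datafile <- data.frame(item = 1:3,
                       ecode = c(13L, 4L, 10L),
                       onset = c(9.998, 10.999, 11.199))

# c() on a character vector and a data frame gives a list, not text lines
combined <- c(header, datafile)
class(combined)

# Write both parts through one open connection: header first, then the table
out <- tempfile(fileext = ".txt")
con <- file(out, open = "wt")
writeLines(header, con)
write.table(datafile, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

readLines(out)
```

Because the connection stays open between the two calls, the table lands directly below the header without needing append = TRUE.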

There is 1 best solution below

Check out my code. It should be very quick.

library(tidyverse)
library(data.table)
library(fs)

# Read only the data rows; skip = 26 assumes they start on line 27 of the file
dataRead = function(file) fread(
  file = file, skip = 26,
  col.names = c("item","bepoch","ecode","label","onset","diff",
                "dura","b_flags","a_flags","enable","bin","bin2"),
  colClasses = c("integer", "integer", "integer", "character",
                 "double", "double", "double", "character",
                 "character", "integer", "character", "character")) %>%
  as_tibble() %>%
  # The bracketed bin field is read as two columns (bin, bin2); rejoin them
  mutate(bin = str_c(bin, "    ", bin2)) %>% select(-bin2)

# Padded width of each output column, in column order
width = c(1, 5, 9, 10, 11, 9, 6, 11, 11, 5, 8)
files = dir_ls("txtfiles", regexp = "\\.txt$")
for (i in seq_along(files)) {
  # Read the header block as text, with sep = "|" so each line stays in one column
  header = fread(file = files[i], nrows = 24, sep = "|", header = FALSE)
  df = dataRead(files[i])
  # Placeholder change; put the real reorganization of the data here
  df = df %>% mutate(bin = "[xxxx]")
  # Left-pad every column to its width so the output stays aligned
  df = df %>% mutate(across(everything(),
                            ~str_pad(.x, width[which(names(df) == cur_column())])))
  # Overwrite the file with the header, then append the data rows
  fwrite(header, files[i], append = FALSE, quote = FALSE, col.names = FALSE)
  fwrite(df, files[i], append = TRUE, col.names = FALSE, sep = " ", quote = FALSE)
}

The program processes every .txt file in the txtfiles folder. For each file it reads the header and the data (the data into a tibble), mutates the tibble, then writes the header and the modified data back to the same text file.
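The line counts above (skip = 26, nrows = 24) are hard-coded, so they only fit files with exactly that layout. As a hedged variation in base R, the sketch below finds the first data row by pattern instead: it treats every blank line and every line starting with "#" as part of the block to preserve. The miniature file, the four column names, and the sprintf() widths are assumptions for illustration and would need adjusting to the real eleven columns.

```r
# Hypothetical miniature file with the same overall layout as the real ones
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# Non-editable header begin ----",
  "#  srate.....................: 500",
  "#  Non-editable header end ----",
  "",
  "# item   bepoch   ecode   onset",
  "",
  "1       0            13          9.9980",
  "2       0             4         10.9990"
), path)

lines <- readLines(path)

# Data rows are the lines that are neither blank nor "#" comments
is_data <- !grepl("^[[:space:]]*(#|$)", lines)
first <- which(is_data)[1]

# Everything before the first data row is kept verbatim, blank lines included
header <- lines[seq_len(first - 1)]
dat <- read.table(text = lines[is_data],
                  col.names = c("item", "bepoch", "ecode", "onset"))

# Example change: add one event, sort by onset, renumber the items
dat <- rbind(dat, data.frame(item = NA, bepoch = 0L, ecode = 15L, onset = 11.25))
dat <- dat[order(dat$onset), ]
dat$item <- seq_len(nrow(dat))

# Rebuild aligned text rows, then write header and data back in one call
body <- sprintf("%-7d %-12d %2d %15.4f",
                dat$item, dat$bepoch, dat$ecode, dat$onset)
writeLines(c(header, body), path)
```

Turning the data frame back into text lines first is what makes c(header, body) work, since both pieces are then character vectors.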