I am trying to import 9000 .csv files into R to create one master file, and I would like to be able to import them much more efficiently than

read.csv(file = "filename", header = TRUE, sep = "\t")

Furthermore, I want to skip the first 7 lines in each .csv, as they contain information about the file rather than data. Before skipping them, though, I want to retrieve information from those lines and add it as new columns in the data file so that I can identify each file later on.

I've used the skip=7 option when importing individual .csv files before with no issue, but I haven't been able to import multiple files at once, let alone while first taking some information from those 7 lines.
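
The per-file call I've been using looks something like this:

read.csv(file = "filename", header = TRUE, sep = "\t", skip = 7)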

I've also tried reading in many .csv files from a single folder using the following code:

temp = list.files(pattern = "\\.csv$")
myfiles = lapply(temp, read.delim)

Every .csv takes the following format:

Program 5.5.3
"rawFileName=""C:\...."""
From=0:00.0, To=3:32:13.7
Date=24May2014
Athlete=John Smith
EventDescription=Round 10 v Team B
Time Var1 Var2 Var3 Var4 Var5
0:00  0    0    0    0    0
0:01  1    1    4    0    0

and I want my code to make them look like this:

Time   Var1 Var2 Var3 Var4 Var5 From    To         Date       Athlete     Event Description
0:00.0  0    0    0    0    0   0:00.0  3:32:13.7  24May2014  John Smith  Round 10 v Team B
0:00.1  1    1    4    0    0   0:00.0  3:32:13.7  24May2014  John Smith  Round 10 v Team B

The next athlete would be added below, following the same format, and so on.

Has anyone else wanted to achieve something similar, and if so, how did you do it?

There are 2 answers below.

Answer 1

You want to manually extract the first 7 lines and leave the rest to read.delim. You can do that with textConnection, which lets you pass strings to functions like read.table:

    address = 'myfile.csv'  # path to the file being read (placeholder)
    allLines = readLines(address)
    metaData = allLines[1:7]  # the 7 information lines
    # stitch the remaining lines back together and parse them as a table
    data = read.delim(textConnection(paste0(allLines[8:length(allLines)],
                                            collapse='\n')))

Then parse the metaData as you normally would. I would put all of this in a function that outputs the table and the metadata in a list; that way you get a list of lists that you can merge afterwards, as in the sketch below.
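
A minimal sketch of such a wrapper, assuming the 7 information lines always come first (readOne and results are hypothetical names, not from any package):

    readOne <- function(path) {
      allLines <- readLines(path)
      metaData <- allLines[1:7]  # keep the information lines for later parsing
      body <- read.delim(textConnection(paste0(allLines[8:length(allLines)],
                                               collapse = '\n')))
      list(meta = metaData, data = body)  # return both pieces together
    }

    ## one list entry per file in the folder
    results <- lapply(list.files(pattern = "\\.csv$", full.names = TRUE), readOne)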

Answer 2

This is very much a brute-force method, since I didn't use any clever regex or anything, but if all your files are constructed this way, the following might work:

I used readLines and strsplit to get the input looking like this:

# [[1]]
# [1] "Program 5.5.3"
# 
# [[2]]
# [1] "\"rawFileName"      "\"\"C:\\....\"\"\""
# 
# [[3]]
# [1] "From"      "0:00.0"    "To"        "3:32:13.7"
# 
# [[4]]
# [1] "Date"      "24May2014"
# 
# [[5]]
# [1] "Athlete"    "John Smith"
# 
# [[6]]
# [1] "EventDescription"  "Round 10 v Team B"
# 
# [[7]]
# [1] "Time Var1 Var2 Var3 Var4 Var5"
# 
# [[8]]
# [1] "0:00  0    0    0    0    0"
# 
# [[9]]
# [1] "0:01  1    1    4    0    0"

And then I made a simple function to process the data by selecting the proper list items and elements:

f <- function(filepath) {

  ## read the raw lines, then split each one on ", " or "="
  dat <- readLines(con <- file(filepath), warn = FALSE)
  close(con)
  x <- strsplit(dat, ', |=')

  ## rows 7 onward hold the column header and the data
  res <- read.table(text = do.call(rbind, x[7:length(x)]), header = TRUE,
                    stringsAsFactors = FALSE)

  ## attach the metadata as new columns, repeated on every row
  res <- within(res, {
    'Event Description' <- x[[6]][2]
    Athlete <- x[[5]][2]
    Date <- x[[4]][2]
    To <- x[[3]][4]
    From <- x[[3]][2]
  })
  return(res)
}

So now I give the function the file name and get this:

f('~/desktop/tmp.csv')

#   Time Var1 Var2 Var3 Var4 Var5   From        To      Date    Athlete Event Description
# 1 0:00    0    0    0    0    0 0:00.0 3:32:13.7 24May2014 John Smith Round 10 v Team B
# 2 0:01    1    1    4    0    0 0:00.0 3:32:13.7 24May2014 John Smith Round 10 v Team B

And now you can repeat the process for all the files and merge them:

## untested; all_file_paths is a character vector of the paths to all your files
do.call(rbind.data.frame, Map(f, all_file_paths))
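
With roughly 9000 files, stacking data frames with repeated rbind can get slow. A faster alternative (an untested sketch, assuming f is the function above and the files sit in the working directory) is data.table::rbindlist:

library(data.table)

## full paths of every .csv in the folder
all_file_paths <- list.files(pattern = "\\.csv$", full.names = TRUE)

## read each file with f(), then stack all the results in one pass;
## fill = TRUE tolerates files whose columns don't line up exactly
master <- rbindlist(lapply(all_file_paths, f), use.names = TRUE, fill = TRUE)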