Unable to read in .xls from R

4.6k Views Asked by At
  • File: it is a file with about 100 rows and 7 columns. XLS extension (MS Excel 97-2003).
  • Issue: unable to read in the file with R through read.xlsx or read.xlsx2.
  • Code used to try to read the file:

    library(xlsxjars)
    library(rJava)
    library(xlsx) 
    excel <- read.xlsx("File.xls",sheetIndex=1,startRow=1,stringsAsFactor=F)
    
  • Error prompted:

Error in .jcall ("RJavaTools","Ljava/lang/object;","invokeMethod",cl, : java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream.

PD: for the record, I did try to read it with read.csv and it does read it but since it's an Excel file and it's not separated by commas or dots, R reads it as if everything is in 1 column. Maybe anyone can suggest a way to read it through read.csv?

2

There are 2 best solutions below

1
On BEST ANSWER

There are two things you have stated that make me think that you are not dealing with the type of file you think you are dealing with:

  1. Your InputStream was neither an OLE2 stream, nor an OOXML stream"
    
  2. "I did try to read it with read.csv() and it does read it ..."

If it really was a file in the MS Excel 97-2003 native format(s), read.csv would not have worked at all on it. The error from read.xlsx supports this. This appears to be a text file, and your comment that using sep="\t" in read.table allows it to be read correctly confirms it. What you have is a tab separated text file that happens to have the suffix .xls.

It should be noted that the suffix of a file is not a guarantee that the file is of the purported format. A much better indicators is running the file shell command on it (if that command is available on your operating system). File extensions are sometimes used by GUIs to determine which application to use to open the file with. As Excel is capable of reading a tab delimited file (and because Excel prorbably is not set to be the default application for a .tsv file, which would be the more traditional file extension), by using a .xls extension, the file will (generally) be opened by Excel.

In summary, just because the filename ends in .xls does not mean it really is an Excel native format file.

0
On

You can use this function, to read XML spreadsheets saved in 2003 .XLS format

    readExcelXML <- 
    function(filename) {
    library(xml2)
    library(magrittr)
    doc <- read_xml(filename)
    ns <- xml_ns(doc)
    rows <- xml_find_all(doc, paste0(".//ss:Worksheet/ss:Table/ss:Row"), ns = ns)
    values <- lapply(rows, . %>% xml_find_all(".//ss:Cell/ss:Data", ns = ns) %>%                 
    xml_text %>% unlist)

    columnNames <- values[[1]]

    dat <- do.call(rbind.data.frame, c(values[-1], stringsAsFactors = FALSE))
    names(dat) <- columnNames

    dat
    }