How to read a text file whose variables are not stored on the same row, and that lacks a standard delimiter from column to column, into R?

31 Views Asked by Anthony Colavito At 29 July 2025 at 05:49

I am trying to read a text file (https://www.bls.gov/bdm/us_age_naics_00_table5.txt) into R, but I am not sure how to go about parsing it. As you can see, the column names (years) are not all on the same row, and the spacing between values is not consistent from column to column. I am familiar with read.csv() and read.delim(), but I'm not sure how to read a complex file like this one.



VitaminB16 On 27 May 2021 at 16:59

Here is a manual parse:

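library(readr)
string <- read_lines("https://www.bls.gov/bdm/us_age_naics_00_table5.txt")
string <- string[nchar(string) != 0]  # drop empty lines
string <- string[-c(1, 2)]            # first two lines don't contain information
string <- string[string != " "]       # drop lines holding a single space
string <- string[-151]                # footnote (position is specific to this file)
sMatrix <- matrix(string, nrow = 30)  # each column holds one 30-line block of the table
# Collapse each block into one newline-separated string so readr treats it as
# literal data rather than as file paths; lapply always returns a list
dfList <- lapply(seq_len(ncol(sMatrix)), function(x) read_table(paste(sMatrix[, x], collapse = "\n")))
df <- do.call(cbind, dfList)
df <- df[, !duplicated(colnames(df))]  # removes columns with duplicate names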
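
To make the reshaping step easier to follow, here is a minimal sketch of the same idea on a made-up miniature of the layout. The sample text, the `rows_per_block` value, and the variable names are all assumptions for illustration, and it uses base R's read.table() instead of readr so it runs without extra packages:

```r
# Made-up miniature of the layout: two stacked blocks, each with its own
# header row, separated by blank lines and followed by a footnote.
raw <- c(
  "Title line",
  "",
  "Age    1994    1995",
  "1      1,200   1,310",
  "2      _       1,150",
  "",
  "Age    1996    1997",
  "1      1,405   1,498",
  "2      1,220   1,301",
  "",
  "Footnote: _ means not applicable"
)

lines <- raw[nzchar(trimws(raw))]     # drop blank lines
lines <- lines[-c(1, length(lines))]  # drop the title and the footnote
rows_per_block <- 3                   # header + 2 data rows (assumed known)

# matrix() fills column by column, so each column holds the lines of one block
blocks <- matrix(lines, nrow = rows_per_block)

# Parse each block as whitespace-separated text, keeping every value as character
parsed <- lapply(seq_len(ncol(blocks)), function(j) {
  read.table(text = blocks[, j], header = TRUE,
             colClasses = "character", check.names = FALSE)
})

# Put the blocks side by side and keep only the first "Age" column
wide <- do.call(cbind, parsed)
wide <- wide[, !duplicated(colnames(wide)), drop = FALSE]
```

This only works when every block has the same number of lines, which is why the title, blank lines and footnote have to be removed first.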

If you then want to recode "_" as NA, and format the numbers:

df[] <- lapply(df, function(x) {
  x <- gsub(",", "", as.character(x))  # drop thousands separators
  x[x %in% "_"] <- NA                  # recode "_" as missing
  x
})
# a column is numeric if all of its non-missing values convert without producing NA, e.g. column 1 can't
i <- vapply(df, function(x) !anyNA(suppressWarnings(as.numeric(x[!is.na(x)]))), logical(1))
df[i] <- lapply(df[i], as.numeric)
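
As a possible alternative for this cleaning step, base R's type.convert() can do the NA recoding and the numeric conversion in one pass. This is a different technique from the one above, shown on a made-up data frame (`toy` and its values are assumptions for illustration):

```r
# Made-up data in the same shape: a label column plus character "numbers"
toy <- data.frame(
  Age    = c("1 year", "2 years"),
  `1994` = c("1,200", "_"),
  `1995` = c("1,310", "1,150"),
  check.names = FALSE, stringsAsFactors = FALSE
)

clean <- toy
clean[] <- lapply(toy, function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # drop thousands separators
  # Treat "_" as missing and let R pick the narrowest type that fits;
  # as.is = TRUE keeps non-numeric columns as character instead of factor
  type.convert(x, na.strings = c("NA", "_"), as.is = TRUE)
})

str(clean)
```

Note that whole-number columns come back as integer rather than double here, so wrap the result in as.numeric() if you need doubles.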
