I am trying to make a variable called "combo". I want the county in all lowercase, including a space if there is one between two words, and NO SPACE between the county name and state abbreviation.
So far I have this:
county <- c("Abbeville County", "Aleutians West Census Area",
"Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")
trialdat <- data.frame(county, state)
trialdat$state <- sapply(trialdat$state, tolower)
# deal with trailing spaces
trim.trailing <- function (x) sub("\\s+$", "", x)
trialdat$state2 <- as.factor(trim.trailing(as.factor(trialdat$state)))
trialdat$StateAbbrev <- stateFromLower(trialdat$state2)
trialdat$county2 <- as.factor(trim.trailing(as.factor(trialdat$county)))
# make combo variable (mutate() comes from the dplyr package)
library(dplyr)
trialdat <- mutate(trialdat, combo = paste(tolower(gsub("County", "", county2)),
StateAbbrev, sep = ""))
The desired output is a column with
combo
1 abbevilleWV
2 aleutians west census areaWI
3 cerro gordoWY
4 lonokeAL
Weird things are happening. For "Aleutians West Census Area" I get what I want, but for the other counties a space remains between the county name and the state abbreviation. I can't simply gsub out all of the spaces, because I need to keep the ones inside multi-word county names. Any ideas? Thank you!
Note: The stateFromLower function is as follows, slightly tweaked from Chris' code. I include it in case the problem stems from this part; I'm not sure.
stateFromLower <- function(x) {
# read 52 state codes into local variable (includes DC
# (Washington D.C.) and PR (Puerto Rico))
st.codes <- data.frame(state1 = as.factor(c("AK", "AL", "AR",
"AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI",
"IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD",
"ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA",
"PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT",
"WA", "WI", "WV", "WY")), full = as.factor(c("alaska",
"alabama", "arkansas", "arizona", "california", "colorado",
"connecticut", "district of columbia", "delaware", "florida",
"georgia", "hawaii", "iowa", "idaho", "illinois", "indiana",
"kansas", "kentucky", "louisiana", "massachusetts", "maryland",
"maine", "michigan", "minnesota", "missouri", "mississippi",
"montana", "north carolina", "north dakota", "nebraska",
"new hampshire", "new jersey", "new mexico", "nevada",
"new york", "ohio", "oklahoma", "oregon", "pennsylvania",
"puerto rico", "rhode island", "south carolina", "south dakota",
"tennessee", "texas", "utah", "virginia", "vermont",
"washington", "wisconsin", "west virginia", "wyoming")))
# create an nx1 data.frame of lowercase state names from source column
st.x <- data.frame(full = x)
# match the source state names against the full names in the
# 'st.codes' local variable and look up the two-letter state code
refac.x <- st.codes$state1[match(st.x$full, st.codes$full)]
# return the state codes in the same order in which the names
# appeared in the original source
return(refac.x)
}
Thanks for your patience with formatting issues, this is my first question!
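Fixed! In the mutate command, the gsub pattern has to include the space before "County", i.e. gsub(" County", "", county2) instead of gsub("County", "", county2). Otherwise only the word itself is removed and the space in front of it stays behind.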
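For anyone hitting the same thing, here is a minimal sketch of the difference in base R (no dplyr). The state abbreviations are hard-coded for illustration, which is an assumption on my part; in the real data they come from stateFromLower.

```r
# Sketch only: StateAbbrev is hard-coded here (an assumption for
# illustration) instead of being produced by stateFromLower().
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")
StateAbbrev <- c("WV", "WI", "WY", "AL")

# Original pattern: removes the word "County" but leaves the space before it.
with_space <- paste(tolower(gsub("County", "", county)), StateAbbrev, sep = "")

# Fixed pattern: the leading space is part of the match, so it is removed too.
combo <- paste(tolower(gsub(" County", "", county)), StateAbbrev, sep = "")

print(with_space)
# "abbeville WV" "aleutians west census areaWI" "cerro gordo WY" "lonoke AL"
print(combo)
# "abbevilleWV" "aleutians west census areaWI" "cerro gordoWY" "lonokeAL"
```

A slightly more defensive variant would probably be sub("\\s+County$", "", county), which only strips a trailing "County" together with whatever whitespace precedes it, and leaves any "County" in the middle of a name alone.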