Conditional trailing space deletion in R

I am trying to make a variable called "combo". I want the county in all lowercase, including a space if there is one between two words, and NO SPACE between the county name and state abbreviation.

So far I have this:

library(dplyr)  # provides mutate(), used below

county <- c("Abbeville County", "Aleutians West Census Area",
           "Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")

trialdat <- data.frame(county, state)
trialdat$state <- sapply(trialdat$state, tolower)
# deal with trailing spaces 
trim.trailing <- function (x) sub("\\s+$", "", x)
trialdat$state2 <- as.factor(trim.trailing(as.factor(trialdat$state)))
trialdat$StateAbbrev <- stateFromLower(trialdat$state2)
trialdat$county2 <- as.factor(trim.trailing(as.factor(trialdat$county)))
# make combo variable
trialdat <- mutate(trialdat, combo = paste(tolower(gsub("County", "", county2)),
                                           StateAbbrev, sep = ""))

The desired output is a column with

                       combo
1                  abbevilleWV
2 aleutians west census areaWI
3                cerro gordoWY
4                     lonokeAL

Weird things are happening. For the county with several words in its name, I get what I want. But for the other counties, a space remains after the county name. I can't simply gsub out all of the spaces because I need the ones between the words of the county name. Any ideas? Thank you!

Note: The stateFromLower function is as follows, slightly tweaked from Chris' code. I include it because the problem may stem from this part; I'm not sure.

stateFromLower <- function(x) {
  # read 52 state codes into a local variable [includes DC
  # (Washington D.C.) and PR (Puerto Rico)]
  st.codes <- data.frame(state1 = as.factor(c("AK", "AL", "AR", 
    "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", 
    "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", 
    "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", 
    "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", 
    "PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", 
    "WA", "WI", "WV", "WY")), full = as.factor(c("alaska", 
    "alabama", "arkansas", "arizona", "california", "colorado", 
    "connecticut", "district of columbia", "delaware", "florida", 
    "georgia", "hawaii", "iowa", "idaho", "illinois", "indiana", 
    "kansas", "kentucky", "louisiana", "massachusetts", "maryland", 
    "maine", "michigan", "minnesota", "missouri", "mississippi", 
    "montana", "north carolina", "north dakota", "nebraska", 
    "new hampshire", "new jersey", "new mexico", "nevada", 
    "new york", "ohio", "oklahoma", "oregon", "pennsylvania", 
    "puerto rico", "rhode island", "south carolina", "south dakota", 
    "tennessee", "texas", "utah", "virginia", "vermont", 
    "washington", "wisconsin", "west virginia", "wyoming")))

  # create an nx1 data.frame of lowercase state names from the source column
  st.x <- data.frame(full = x)
  # match the source names against the full names in 'st.codes'
  # and look up the corresponding state codes
  refac.x <- st.codes$state1[match(st.x$full, st.codes$full)]
  # return the state codes in the same order in which the names
  # appeared in the original source
  return(refac.x)
}

Thanks for your patience with formatting issues, this is my first question!

Best answer

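Fixed! In the mutate command, I had to add a space before "County" in the gsub pattern. Removing only "County" left behind the space that preceded it, and because trim.trailing had already run before the gsub call, nothing stripped that space afterwards.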

trialdat <- mutate(trialdat, combo = paste(tolower(gsub(" County", "", county2)),
                                           StateAbbrev, sep = ""))
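
For anyone hitting the same thing, here is a minimal base-R sketch of the cause and a slightly more defensive variant of the fix. It is not the code from the question. It assumes the suffix to drop is always the literal word "County" at the end of the name, and it swaps stateFromLower for R's built-in state.name and state.abb vectors, which cover only the 50 states (no DC or PR). The names county_clean and abbrev are just illustrative.

```r
# Sample data from the question
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")

# The cause: removing only "County" keeps the space that preceded it
left_over <- gsub("County", "", county[1])
left_over  # "Abbeville "

# A more defensive pattern: drop "County" together with the whitespace
# before it, and only when it is the last word of the name
county_clean <- sub("\\s+County$", "", trimws(county))

# Assumption: built-in lookup vectors instead of stateFromLower (50 states only)
abbrev <- state.abb[match(tolower(state), tolower(state.name))]

# paste0() is paste(..., sep = "")
combo <- paste0(tolower(county_clean), abbrev)
combo
```

Anchoring the pattern with $ means a name that merely contains "County" somewhere in the middle is left alone, and trimws() takes over the job of the trim.trailing helper.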