I have a data frame as below. This sample looks uniform, but the full data set is not very uniform:
locationid address
1073744023 525 East 68th Street, New York, NY 10065, USA
1073744022 270 Park Avenue, New York, NY 10017, USA
1073744025 Rockefeller Center, 50 Rockefeller Plaza, New York, NY 10020, USA
1073744024 1251 Avenue of the Americas, New York, NY 10020, USA
1073744021 1301 Avenue of the Americas, New York, NY 10019, USA
1073744026 44 West 45th Street, New York, NY 10036, USA
I need to find the city and country name from this address. I tried the following:
1) strsplit: this gives me a list, but I cannot access the last or third-to-last element of each entry.
2) Regular expressions: finding the country is easy
str_sub(str_extract(address, "\\d{5},\\s.*"),8,11)
but for city
str_sub(str_extract(address, ",\\s.+,\\s.+\\d{5}"),3,comma_pos)
I cannot find comma_pos, as that leads me back to the same problem.
I believe there is a more efficient way to solve this using either of the above approaches.
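Split the data on the commas with strsplit(); the result is a list with one character vector per address.
Then the length of each vector will give the number of elements (so you can work backward). Indexing each vector by its own length gives you the last element. Or you could apply a function such as tail() to each vector to get the last element.
To get the second-to-last element (or, more generally, the k-th from the end), you need to index relative to that length, for example length minus one.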
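A minimal sketch of these steps in base R, assuming the addresses are held in a plain character vector called address (with a data frame you would pass the address column instead, converted with as.character() if it is a factor). The names parts, n and from_end are made up for illustration and are not from the original answer:

```r
# Sample addresses copied from the question.
address <- c(
  "525 East 68th Street, New York, NY 10065, USA",
  "270 Park Avenue, New York, NY 10017, USA",
  "Rockefeller Center, 50 Rockefeller Plaza, New York, NY 10020, USA",
  "1251 Avenue of the Americas, New York, NY 10020, USA",
  "1301 Avenue of the Americas, New York, NY 10019, USA",
  "44 West 45th Street, New York, NY 10036, USA"
)

# Split on a comma followed by optional whitespace.
# The result is a list with one character vector per address.
parts <- strsplit(address, ",\\s*")

# Number of elements in each vector, so positions can be counted from the end.
n <- lengths(parts)

# Last element (the country), indexing each vector by its own length.
country <- mapply(function(p, k) p[k], parts, n)

# The same result using tail().
country_tail <- sapply(parts, tail, 1)

# Hypothetical helper: the k-th element counted from the end (k = 1 is the last).
from_end <- function(p, k) p[length(p) - k + 1]

# In this sample the city is the third element from the end.
city <- sapply(parts, from_end, k = 3)
```

Counting from the end is what handles the Rockefeller Center row, which has one more comma-separated piece than the others. On the sample above, city comes out as "New York" and country as "USA" for every row; rows in the full data that do not end in "city, state zip, country" would need a different offset.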