I have a dataset that contains information on events that have taken place around the world. My intention is to aggregate this data to the country-year level. But before doing that, I want to create a variable "capital.city", indicating whether an event took place in a capital city or not.
What I've done so far, after consulting Bing's AI, is this:
library(countrycode)
library(maps)
# Load the world cities dataset
data("world.cities")
# Create a list of capital cities
capital_cities <- unique(world.cities$capital)
# Create a new variable indicating whether a city is a capital or not
dt_protest$capital_city <- ifelse(dt_protest$city %in% capital_cities, "capital", "non-capital")
But this doesn't really work: I only get non-capital values. What am I doing wrong?
Here's a sample of my data:
date month year city country
4/4/2006 4 2006 Lyon France
5/23/2021 5 2021 Abeokuta Nigeria
3/19/1996 3 1996 Kuala Lumpur Malaysia
11/30/2006 11 2006 Moscow Russia
11/30/2011 11 2011 Tinsukia India
1/4/2014 1 2014 Saharsa India
11/23/2016 11 2016 Venezuela Cuba
9/27/2019 9 2019 Shanghai China
5/22/2003 5 2003 Bonn Germany
12/7/2006 12 2006 Thetford United Kingdom
9/10/2010 9 2010 New Delhi India
11/17/2020 11 2020 Helsinki Finland
1/22/2011 1 2011 Berlin Germany
3/19/1993 3 1993 Jerusalem Israel
8/2/2004 8 2004 Mumbai India
12/9/2000 12 2000 Mumbai India
8/29/2001 8 2001 Guelph Canada
4/7/2003 4 2003 Seoul South Korea
9/11/2003 9 2003 Brussels Belgium
4/5/2006 4 2006 Hong Kong China
2/1/2007 2 2007 Kathmandu Nepal
10/4/2007 10 2007 Moscow Russia
9/3/2008 9 2008 Luanda Angola
10/21/2009 10 2009 Johannesburg South Africa
2/20/2010 2 2010 Tashkent Uzbekistan
7/20/2010 7 2010 Singur India
10/24/2011 10 2011 Srinagar India
11/14/2012 11 2012 Delhi India
1/2/2015 1 2015 Cairo Egypt
10/13/2015 10 2015 Tinsukia India
Bing's AI suggestion of `capital_cities <- unique(world.cities$capital)` doesn't create a list of capital cities (surprise, AI led you astray!). It creates a vector of integers of length 4 (`c(0, 1, 3, 2)`), which are the unique values of that column, not city names.

You are getting all non-capital values because `city` will never take on the values 0, 1, 2, or 3, so every row falls through to the "else" branch of `ifelse`, which is "non-capital".

If just using the city as the indicator, you should instead build `capital_cities` from the `name` column, keeping only the rows where `capital` is non-zero.
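A minimal sketch of that step, assuming the lookup has the `name`, `country.etc` and `capital` columns that `world.cities` has. The `cities` data frame here is a hypothetical toy stand-in with illustrative rows so the snippet runs without the `maps` package; with `maps` loaded you would use `world.cities` in its place.

```r
# Toy stand-in with the same column names as maps::world.cities.
# The rows are illustrative only, not a copy of the real data.
cities <- data.frame(
  name        = c("Moscow", "Berlin", "Shanghai", "Lyon", "Bonn"),
  country.etc = c("Russia", "Germany", "China", "France", "Germany"),
  capital     = c(1, 1, 2, 0, 0),
  stringsAsFactors = FALSE
)

# What the original code did: unique values of the flag column (numbers, not names)
flag_values <- unique(cities$capital)

# What is needed: the names of the rows whose capital flag is non-zero
capital_cities <- unique(cities[cities$capital > 0, "name"])
print(capital_cities)
```

Comparing `flag_values` with `capital_cities` makes the difference visible: the first holds flag codes, the second holds city names that `%in%` can actually match against.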
Then you can use an `ifelse` statement to create the new variable, as in your original code.

However, this may cause a problem if there is a city in two countries where one is a capital and one is not. Paris, France and Paris, Indiana, USA are very different places. A "safer" approach may be to use `merge` on both the city and the country.

In these data, both approaches give the same output.
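The two approaches can be sketched side by side. This is only an illustration under stated assumptions: `cities` is again a hypothetical toy stand-in for `world.cities`, the `dt_protest` rows are made up (including a "Paris, USA" row that is not in the sample data, added to show where the approaches diverge), and `is_capital` and `capital_city_safe` are helper names chosen for this example.

```r
# Toy stand-in for maps::world.cities (illustrative rows only)
cities <- data.frame(
  name        = c("Moscow", "Berlin", "Paris", "Paris", "Lyon"),
  country.etc = c("Russia", "Germany", "France", "USA", "France"),
  capital     = c(1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Toy events table; the Paris/USA row is hypothetical
dt_protest <- data.frame(
  city    = c("Moscow", "Lyon", "Berlin", "Paris"),
  country = c("Russia", "France", "Germany", "USA"),
  stringsAsFactors = FALSE
)

# Approach 1: match on the city name only
capital_cities <- unique(cities[cities$capital > 0, "name"])
dt_protest$capital_city <- ifelse(dt_protest$city %in% capital_cities,
                                  "capital", "non-capital")

# Approach 2: match on city AND country with a left join
capitals <- unique(cities[cities$capital > 0, c("name", "country.etc")])
capitals$is_capital <- TRUE
merged <- merge(dt_protest, capitals,
                by.x = c("city", "country"),
                by.y = c("name", "country.etc"),
                all.x = TRUE)  # keep events with no matching capital
merged$capital_city_safe <- ifelse(is.na(merged$is_capital),
                                   "non-capital", "capital")
merged$is_capital <- NULL
print(merged)
```

In this toy data the name-only match labels the Paris, USA row as a capital, while the merge does not. Note that the merge only works if country spellings agree between the two tables, so it may be worth running something like `setdiff(unique(dt_protest$country), unique(world.cities$country.etc))` on the real data before trusting the result.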
Note that the `world.cities` dataset indicates additional administrative capitals in China as 2 (municipal capital) or 3 (provincial capital); see `?world.cities`. If you don't want to include those, build the vector with `unique(world.cities[world.cities$capital == 1, "name"])` instead.
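To see what the stricter filter drops, here is a small sketch on a hypothetical toy stand-in for `world.cities` (illustrative rows only; `all_flagged` and `national_only` are names made up for this example):

```r
# Toy stand-in for maps::world.cities with all four flag values
cities <- data.frame(
  name        = c("Beijing", "Shanghai", "Guangzhou", "Lyon"),
  country.etc = c("China", "China", "China", "France"),
  capital     = c(1, 2, 3, 0),
  stringsAsFactors = FALSE
)

# Any non-zero flag: national, municipal and provincial capitals
all_flagged <- unique(cities[cities$capital > 0, "name"])

# Flag equal to 1: national capitals only
national_only <- unique(cities[cities$capital == 1, "name"])

# Names that the stricter filter removes
dropped <- setdiff(all_flagged, national_only)
print(dropped)
```

Running the same `setdiff` on the real dataset would list exactly which cities change label when you switch filters.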