I would like to do a very simple thing, but I cannot figure out how to do it with Python/Spark 1.5 DataFrames (it's all new to me).
original dataset:
code | ISO | country
1    | AFG | Afghanistan state
2    | BOL | Bolivia Plurinational State
new dataset:
code | ISO | country
1    | AFG | Afghanistan
2    | BOL | Bolivia
I would like to do something like this (in pseudo-Python):
iso_to_country_dict = {'AFG': 'Afghanistan', 'BOL': 'Bolivia'}

def mapCountry(iso, country):
    # use the mapped name when the ISO code is known, otherwise keep the original value
    if iso in iso_to_country_dict:
        return iso_to_country_dict[iso]
    return country

dfg = df.select(mapCountry(df['ISO'], df['country']))
Just for simplicity, mapCountry could look like this:
def mapCountry(iso, country):
    if iso == 'AFG':
        return 'Afghanistan'
    return country
but with this there is an error: ValueError: Cannot convert column into bool. (I suspect that is because df['ISO'] is a Column, so iso == 'AFG' produces another Column that a plain Python if statement cannot evaluate.)
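From what I can tell, the column-level way to express that kind of if would be pyspark.sql.functions.when/otherwise; a minimal sketch of that idea for the simplified single-ISO case (I'm not sure how cleanly it extends to a whole dict, though):

from pyspark.sql.functions import when

# column-level equivalent of the simplified mapCountry above (just a sketch)
dfg = df.select(
    df['code'],
    df['ISO'],
    when(df['ISO'] == 'AFG', 'Afghanistan').otherwise(df['country']).alias('country')
)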
Well, I found a solution, but I don't know if it is the cleanest way to do this. Any other ideas?
iso_to_country_dict = {'BOL': 'Bolivia', 'HTI': 'Cape Verde','COD':'Congo','PRK':'Korea','LAO':'Laos'}
Note: C1, C2, ..., C5 are the names of all the other columns.
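A minimal sketch of what that solution looks like, assuming a udf over the dict and using C1, ..., C5 as placeholders for the other column names:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def mapCountry(iso, country):
    # fall back to the original country name when the ISO code is not in the dict
    # (iso_to_country_dict is the dict defined above)
    return iso_to_country_dict.get(iso, country)

mapCountryUdf = udf(mapCountry, StringType())

dfg = df.select('C1', 'C2', 'C3', 'C4', 'C5',
                mapCountryUdf(df['ISO'], df['country']).alias('country'))

It works, but having to list every other column explicitly feels clumsy, hence the question.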