I am trying to work on the adult dataset, available at this link.
At the moment I'm stuck because the files I am able to download are in formats I am not familiar with. As a result, I cannot correctly load them into a pandas dataframe.
I am able to download 3 files from UCI using the following links:
data = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
names = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names'
test = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test'
Their extensions are .data, .names and .test, respectively. I have always worked with the .csv format, so I am a little confused by these.
How can I get a pandas dataframe with the train data (= data + names) and a pandas dataframe with the test data (= test + names)?
This code only partly works:
train_df = pd.read_csv(r'./adult.data', header=None)
train_df.head() # WORKING (without column names)
df_names = df = pd.read_csv(r'./adult.names')
df_names.head() # ERROR
test_df = pd.read_csv(r'./adult.test')
test_df.head() # ERROR
Use:
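A minimal sketch of one way to do this, under a few assumptions. adult.data and adult.test are plain comma-separated text, so `pd.read_csv` can read them regardless of the extension. adult.names is a free-text description rather than a table, so the column names are typed in by hand from the attribute list it contains. The helper name `load_adult`, the label column name `income`, and the tiny in-memory samples are illustrative choices, not part of the dataset.

```python
import io

import pandas as pd

# Column names transcribed by hand from the attribute list in adult.names.
# The last name ("income") is chosen here; the file does not name the label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def load_adult(source, is_test=False):
    """Read adult.data or adult.test (path, URL or file-like) into a DataFrame."""
    df = pd.read_csv(
        source,
        header=None,                   # neither file has a header row
        names=COLUMNS,
        skipinitialspace=True,         # fields are separated by ", "
        na_values="?",                 # missing values are written as "?"
        skiprows=1 if is_test else 0,  # adult.test starts with a non-data line
    )
    if is_test:
        # Labels in adult.test carry a trailing period, e.g. "<=50K."
        df["income"] = df["income"].str.rstrip(".")
    return df


# Small in-memory stand-ins for the real files, just to show the call shape.
sample_train = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, ?, <=50K\n"
)
sample_test = io.StringIO(
    "|1x3 Cross validator\n"
    "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, "
    "Own-child, Black, Male, 0, 0, 40, United-States, <=50K.\n"
)

train_df = load_adult(sample_train)
test_df = load_adult(sample_test, is_test=True)
```

With the downloaded files, the calls would be `load_adult('./adult.data')` and `load_adult('./adult.test', is_test=True)`. Stripping the trailing period keeps the labels in the two dataframes comparable.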