I am trying to combine multiple Excel spreadsheets from two different directories into one directory using Pandas.
I created a function that will change the Census population data from a .csv data into an excel spreadsheet.
import requests, re, logging, sys
from os import path
import os
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from pathlib import Path
import shutil
def pop_data_clean():
pop_direc = 'Path A'
for pop_file in os.listdir(pop_direc):
pop_filename = os.fsdecode(pop_file)
if pop_filename.endswith('.csv'):
population = pd.read_csv(pop_file)
population = population[[ 'STATE', 'COUNTY', 'STNAME', 'CTYNAME' ,'YEAR' ,'POPESTIMATE',
'POPEST_MALE','POPEST_FEM' ,'AGE18PLUS_TOT', 'AGE18PLUS_MALE',
'AGE18PLUS_FEM','AGE65PLUS_TOT', 'AGE65PLUS_MALE','AGE65PLUS_FEM',
'MEDIAN_AGE_TOT', 'MEDIAN_AGE_MALE','MEDIAN_AGE_FEM']]
population = population.rename(columns={
'STNAME':'STATE NAME',
'CTYNAME':'CITY NAME',
'POPESTIMATE':'POPULATION ESTIMATE',
'POPEST_MALE':'TOTAL MALE POPULATION',
'POPEST_FEM':'TOTAL FEMALE POPULATION',
'AGE18PLUS_TOT':'TOTAL POPULATION ABOVE 18',
'AGE18PLUS_MALE':'TOTAL MALE POPULATION ABOVE 18',
'AGE18PLUS_FEM':'TOTAL FEMALE POPULATION ABOVE 18',
'AGE65PLUS_TOT':'TOTAL POPULATION ABOVE 65',
'AGE65PLUS_MALE':'TOTAL MALE POPULATION ABOVE 65',
'AGE65PLUS_FEM':'TOTAL FEMALE POPULATION ABOVE 65',
'MEDIAN_AGE_TOT':'MEDIAN AGE',
'MEDIAN_AGE_MALE':'MEDIAN MALE AGE',
'MEDIAN_AGE_FEM':'MEDIAN FEMALE AGE'})
population.to_excel(pop_filename[0:6] + '.xlsx', index=False)
I then did the same function for the ethnicity data and saved it into a different folder.
def ethnic_data_clean():
ethnic_direc = 'Path B'
for ethn_file in os.listdir(ethnic_direc):
ethn_filename = os.fsdecode(ethn_file)
if ethn_filename.endswith('.csv'):
Ethnic = pd.read_csv(ethn_file)
for i in range(0,18):
Ethnic = Ethnic[Ethnic.AGEGRP != i+1]
Ethnic = Ethnic[['WA_MALE', 'WA_FEMALE', 'BA_MALE', 'BA_FEMALE', 'IA_MALE', 'IA_FEMALE', 'AA_MALE', 'AA_FEMALE',
'NA_MALE', 'NA_FEMALE', 'NH_MALE', 'NH_FEMALE', 'H_MALE', 'H_FEMALE', ]]
Ethnic = Ethnic.rename(columns={
'WA_MALE': 'White Alone Male Population',
'WA_FEMALE':'White Alone Female Population',
'BA_MALE': 'Black or African American Male Population',
'BA_FEMALE':'Black or African American Female Population',
'IA_MALE':'American Indian and Alaska Native Male Population',
'IA_FEMALE':'American Indian and Alaska Native Female Population',
'AA_MALE':'Asian Male Population',
'AA_FEMALE':'Asian Female Population',
'NA_MALE':'Native Hawaiian and Other Pacific Islander Male Population',
'NA_FEMALE':'Native Hawaiian and Other Pacific Islander Female Population',
'NH_MALE': 'Non-Hispanic Male population',
'NH_FEMALE': 'Non-Hispanic Female population',
'H_MALE': 'Hispanic Male population',
'H_FEMALE':'Hispanic Female population'})
Ethnic.to_excel(ethn_filename[0:9] + '.xlsx', index=False)
I am using the same naming conventions between folder A "Path A" and folder B "Path B". The file starts with the 2 state name Abbreviation then an underscore then what it is. I want to combine the excel sheet from Path A and Path B into one excel sheet into Path C and loop through without having to code each individual state.
Example:
Path A has: AL_CensusData SC_CensusData
Path B has: AL_Ethnic SC_Ethnic
I want to combine AL_CensusData with AL_Ethnic data into AL_Master and save it in Path C and do that for each file in each folder. The cleaning process ensures that each file has the same number of rows so there won't be any Pandas errors. I can't figure out how to alternately loop between two separate directories and combining a file, saving it to a third folder, then moving to the next alternate file in the two folders.
I would first get the file names from both pathA and pathB directories, index them by state abbreviation, and then loop over states reading/writing appropriate files
Here is the code to do it, you may need to tweak it for your requirements but hopefully the idea is clear. I assume the direcotiries are './test/A', etc as defined in the code
Note the debugging print statement -- it produces, for me,
(I only created three states' worth of files)