I have researched this simple problem extensively but can't find an answer. I am trying to merge two files using pandas' `pd.merge` based on a common column named "JN". I believe it is treating my 'joined' (`os.path.join`) filename as a string instead of a dataframe/csv file. After I call the `pd.merge` function, the error says "string indices must be integers, not str".
```python
import pandas as pd
import os

path = r"C:/Users/St/Documents/House/m2"
dirs = os.listdir(path)
for file in dirs:
    if file.endswith("J.csv"):
        J = file
        if len(J) is 12:  # some filenames are 12 chars, others 11
            jroot = J[:7]
        else:
            jroot = J[:6]
        for file in dirs:
            if file.endswith("2.csv"):
                W = file
                if len(W) is 12:
                    root2 = W[:7]
                else:
                    root2 = W[:6]
                JJ = os.path.join(path, J)
                WW = os.path.join(path, W)
                if jroot == root2:  # if the first 7 (or 6) characters match, then merge
                    JW = pd.merge(JJ, WW, on="JN")
```
On the `pd.merge` call above, I am getting this error:
TypeError: string indices must be integers, not str
I am wondering how to make it read my filename string as an actual file or dataframe. JJ and WW are the equivalent of full paths when printed out. I tried to make these 'filenames' DataFrames using `pd.DataFrame` but wasn't able to do so.
You cannot `merge` two strings. I think you're confused about what `os.path.join` returns. It returns a string. You have to actually read in the `DataFrame`s from the files named `JJ` and `WW`, then perform the `merge`.

Here's a full example of writing 2 `DataFrame`s, reading them back with `read_csv` and then merging them on a column `group`:
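A minimal sketch of that round trip. The frame contents, the `x` and `y` columns, and the file names `left.csv` and `right.csv` are made up for illustration; only the shared `group` column comes from the description above.

```python
import os
import tempfile

import pandas as pd

# Two small illustrative frames that share a "group" column.
left = pd.DataFrame({"group": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"group": ["a", "b", "d"], "y": [10.0, 20.0, 40.0]})

with tempfile.TemporaryDirectory() as tmp:
    left_path = os.path.join(tmp, "left.csv")    # still just a string
    right_path = os.path.join(tmp, "right.csv")  # still just a string

    # Write both frames to disk without the index column.
    left.to_csv(left_path, index=False)
    right.to_csv(right_path, index=False)

    # Read the files back into DataFrames; merge needs these, not the paths.
    left_df = pd.read_csv(left_path)
    right_df = pd.read_csv(right_path)

# The default inner join keeps only the groups present in both frames.
merged = pd.merge(left_df, right_df, on="group")
print(merged)
```

Passing `left_path` and `right_path` straight to `pd.merge` would fail the same way as in the question, because they are plain strings rather than `DataFrame`s.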
Couple of other things:
Don't use `is` to compare objects for equality; use `==`. Only in the case of small integers will `is` work reliably, and even then you shouldn't rely on it because that's an implementation detail of CPython.

Instead of checking the file name with `str.endswith`, just iterate over what you want by first globbing:
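A rough sketch of the globbing approach, under two assumptions: the root of each file name is whatever precedes the `J.csv` suffix (which matches the 6- or 7-character slices in the question for 11- and 12-character names), and its partner is named `<root>2.csv`. The helper name `merge_matching` is hypothetical.

```python
import glob
import os

import pandas as pd


def merge_matching(path, key="JN"):
    """Merge each *J.csv file with the *2.csv file that shares its root name."""
    merged = {}
    for j_path in glob.glob(os.path.join(path, "*J.csv")):
        # Strip the trailing "J.csv" to get the shared root of the file name.
        root = os.path.basename(j_path)[: -len("J.csv")]
        w_path = os.path.join(path, root + "2.csv")
        if not os.path.exists(w_path):
            continue  # no partner file for this root
        # Read both files into DataFrames before merging on the key column.
        merged[root] = pd.merge(pd.read_csv(j_path), pd.read_csv(w_path), on=key)
    return merged
```

Calling `merge_matching(r"C:/Users/St/Documents/House/m2")` would return a dict mapping each root to its merged `DataFrame`. Deriving the root from the suffix also removes the length check, so there is no `is 12` comparison left to get wrong.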