I have a dataframe that I am outputting to a text file (generates database insert statements). I want to align the column to the left and strip all whitespace from the left and right. I found someone who wrote a function that does this - however, it appears that the resulting outputted file contains a load of padding, whereby the column is padded to the maximum column length:
directory = "/dbfs/mnt/testdata/troubleshooting"
filename = "test.txt"
# function to align the columns in the DataFrame to the left
def align_left(s):
s = s.str.strip()
return s.str.ljust(s.str.len().max())
data = [[",1,'This is a bit of data in this row'"]
,[",2,'This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row. This is a lot of data in this row.'"]
,[",3,NULL"]
,[",4,'Little bit of data'"]
,[",5,'Tiny bit of data'"]
]
schema = ["TextData"]
df = spark.createDataFrame(data, schema)
df_table = df.toPandas()
# use function to align column to left and strip of whitespace
sql_values = df_table.apply(align_left).to_string(header=False, index=False, justify='left')
# remove the characters up to and including the first comma ',' of the string
index = sql_values.find(',')
sql_values = sql_values[index+1:]
sql_values = f"INSERT INTO dbo.table (Id, TextData) \n VALUES({sql_values})"
save_path = os.path.join(directory, filename)
# write contents to file
with open(save_path, "w") as f:
f.write(sql_values)
Resulting file:
The additional spaces more than double the file size. How can I get rid of them? This is what I want the file to look like (all extra spaces removed):

