I have stupid data coming out of a system, and it needs to be flattened.
The main CSV has these columns: hostname, program_name, version_name.
However, there is only one row per host, so the last two fields each hold several newline-separated values, like this:
program_name contents:
Word
Excel
Cognos
Mozilla
version_name contents (not real, just for illustrative purposes):
2.3.2
121.3.0
build 22
What's the best way to make sure the names and versions match up, and to do this more concisely and Pythonically?
Here is what the real code looks like; the above is mainly for demo purposes:
for row in tan_output.programs:
    names = row["Name"].splitlines()
    versions = row["Version"].splitlines()
    if len(names) != len(versions):
        print("NAME and VERSION from tan_programs are not equal... Exiting")
        exit()
    else:
        for name in names:
            #tan_programs.append({"Count": row["Count"], "Hostname": row["Hostname"], "Name": row["Name"], "Version": row["Version"]})
I am stuck on the bottom for loop because I feel like I should be looping through both lists simultaneously. What I was going to do instead is loop through one list and use a counter to index into the second one to form the flattened data.
PS: the file is 7 gigs, so the more efficient the better. For example, if I have to use the counter, my experience has been that i += 1 is 100 times more efficient than i = i + 1.
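A figure like that is easy to measure rather than take on faith. Below is a minimal sketch using the standard timeit module; the iteration count is arbitrary and the variable names are just for illustration. For plain Python ints I would expect the two forms to come out close to each other, but the actual numbers depend on the interpreter and the machine.

```python
import timeit

# Time each form of the increment on a plain int.
# The iteration count is arbitrary; raise it for steadier numbers.
augmented = timeit.timeit("i += 1", setup="i = 0", number=200_000)
explicit = timeit.timeit("i = i + 1", setup="i = 0", number=200_000)

print(f"i += 1    : {augmented:.4f} s")
print(f"i = i + 1 : {explicit:.4f} s")
print(f"ratio     : {explicit / augmented:.2f}")
```

Either way, with 7 gigs of input the splitting and the DB writes are likely to matter far more than how the counter is incremented.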
Just use the counter... unless someone has a better idea:
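A minimal sketch of what the counter version might look like, assuming each row is a dict with "Hostname", "Name" and "Version" keys as in the loop above. The function name and the sample row are made up for illustration, the "Count" field is left out, and a mismatch raises an error instead of calling exit():

```python
def flatten_with_counter(rows):
    """Expand each host row into one record per program, using an index counter."""
    flat = []
    for row in rows:
        names = row["Name"].splitlines()
        versions = row["Version"].splitlines()
        if len(names) != len(versions):
            raise ValueError(f"Name/Version count mismatch for host {row['Hostname']}")
        i = 0  # position of the current name, used to pick the matching version
        for name in names:
            flat.append({"Hostname": row["Hostname"], "Name": name, "Version": versions[i]})
            i += 1
    return flat


# Made-up sample row, for illustration only.
sample = [{"Hostname": "host1", "Name": "Word\nExcel", "Version": "2.3.2\n121.3.0"}]
print(flatten_with_counter(sample))
```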
It's actually very fast... the slow part is inserting 8 million records into a DB on another server over the network.
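For looping through both lists simultaneously, the built-in zip() pairs them up without a counter. Here is a minimal sketch under the same assumptions (rows are dicts with "Hostname", "Name" and "Version" keys; the function name and sample data are hypothetical). It is written as a generator so records are produced one at a time instead of building the whole flattened list in memory.

```python
def flatten(rows):
    """Yield one flat record per (name, version) pair, one host row at a time."""
    for row in rows:
        names = row["Name"].splitlines()
        versions = row["Version"].splitlines()
        # zip() silently stops at the shorter list, so check the lengths first.
        if len(names) != len(versions):
            raise ValueError(f"Name/Version count mismatch for host {row['Hostname']}")
        for name, version in zip(names, versions):
            yield {"Hostname": row["Hostname"], "Name": name, "Version": version}


# Made-up sample rows, for illustration only.
sample_rows = [
    {"Hostname": "host1", "Name": "Word\nExcel", "Version": "2.3.2\n121.3.0"},
    {"Hostname": "host2", "Name": "Cognos", "Version": "build 22"},
]
for record in flatten(sample_rows):
    print(record)
```

On Python 3.10 or later, zip(names, versions, strict=True) raises ValueError by itself when the lengths differ, which would make the explicit length check optional.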