Convert a dictionary of lists to two-column csv


I have a dictionary of lists as follows:

{'banana': [1,2],
 'monkey': [5],
 'cow': [1,5,0],
 ...}

I want to write a csv in which each row contains one number and its word, as follows:

1 | banana
2 | banana
5 | monkey
1 | cow
5 | cow
0 | cow
...

with | as the delimiter.

I tried to convert it to a list of tuples, and write it as follows:

for k, v in dic.items():
    for ID in v: 
        rv.append((ID, k))

with open(index_filename,'wb') as out:
    csv_out=csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier','descriptor'])
    for row in rv:
        csv_out.writerow(row)

but ran into this error:

a bytes-like object is required, not 'str'

Is there a more efficient way of doing this than converting to a tuple, and if not, what's wrong with my code?

Thanks.

There are 2 best solutions below

You are opening the file in binary (bytes) mode, which is what the "b" in "wb" specifies. Many people did this in the Python 2 days, when "str" and "bytes" were the same type, so many older books still teach it this way.

If you open a file in bytes mode, you must write bytes to it, not strings. A str can be converted to bytes with the str.encode() method:

f.write(some_str_variable.encode())
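
As a small illustration of that rule, the sketch below uses an in-memory io.BytesIO buffer as a stand-in for a file opened with 'wb'. The buffer and the variable names are assumptions made for the demo; they are not part of the question's code.

```python
import io

# In-memory stand-in for a file opened in binary mode ('wb').
buf = io.BytesIO()

text = "1|banana\n"

try:
    buf.write(text)  # a str cannot be written to a binary stream
except TypeError as exc:
    error_message = str(exc)  # same kind of error as in the question

buf.write(text.encode("utf-8"))  # encoding to bytes first works
written = buf.getvalue()
```

This is why csv.writer fails on a 'wb' file in Python 3: it hands str objects to the file's write method.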

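However, what you probably want instead is to open the file in text mode. The csv module writes strings, and its documentation recommends passing newline='' so that the writer controls the line endings itself:

with open(index_filename, 'w', newline='') as out: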
    ...
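
For completeness, here is a minimal end-to-end sketch of the question's code with the file opened in text mode. The sample dictionary is taken from the question; the temporary output path is an assumption made so the snippet can run anywhere.

```python
import csv
import os
import tempfile

dic = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}

# Hypothetical output path in a temporary directory, for the demo only.
index_filename = os.path.join(tempfile.mkdtemp(), 'index.csv')

# Text mode ('w'); newline='' lets the csv module control line endings.
with open(index_filename, 'w', newline='') as out:
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))  # number first, then word

# Read the file back to check the result.
with open(index_filename, newline='') as f:
    lines = f.read().splitlines()
```

The rows come out in dictionary insertion order, which Python 3.7+ guarantees.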

If you want to make your code more efficient, it is important to state in what respect it should be more efficient. Leaving terrible solutions aside, the reasonable ones often trade space (memory) against time (cycles, function calls).

Aside from efficiency, you should also take readability and maintainability into account before doing any kind of optimization.

Tuples, like dicts, are very efficient in Python because they are used internally all over the place. Most function calls in Python involve creating a tuple (for the positional arguments) under the hood.

As for your concrete example, you can use a generator expression to avoid the temporary list (note the order: the number comes first, then the word, to match your desired output):

entries = ((ID, k) for k, v in dic.items() for ID in v)

You still have the intermediate tuples, but they are computed on the fly while you iterate over the dictionary items. This is more memory efficient than building an explicit list, especially if you have lots of entries.
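
A minimal sketch of that on-the-fly behaviour, using the sample dictionary from the question and the (number, word) order the question asks for:

```python
dic = {'banana': [1, 2], 'monkey': [5], 'cow': [1, 5, 0]}

# Nothing is computed yet; the generator only remembers how to produce rows.
entries = ((ID, k) for k, v in dic.items() for ID in v)

# Each tuple is created only when it is requested.
first = next(entries)

# Consuming the rest yields the remaining rows; the generator is then exhausted.
rest = list(entries)
```

Keep in mind that a generator can be iterated only once; if the rows are needed a second time, build a list instead.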

You could also just put the nested loop directly into the with body:

with open(index_filename, 'w', newline='') as out:  # text mode, not 'wb'
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))  # number first, then word

To avoid the repeated calls to writerow, you could also use writerows, which might be faster:

with open(index_filename, 'w', newline='') as out:  # text mode, not 'wb'
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    csv_out.writerows((ID, k) for k, v in dic.items() for ID in v)

If you really want to know which method is fastest, you can use Python's timeit module to measure them.
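
A rough sketch of such a measurement follows. The synthetic dictionary, the function names, and the use of io.StringIO in place of a real file are all assumptions made for the demo; the timings will vary by machine and say nothing about disk I/O.

```python
import csv
import io
import timeit

# Synthetic data for the benchmark (an assumption; substitute your own dict).
dic = {'word%d' % i: list(range(10)) for i in range(1000)}

def with_writerow():
    # One writerow call per row, as in the nested-loop version.
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    for k, v in dic.items():
        for ID in v:
            csv_out.writerow((ID, k))
    return out.getvalue()

def with_writerows():
    # A single writerows call fed by a generator expression.
    out = io.StringIO()
    csv_out = csv.writer(out, delimiter='|')
    csv_out.writerow(['identifier', 'descriptor'])
    csv_out.writerows((ID, k) for k, v in dic.items() for ID in v)
    return out.getvalue()

# Run each variant 20 times and report the total time in seconds.
t_row = timeit.timeit(with_writerow, number=20)
t_rows = timeit.timeit(with_writerows, number=20)
print('writerow loop: %.4f s' % t_row)
print('writerows:     %.4f s' % t_rows)
```

Both functions produce identical output, so any difference in the timings comes from how the rows are handed to the writer.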