How to iterate over dic and list

86 Views Asked by At

I've a folder with a bunch of .SQL files that I want match against an extraction of ALL_TAB_COLUMNS (only TABLE_NAME and COLUMN_NAME columns that are on column_list.csv).

My objective is to have a relation of tables/columns used for each report, generating an output like:
sql_file_name.sql | table_name | column_name

It seems that when I moved from second to third nested loop the variables got 'out of reach', Python (2.7.13) treats them as new:

import os
import csv

path = "./search_files"
folder = os.listdir(path)

search_dict = csv.DictReader(open("column_list.csv"))

for files in folder:
    files = os.path.join(path, files)
    file = open(files, "r")
    sql = file.readlines()

    sql = [x.strip() for x in sql] # Remove \n

    hit_count = 0

    for i in sql:
        i = i.upper() # Make SQL code UPPERCASE
        print str(hit_count) + ": " + i # i has values here
        hit_count = hit_count + 1

        for row in search_dict:
            print str(hit_count) + ": " + i # i is blank here
            hit_count = hit_count + 1

            if row['TABLE_NAME'] in i or row['COLUMN_NAME'] in i:
                print row['TABLE_NAME'] + " | " + row['COLUMN_NAME']
1

There are 1 best solutions below

1
On BEST ANSWER

Your issue is that search_dict is an iterator, and you're trying to iterate on it several times. That doesn't work. After the first time you reach the end of the input file, the iterator will remain empty forever.

There are a few ways you can fix that. The easiest is probably to dump the values from the iterator into a list:

search_dict = list(csv.DictReader(open("column_list.csv")))

The only real downside to this approach is that it might use a lot of memory if your CSV file is huge.

Alternatively, you could "cheat" a bit with the iterator protocol. Normally an iterator that has raised a StopIteration exception should continue to do so forever. But files are iterators that let you cheat a bit. If you seek the file back to the start, you can iterate over its contents again. This may let you continue iterating other iterators that are taking their inputs from the file you rewound. There is no guarantee that this will always work. It does for csv.reader and csv.DictReader, but wouldn't work for a generator function written in pure Python.

So here's one way you could adapt your code:

search_file = open("column_list.csv")        # keep a reference to this file for later
search_dict = csv.DictReader(search_file)

for files in folder:
    #...
    for i in sql:
        #...

        search_file.seek(0)                  # ensure we're at the start of the file
        for row in search_dict:
            #...

Another approach would be to simply reopen the file each time you start the inner loop.

Another suggestion unrelated to your looping issue: You should probably be a bit more careful about closing files. This is pretty easy to do in Python if you use the with statement. Rather than doing something like var = open(filename), use with open(filename) as var: and indent all the code that uses the file into a block under it. The file will be closed automatically when you exit the indented block (even if you exit due to an exception). My examples above kept your current style of opening files, but if I was writing them for my own code, I'd write:

with open("column_list.csv") as search_file:
    search_dict = list(csv.DictReader(search file))

# rest of the code is not indented extra

Or for the seek(0) version:

with open("column_list.csv") as search_file:   # the whole code is indented under this block
    search_dict = csv.DictReader(search_file)

    for files in folder:    # files is a really misleading variable name
        files = os.path.join(path, files)
        with open(files, "r") as file:
            sql = file.readlines()  # only a single line needs to be indented extra this time

        #...

And one final suggestion: Use better variable names. Part of the confusing from this issue was due to the name search_dict. That name makes it sound like it should contain a single dictionary. But that's not what it actually was in the code (it was actually a csv.DictReader instance, which you were using as an iterator). Similarly, your files variable (created by the outermost for loop) is also misleading. It conatins a single filename, not several files (which the plural name suggests).