I have a folder with a bunch of .SQL files that I want to match against an extraction of ALL_TAB_COLUMNS (only the TABLE_NAME and COLUMN_NAME columns, which are in column_list.csv).
My objective is to have a relation of tables/columns used for each report, generating an output like:
sql_file_name.sql | table_name | column_name
It seems that when I moved from the second to the third nested loop the variables went 'out of reach', and Python (2.7.13) treats them as new:
import os
import csv

path = "./search_files"
folder = os.listdir(path)
search_dict = csv.DictReader(open("column_list.csv"))

for files in folder:
    files = os.path.join(path, files)
    file = open(files, "r")
    sql = file.readlines()
    sql = [x.strip() for x in sql]  # Remove \n
    hit_count = 0
    for i in sql:
        i = i.upper()  # Make SQL code UPPERCASE
        print str(hit_count) + ": " + i  # i has values here
        hit_count = hit_count + 1
        for row in search_dict:
            print str(hit_count) + ": " + i  # i is blank here
            hit_count = hit_count + 1
            if row['TABLE_NAME'] in i or row['COLUMN_NAME'] in i:
                print row['TABLE_NAME'] + " | " + row['COLUMN_NAME']
Your issue is that search_dict is an iterator, and you're trying to iterate over it several times. That doesn't work. After the first time you reach the end of the input file, the iterator will remain empty forever.

There are a few ways you can fix that. The easiest is probably to dump the values from the iterator into a list:
The only real downside to this approach is that it might use a lot of memory if your CSV file is huge.
Alternatively, you could "cheat" a bit with the iterator protocol. Normally an iterator that has raised a StopIteration exception should continue to do so forever. But files are iterators that let you cheat a bit. If you seek the file back to the start, you can iterate over its contents again. This may let you continue iterating other iterators that are taking their inputs from the file you rewound. There is no guarantee that this will always work. It does for csv.reader and csv.DictReader, but it wouldn't work for a generator function written in pure Python.

So here's one way you could adapt your code:
Another approach would be to simply reopen the file each time you start the inner loop.
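For instance (sketch only; this opens column_list.csv once per SQL line, which is simple but does more I/O and leaves closing those handles to the garbage collector):

import os
import csv

path = "./search_files"
for files in os.listdir(path):
    sql = [x.strip().upper() for x in open(os.path.join(path, files), "r").readlines()]
    for i in sql:
        # a fresh reader over a freshly opened file starts from the top every time
        for row in csv.DictReader(open("column_list.csv")):
            if row['TABLE_NAME'] in i or row['COLUMN_NAME'] in i:
                print row['TABLE_NAME'] + " | " + row['COLUMN_NAME']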
Another suggestion, unrelated to your looping issue: you should probably be a bit more careful about closing files. This is pretty easy to do in Python if you use the with statement. Rather than doing something like var = open(filename), use with open(filename) as var: and indent all the code that uses the file into a block under it. The file will be closed automatically when you exit the indented block (even if you exit due to an exception). My examples above kept your current style of opening files, but if I was writing them for my own code, I'd write:
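Something along these lines (a sketch of the list-based version from above, with with handling both files):

import os
import csv

path = "./search_files"
folder = os.listdir(path)

# The CSV is read into a list once, and its file is closed right away.
with open("column_list.csv") as csv_file:
    search_list = list(csv.DictReader(csv_file))

for files in folder:
    files = os.path.join(path, files)
    with open(files, "r") as sql_file:
        sql = [x.strip() for x in sql_file.readlines()]  # Remove \n
    for i in sql:
        i = i.upper()  # Make SQL code UPPERCASE
        for row in search_list:
            if row['TABLE_NAME'] in i or row['COLUMN_NAME'] in i:
                print row['TABLE_NAME'] + " | " + row['COLUMN_NAME']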
Or for the seek(0) version:
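A sketch of that variant (same caveat as before about skipping the header row after each rewind; the CSV file has to stay open for the whole loop, so everything sits inside its with block):

import os
import csv

path = "./search_files"
folder = os.listdir(path)

with open("column_list.csv") as csv_file:
    search_dict = csv.DictReader(csv_file)
    fieldnames = search_dict.fieldnames  # read and cache the header row now

    for files in folder:
        files = os.path.join(path, files)
        with open(files, "r") as sql_file:
            sql = [x.strip() for x in sql_file.readlines()]  # Remove \n
        for i in sql:
            i = i.upper()  # Make SQL code UPPERCASE
            csv_file.seek(0)    # rewind the CSV file for another pass
            search_dict.next()  # skip the header row, which now reads as data
            for row in search_dict:
                if row['TABLE_NAME'] in i or row['COLUMN_NAME'] in i:
                    print row['TABLE_NAME'] + " | " + row['COLUMN_NAME']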
And one final suggestion: use better variable names. Part of the confusion with this issue was due to the name search_dict. That name makes it sound like it should contain a single dictionary, but that's not what it actually was in the code (it was actually a csv.DictReader instance, which you were using as an iterator). Similarly, your files variable (created by the outermost for loop) is also misleading. It contains a single filename, not several files (which the plural name suggests).