How can I troubleshoot why this Python loop won't properly load all files?


As a personal project, I am trying to load all of the VA pay data into a database. I wrote code that loops through a table storing the state abbreviations and uses each one to build the URL for the corresponding file.

I use pandas to read each URL into a DataFrame and then insert it into a database.

Only the first file loads successfully; the loop appears to stop after that.

How can I identify the issue?

# portion of the script starts below
states = list()
states = cur.execute('SELECT abbr FROM ref_states')

for st in states:
    test = st[0]
    url = 'https://www.va.gov/OHRM/pay/2023/SSR/'+test+'.txt'

    # Open file as a dataframe
    data = pd.read_csv(url)
    df = pd.DataFrame(data)

    # Insert DataFrame to Table
    for row in df.itertuples():

        #define values that will be inserted
        values = (row[1],row[2],row[3],row[4],row[5],row[6],row[7],row[8],row[9],row[10])

        #print(values)
        cur.execute(insert_query, values)

conn.commit()

I wrote a loop that generates the URL for each file that I want to load.

I plan to loop through each file, read it into a DataFrame, insert it into the database, and move on to the next file. A diagnostic sketch is shown below.
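One quick way to narrow this down (a minimal sketch, assuming the sqlite3 module; the print calls are purely diagnostic) is to materialize the query result before looping and log each state as it is processed:

# diagnostic sketch: fetch all rows up front instead of iterating the cursor
cur.execute('SELECT abbr FROM ref_states')
states = cur.fetchall()
print(len(states), 'states fetched')

for st in states:
    print('processing', st[0])

If all the states print here but the original loop still stops after one file, the loop body itself is interfering with the iteration.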


1 Answer

Accepted answer by Andrej Kesely:

Here is an example of how you can load all of the CSV files from the URLs, concatenate the dataframes into one final dataframe, and store it in the table my_data:

import sqlite3
import pandas as pd

# create sample memory sqlite DB
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

create_table_query = """
    CREATE TABLE ref_states (
        abbr TEXT
    )
"""
cursor.execute(create_table_query)
conn.commit()
insert_query = "INSERT INTO ref_states (abbr) VALUES ('AK'), ('AL'), ('AS');"
cursor.execute(insert_query)

# select abbreviation from ref_states
cursor.execute("SELECT abbr FROM ref_states")
rows = cursor.fetchall()

all_dfs = []
for row in rows:
    url = f"https://www.va.gov/OHRM/pay/2023/SSR/{row[0]}.txt"
    df = pd.read_csv(url)

    print(df.head(3))
    print()

    all_dfs.append(df)

# create final dataframe and save it to DB:
final_df = pd.concat(all_dfs)
final_df.to_sql("my_data", conn, if_exists="replace")

# test it:
cursor.execute("SELECT * FROM my_data")
rows = cursor.fetchall()

for row in rows:
    print(row)
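As a note on why the original loop stopped after the first file: in Python's sqlite3 module, iterating over a cursor fetches rows lazily, and calling execute() on that same cursor inside the loop resets its result set, so the iteration ends after the first state. Fetching all of the rows up front (as fetchall() does above) or using a separate cursor for the inserts avoids this. A minimal sketch of the original loop with just that change (insert_query is assumed to be the asker's prepared INSERT statement):

import pandas as pd

cur.execute('SELECT abbr FROM ref_states')
states = cur.fetchall()   # materialize the rows before the cursor is reused

for st in states:
    url = f'https://www.va.gov/OHRM/pay/2023/SSR/{st[0]}.txt'
    df = pd.read_csv(url)

    for row in df.itertuples():
        # same ten columns the original code inserted
        values = (row[1], row[2], row[3], row[4], row[5],
                  row[6], row[7], row[8], row[9], row[10])
        cur.execute(insert_query, values)

conn.commit()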