How can I troubleshoot why this Python loop won't properly load all files?


As a personal project, I am trying to load all of the VA pay data into a database. I wrote code that loops through a table storing the state abbreviations and uses each abbreviation to build the URL for the corresponding file.

I use pandas to read each URL into a DataFrame (DF) and then insert it into a database.

I was only able to successfully load the first file; it looks like the loop stops after that.

How can I identify the issue?

#portion of script starts below
states = list()
states = cur.execute('SELECT abbr FROM ref_states')

for st in states:
    test = st[0]
    url = 'https://www.va.gov/OHRM/pay/2023/SSR/'+test+'.txt'

    # Open file as a dataframe
    data = pd.read_csv(url)
    df = pd.DataFrame(data)

    # Insert DataFrame to Table
    for row in df.itertuples():

        #define values that will be inserted
        values = (row[1],row[2],row[3],row[4],row[5],row[6],row[7],row[8],row[9],row[10])

        #print(values)
        cur.execute(insert_query, values)

conn.commit()

I wrote a loop that generates the URL for each file that I want to load.

I plan to loop through each file, read it into a DF, insert it into the DB, and move on to the next one.

BEST ANSWER

Here is an example of how you can load all the CSV files from their URLs, concatenate the dataframes into one final dataframe, and store it in the table my_data:

import sqlite3

import pandas as pd

# create sample memory sqlite DB
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

create_table_query = """
    CREATE TABLE ref_states (
        abbr TEXT
    )
"""
cursor.execute(create_table_query)
conn.commit()
insert_query = "INSERT INTO ref_states (abbr) VALUES ('AK'), ('AL'), ('AS');"
cursor.execute(insert_query)

# select abbreviation from ref_states
cursor.execute("SELECT abbr FROM ref_states")
rows = cursor.fetchall()

all_dfs = []
for row in rows:
    url = f"https://www.va.gov/OHRM/pay/2023/SSR/{row[0]}.txt"
    df = pd.read_csv(url)

    print(df.head(3))
    print()

    all_dfs.append(df)

# create final dataframe and save it to DB:
final_df = pd.concat(all_dfs)
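# note: to_sql writes the DataFrame index as a column by default; pass
# index=False below to omit it (and ignore_index=True to pd.concat above
# if you don't want the per-file indices repeated)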
final_df.to_sql("my_data", conn, if_exists="replace")

# test it:
cursor.execute("SELECT * FROM my_data")
rows = cursor.fetchall()

for row in rows:
    print(row)
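As for why the original loop stopped after the first file: in Python's sqlite3 module, cur.execute() returns the cursor itself, so for st in states: iterates over the very cursor that is then reused for the inserts inside the loop. The inner cur.execute(insert_query, values) replaces that cursor's pending result set, so the outer iteration ends after the first state. The fetchall() call above sidesteps this by materializing all rows before the loop. Here is a minimal sketch of that same fix applied to the question's code (cur, conn, and insert_query are assumed to exist as in the question):

import pandas as pd

# Materialize the abbreviations up front so later execute() calls on the
# same cursor cannot clobber the rows we are iterating over
# (cur, conn, and insert_query are assumed from the question)
states = cur.execute('SELECT abbr FROM ref_states').fetchall()

for (abbr,) in states:
    url = f'https://www.va.gov/OHRM/pay/2023/SSR/{abbr}.txt'
    df = pd.read_csv(url)

    for row in df.itertuples(index=False):
        # first 10 columns, matching the question's positional insert
        cur.execute(insert_query, tuple(row)[:10])

conn.commit()

An alternative is to use a dedicated cursor for the inserts (e.g. ins_cur = conn.cursor()), so the cursor iterating over the SELECT results is never disturbed.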