Huge amount of CSV files, using SQLite in Python


I have more than 1,000 CSV files (let's call them SECONDARY files), all with the same structure (compressed size varies from 400 KB to 3531 KB).

The SECONDARY files' names correspond to the key that joins them to the MAIN CSV file, which holds a different kind of information.

I need to apply some functions to several columns of the SECONDARY files (some values repeat across the files and need to be grouped).

Later, I will need to pull some information from the SECONDARY files based on info from the MAIN file.
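In case it helps, this is the kind of query I mean, sketched against toy tables (the column names `category`, `value`, and `region` are made up for illustration; `filename` is the join key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins: "master" holds stacked SECONDARY rows tagged with their
# source filename; "main" maps each filename (the join key) to extra info.
cur.execute("CREATE TABLE master (filename TEXT, category TEXT, value REAL)")
cur.execute("CREATE TABLE main (filename TEXT, region TEXT)")
cur.executemany("INSERT INTO master VALUES (?, ?, ?)",
                [("a.csv", "x", 1.0), ("a.csv", "x", 2.0), ("b.csv", "y", 3.0)])
cur.executemany("INSERT INTO main VALUES (?, ?)",
                [("a.csv", "north"), ("b.csv", "south")])

# Group repeated values across all SECONDARY files at once...
totals = cur.execute(
    "SELECT category, SUM(value) FROM master GROUP BY category"
).fetchall()

# ...and pull SECONDARY rows filtered by MAIN-file info via the filename key.
north = cur.execute(
    "SELECT m.category, m.value FROM master m "
    "JOIN main n ON n.filename = m.filename WHERE n.region = 'north'"
).fetchall()
```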

I tried:

import pandas as pd
import sqlite3
import os
import re

# Set the directory where your CSV files are located
csv_directory = r"path"

# Connect to SQLite database (or create a new one if it doesn't exist)
conn = sqlite3.connect("db.db")
cursor = conn.cursor()

# Get a list of all CSV files in the directory
csv_files = [file for file in os.listdir(csv_directory) if file.endswith(".csv")]

# Define the master table name
master_bus = "master_table"
# Replace or remove invalid characters from the master table name
master_bus = re.sub(r'\W+', '', master_bus)

# loop through each CSV file and create a table in the database
for csv_file in csv_files:
    table_name = os.path.splitext(csv_file)[0]
    csv_path = os.path.join(csv_directory, csv_file)

    # Read CSV file into a pandas DataFrame
    df = pd.read_csv(csv_path)

    # Add a new column with the source filename
    df['filename'] = csv_file

    # Write the DataFrame to SQLite
    df.to_sql(table_name, conn, index=False)

# Concatenate all tables into the master table
tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
tables = [table[0] for table in tables]

# Create the master table
cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")

# Concatenate data from all tables into the master table
for table in tables:
    cursor.execute(f"INSERT INTO {master_bus} SELECT * FROM {table};")

# Commit changes and close connection
conn.commit()
conn.close()

It kind of worked: with an earlier version of the code I could create a DB with one table per CSV file. But when I tried to add the filename to a new column and create a master table, this happens:

OperationalError                          Traceback (most recent call last)
c:\Users\acuna\Desktop\Computer Science\Project_Bus.ipynb Cell 17 line 4
     37 tables = [table[0] for table in tables]
     39 # Create the master table
---> 40 cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")
     42 # Concatenate data from all tables into the master table
     43 for table in tables:

OperationalError: near "-": syntax error

1 Answer


The error in your code comes from invalid characters in the table names you query, not in the one you create. In SQLite, bare identifiers cannot contain special characters such as -, and your table names are derived from the CSV filenames, so a file like bus-route.csv produces a table name that breaks the SELECT * FROM {tables[0]} statement. Either sanitize the filename-derived names the same way you already sanitize master_bus, or wrap every table name in double quotes:

master_bus = "master_table"

master_bus = re.sub(r'\W+', '', master_bus)

cursor.execute(f'CREATE TABLE {master_bus} AS SELECT * FROM "{tables[0]}" WHERE 0;')
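A fuller sketch of the same idea (in-memory DB and made-up filenames, for illustration): sanitize every filename-derived name before it reaches SQL, and quote identifiers everywhere:

```python
import re
import sqlite3

def safe_name(raw: str) -> str:
    # Replace anything that is not a letter, digit, or underscore,
    # and prefix names that would otherwise start with a digit.
    name = re.sub(r"\W+", "_", raw)
    return name if not name[0].isdigit() else "t_" + name

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Filenames like these produce the OperationalError when used verbatim.
for raw in ["bus-route-1", "bus route 2"]:
    table = safe_name(raw)
    cur.execute(f'CREATE TABLE "{table}" (filename TEXT, value REAL)')
    cur.execute(f'INSERT INTO "{table}" VALUES (?, ?)', (raw, 1.0))

tables = [r[0] for r in
          cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]

# Empty master table with the same columns, then stack every table into it.
cur.execute(f'CREATE TABLE master_table AS SELECT * FROM "{tables[0]}" WHERE 0')
for table in tables:
    cur.execute(f'INSERT INTO master_table SELECT * FROM "{table}"')
```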