(UPDATED 3/10)
Based on this duckdb docs page on profiling, I would have thought that my code snippet below should save a json file of profiling/timing stats to a query_profile.json, which I should be able to use to generate an html file with python -m duckdb.query_graph query_profile.json
However, my code below (reproducable as it just hits a public s3 bucket, though you'll need your own aws creds in your own .env file) does not produce such a query_profile.json file:
import duckdb
import s3fs
from dotenv import dotenv_values
# load environment variables from .env file
ENV = dotenv_values(".env")
# Configurable query params
TAXI_COLOR = "yellow"
YEAR = 2023
PROFILE = True
# where to save result (data) locally
dbfile = 'taxi_data.duckdb'
# where to save profiling results
profile_file = 'query_profile.json'
# Define the S3 glob pattern to match the desired parquet files
s3_glob_path = f"s3://nyc-tlc/trip data/{TAXI_COLOR}_tripdata_{YEAR}*.parquet"
# query the s3 parquet data using duckdb
with duckdb.connect(database=dbfile) as con:
# load extension required for reading from s3
con.execute("INSTALL 'httpfs';")
con.execute("LOAD 'httpfs';")
# Set the AWS credentials to access the S3 bucket
con.execute("SET s3_region='us-east-1';")
con.execute(f"SET s3_access_key_id = '{ENV['AWS_ACCESS_KEY_ID']}';")
con.execute(f"SET s3_secret_access_key = '{ENV['AWS_SECRET_ACCESS_KEY']}';")
# Enable profiling and save the profiling results directly to a file
con.execute(f"SET profiling_output='{profile_file}'")
con.execute("SET profiling_mode='detailed'")
# Execute the query to load and save the data directly to the specified DuckDB file
tablename = f'{TAXI_COLOR}_tripdata_{YEAR}'
ea = "EXPLAIN ANALYZE " if PROFILE else ""
query = f"""{ea}CREATE OR REPLACE TABLE {tablename} AS
SELECT * FROM read_parquet(['{s3_glob_path}'])
"""
print(query)
con.execute(query)
print(f"Data saved to {dbfile} as {tablename}")
print(f"Profiling results saved to {profile_file}")
I think the issue here is that when you run
DuckDB will output the profiling information for
SELECT/UPDATE/DELETEstatements into the file. If you run anEXPLAINorEXPLAIN ANALYZEquery, the profiling information will not be written to the desired file, but will rather show up in the result. If you run a straightSELECT/UPDATE/DELETEquery, then the profiling information is written to the file. and the result is the result of the query.Let me know if that helps.