If I have the following code:
import awswrangler as wr

# df = some DataFrame with year, date and other columns
wr.s3.to_parquet(
    df=df,
    path='s3://some/path/',
    index=False,
    dataset=True,
    mode="append",
    partition_cols=['year', 'date'],
    database='series_data',
    table='signal_data'
)
What exactly happens when database and table are specified? I know that the table will be created if it does not already exist, but does this run a Glue Crawler or some similar process behind the scenes?
Should I pass database and table only the first time I run this piece of code, or can I leave them in for every run? Will that run any Crawlers or other processes that may cause additional AWS charges?
For example, if a new partition appears (a new date), how will the table pick up the new partition? Usually, this is done by running a Glue Crawler to discover new partitions.
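To make the partition part of the question concrete, here is a minimal sketch of the Hive-style layout I understand to be written under the path when dataset=True and partition_cols are set. The helper partition_prefix and the sample rows are hypothetical and only illustrate the prefixes; they are not part of awswrangler and say nothing about how the catalog gets updated.

```python
def partition_prefix(base_path, partition_cols, row):
    """Build the Hive-style S3 prefix for one row (hypothetical helper)."""
    parts = "".join(f"{col}={row[col]}/" for col in partition_cols)
    return base_path.rstrip("/") + "/" + parts


# Made-up sample rows standing in for the DataFrame in the question.
rows = [
    {"year": 2023, "date": "2023-01-01", "value": 1.0},
    {"year": 2023, "date": "2023-01-02", "value": 2.0},
    {"year": 2023, "date": "2023-01-02", "value": 3.0},
]

# Each distinct (year, date) pair maps to one partition prefix.
prefixes = sorted(
    {partition_prefix("s3://some/path/", ["year", "date"], r) for r in rows}
)
for p in prefixes:
    print(p)
```

So a new date would show up as a new date=... prefix under the existing year=... prefix, and that new prefix is the partition the table has to learn about.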