The problem is as the question says. Using AWS athena server with pyathena engine.
Here is some context. I am basically filtering on a table containing all the bookings, to include only most recent ones based on an earliest_history date (anything earlier than that will be dropped).
I need two columns after this, the date and the customer id.
There is another table mapping customer id to some other data, that I need to join with the previous result. Here is how I am implementing it in sql alchemy ORM.
# SELECT date, customer_id FROM bookings
query: Query = session.query(func.date(bookings.columns[TIMESTAMP]).label(name=TIMESTAMP),
bookings.columns[CUSTOMER_ID])
# Alias as I need to get some filter
all_history: Alias = cast(typ=Alias, val=aliased(element=query.subquery()))
# noinspection PyTypeChecker
# SELECT * FROM all_history WHERE timestamp>=earliest_history
query = session.query(all_history).filter(
all_history.columns[TIMESTAMP] >= func.date(earliest_history)).distinct()
# noinspection PyTypeChecker
customer_results: Alias = aliased(element=query.subquery())
mapping_table: str = config.get(section='db', option='MAPPING_TABLE', raw=True)
mappings: Table = metadata.tables[mapping_table]
# noinspection PyTypeChecker
# Inner join customer results with the mappings table
join = customer_results.join(right=mappings,
onclause=mappings.columns[CUSTOMER_ID_MAPPING] == customer_results.columns[CUSTOMER_ID])
query=session.query(join)
raw_query: str = str(query.statement.compile(compile_kwargs={'literal_binds': True}))
print(raw_query)
The problem is, this is the raw_query (formatting and comments mine)
SELECT
anon_1.booking_dt,
anon_1.cust_id,
customer_mapping.ustb_customer_id,
customer_mapping.ustb_customer_create_time,
customer_mapping.esc_customer_id,
customer_mapping.esc_customer_create_time
FROM
(
SELECT
DISTINCT anon_2.booking_dt AS booking_dt,
anon_2.cust_id AS cust_id
FROM
(
SELECT
date(esc_booking.booking_dt) AS booking_dt,
esc_booking.cust_id AS cust_id
FROM
esc_booking
) AS anon_2
WHERE
anon_2.booking_dt >= date('2023-11-29')
) AS anon_1, -- Should I expect the join clause here?
customer_mapping; -- What does it mean?
Basically, it seems alchemy is totally removing my join clause for some reason. How to get sql alchemy to really perform the join?
Related, is the filter method in SQL Alchemy an ORM counterpart of SELECT * WHERE or of JOIN?