Redshift slow sub query (if it includes certain columns)

619 Views Asked by Morten Petersen At 27 June 2025 at 23:55

I have a query as follows:

SELECT
    e.event_id,
    (
        SELECT
            event_id
        FROM atomic.events
        WHERE
            domain_userid = e.domain_userid
        ORDER BY collector_tstamp
        LIMIT 1
    ) AS parent_event_id
FROM snowplow_intermediary.events_enriched e
LIMIT 1

I'm trying to find the first event for each user. This is fairly fast ~5s. If I'm trying to find the user by user_ipaddress instead of domain_userid it's way slower. After 300s it's not done.

SELECT
    e.event_id,
    (
        SELECT
            event_id
        FROM atomic.events
        WHERE
            user_ipaddress = e.user_ipaddress
        ORDER BY collector_tstamp
        LIMIT 1
    ) AS parent_event_id
FROM snowplow_intermediary.events_enriched e
LIMIT 1

The data type is domain_userid varchar(36) encode runlength and user_ipaddress varchar(45) encode runlength.

Here's EXPLAIN's for the queries:

https://gist.github.com/mortenstarfly/4ce3be9b3a19aac2601a

https://gist.github.com/mortenstarfly/2008b0f737259df30695

I really would like to speed up the second query. Any suggestions?

Original Q&A

There are 1 best solutions below

Subhajit Podder On 02 July 2017 at 11:18

It could be because of your short key..if your data is sorted based on user id then data will be retrieved fast(for the first query) as redshifts will know in which segment your data is residing (based on zone map) and can skip lot of slices and your io will be significantly low.

Redshift slow sub query (if it includes certain columns)

There are 1 best solutions below

Related Questions in SQL

Related Questions in AMAZON-REDSHIFT

Related Questions in SNOWPLOW

Trending Questions

Popular # Hahtags

Popular Questions