I have Customer and Travel topics. I want an analysis that shows each customer's first travelid in a table, e.g.:
create table customer_first_travel as
select t.custid custid, earliest_by_offset(travelid) travelid
from stream_travel t
join table_customer c on t.custid = c.custid
group by t.custid;
The problem is: once the topic passes its retention period, will the earliest travelid change? Since travelid is not the PK here, how can I tell that a travelid has been deleted?
Yes. But this is not a problem as long as your consumer is less than 7 days (the default retention period) behind the data being produced. (This also assumes the data is actually deleted on the 7th day; Kafka can retain data longer if there are no closed log segments...)
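If you have a low-lag consumer (i.e. one that is building a table), then the table's data is retained on a completely different, compacted topic, so retention on the source topic does not change what the table already holds.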
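To make the retention point concrete, here is a minimal plain-Python sketch. It is only a model of the behavior, not ksqlDB or Kafka code: the `Record` class, the `earliest_by_offset` helper, and the sample ids are all hypothetical. It assumes the table keeps its own state per key, so a consumer that read the records before retention removed them still holds the original first travelid, while a consumer that starts from scratch afterwards sees a different one.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    custid: str
    travelid: str


def earliest_by_offset(records, state=None):
    """Fold records into {custid: travelid}, keeping the first travelid seen per key."""
    state = dict(state or {})
    for r in sorted(records, key=lambda rec: rec.offset):
        # setdefault only writes when the key has no value yet
        state.setdefault(r.custid, r.travelid)
    return state


source_topic = [
    Record(0, "c1", "t100"),
    Record(1, "c1", "t101"),
    Record(2, "c2", "t200"),
]

# Low-lag consumer: aggregates everything before retention removes anything.
table = earliest_by_offset(source_topic)

# Retention later drops the oldest record from the source topic.
source_topic = [r for r in source_topic if r.offset >= 1]

# A new record arrives; the low-lag consumer continues from its stored state.
source_topic.append(Record(3, "c2", "t201"))
table = earliest_by_offset([source_topic[-1]], table)

# A consumer that starts from scratch only sees what is left in the topic.
rebuilt = earliest_by_offset(source_topic)

print(table)    # c1 still maps to t100
print(rebuilt)  # c1 now maps to t101
```

In this model the "earliest" value only changes for a consumer that has to reprocess the source topic after old records are gone.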
If the stream being consumed has a null-value event (a tombstone) for a matching key in the table, that row is automatically deleted from the table. There is no "notification" for this other than the event itself.
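The delete-by-null behavior can be sketched the same way. This is again a plain-Python model under stated assumptions, not the ksqlDB API: `apply_changelog` and the sample events are hypothetical, and a `None` value stands in for a null-value record.

```python
def apply_changelog(events):
    """Materialize (key, value) events into a table; a None value acts as a tombstone."""
    table = {}
    for key, value in events:
        if value is None:
            # The row disappears; nothing else is emitted to signal the delete.
            table.pop(key, None)
        else:
            table[key] = value
    return table


events = [("c1", "t100"), ("c2", "t200"), ("c1", None)]
table = apply_changelog(events)

# The only way to learn about a delete is to look at the events themselves.
deleted_keys = [key for key, value in events if value is None]

print(table)
print(deleted_keys)
```

So to tell that a row was deleted, a consumer would have to watch the topic for null-value records on the key it cares about.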