How to use Clickhouse Token Bloomfilters with Arrays

302 Views Asked by At

How can I leverage bloomfilters on array(string) in Clickhouse.

For example, I have this data structure:

create table test_schema.test_table (
partition_time            DateTime64(9
,keys                     Array(String)
) ENGINE = ReplicatedReplacingMergeTree('some/path', '{replica}')
PARTITION BY toYYYYMMDD(partition_time)
TTL toDateTime(partition_time) + toIntervalDay(1)
SETTINGS index_granularity = 8192;

and I build a token bloomfilter on keys

alter table test_schema.test_table INDEX test_bf_idx `keys` TYPE tokenbf_v1(4096, 2, 0) GRANULARITY 1;

When I use the condition has(keys,'some_key'), I can see it leveraging the bloom filter. But I could not use the bloomfilter for like queries. I have tried array filter and array join, but still it is not using bloomfilter:

where notEmpty(arrayFilter(x -> x LIKE '%some_key%', keys) )

and

select * from 
(select arrayJoin(keys) as val from schema.table)
where val like '%some_key%'

Can someone show me how I can use the token bloomfilter built on an array for LIKE statements, (i.e. use bloomfilter for like statement on an array)

1

There are 1 best solutions below

1
Denny Crane On BEST ANSWER

maybe an index over arrayStringConcat ?

create table test (
    x Int64,
    k Array(String),
    INDEX test_bf_idx arrayStringConcat(k, ',') TYPE tokenbf_v1(4096, 2, 0) GRANULARITY 1
) ENGINE = MergeTree
order by x as 
select number, ['a', 'b'] k
from numbers(1e6);

select * from test
where hasToken( arrayStringConcat(k, ',') , 'xxx')