How to get aggregate row count for sharded tables in citus postgres

678 Views Asked by At

Requirement is to get sum of reltuples for all distributed tables in Citus Postgres database with a certain format which is explained below.

When i run the below command, the query propagates to workers and depending on the shard counts on each worker, results are fetched for each table:

SELECT result from run_command_on_workers( $cmd$ select json_agg(json_build_object(reltuples, relname)) from pg_class c join pg_catalog.pg_namespace n on n.oid=c.relnamespace where n.nspname not in ('citus', 'pg_toast', 'pg_catalog')

For ex, the results consists of <row_count, table_name_and_shardid> {10, table_A_shardid0}, {20,table_A_shardid1}, {15, table_B_shardid0} Is it possible to combine counts from table_A_shardid0 and table_A_shardid1 so that the result is {30, table_A}

1

There are 1 best solutions below

0
On

You can use run_command_on_shards to query on all shards of a table. This way you can get all reltuple values from shards and combine them later in the coordinator to get your desired format. You can learn more about the command here at the Citus Official Documentation

The catalog table pg_dist_partition stores metadata about which tables in the database are distributed.

Using what I have explained above, the query can look something like this:

WITH shards_reltuples AS (
  SELECT logicalrelid,
  (run_command_on_shards(logicalrelid,
     $$ SELECT reltuples FROM pg_class WHERE oid = '%s'::regclass::oid $$ 
  )).result::real AS reltuples
  FROM pg_dist_partition)

SELECT logicalrelid, sum(reltuples) AS reltuples
FROM shards_reltuples GROUP BY logicalrelid;

You can elaborate on top of it to get your desired json format.