I'm trying to create a Kudu table partitioned by hash and by range on two columns (year, month). My problem is that I want half-yearly (six-month) range partitions without adding more columns to the table.
In the link:
https://kudu.apache.org/docs/kudu_impala_integration.html
In section Specifying Tablet Partitioning
They propose range partitioning on a single column, but further down it says:
If you have multiple primary key columns, you can specify partition bounds using tuple syntax: ('va',1), ('ab',2). The expression must be valid JSON.
So I tried the following query:
CREATE TABLE pruebas.partwithrang (
year int COMMENT 'año',
month int COMMENT 'mes',
day int COMMENT 'dia',
id string COMMENT 'id',
name string COMMENT 'nombre',
PRIMARY KEY (year, month, day, id))
PARTITION BY HASH (id) PARTITIONS 3,
RANGE(year, month) (
PARTITION (2020, 1) <= VALUES <= (2020, 6),
PARTITION (2020, 7) <= VALUES <= (2020, 12),
PARTITION (2021, 1) <= VALUES <= (2021, 6),
PARTITION (2021, 7) <= VALUES <= (2021, 12)
)
COMMENT "Probando particion por rango"
STORED AS KUDU tblproperties ('kudu.master_addresses'='localhost:7051', 'kudu.num_tablet_replicas'='1')
But I received the following error:
ERROR: AnalysisException: Syntax error in line 10:
PARTITION (2020, 1) <= VALUES <= (2020, 7),
' ^
Encountered: COMMA
Expected: AND, BETWEEN, DIV, ILIKE, IN, IREGEXP, IS, LIKE, NOT, OR, REGEXP, RLIKE CAUSED BY: Exception: Syntax error
I couldn't find any information about this kind of range partition. Could you help me, please?
I also reviewed this link: https://docs.cloudera.com/documentation/enterprise/5-12-x/topics/impala_create_table.html
In section kudu_partition_clause
I'm not sure, but as I understand it, only these two ways of defining a range partition are possible:
PARTITION constant_expression range_comparison_operator VALUES range_comparison_operator constant_expression
|
PARTITION VALUE = constant_expression_or_tuple
So, is the partitioning scheme I'm suggesting possible?
Thank you!
The problem is a bug in Impala (https://issues.apache.org/jira/browse/IMPALA-6929): you can't use '<=' or '<' in a multi-column range partition like this one.
A workaround is to use '=' instead of '<=' or '<', for example: PARTITION VALUE = (2020, 12),
In CDH 6.x the problem is fixed (Apache Impala 3.0.0).
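As a rough sketch of the '=' workaround applied to the table from the question (same database, table name and properties as in the question, not tested against any particular cluster), each (year, month) pair gets its own single-value partition. Note that this gives monthly partitions rather than the six-month ranges originally wanted:

```sql
-- Sketch of the workaround: one single-value range partition per (year, month).
-- Table name and tblproperties are copied from the question.
CREATE TABLE pruebas.partwithrang (
  year int COMMENT 'año',
  month int COMMENT 'mes',
  day int COMMENT 'dia',
  id string COMMENT 'id',
  name string COMMENT 'nombre',
  PRIMARY KEY (year, month, day, id))
PARTITION BY HASH (id) PARTITIONS 3,
RANGE (year, month) (
  -- 2020
  PARTITION VALUE = (2020, 1),  PARTITION VALUE = (2020, 2),  PARTITION VALUE = (2020, 3),
  PARTITION VALUE = (2020, 4),  PARTITION VALUE = (2020, 5),  PARTITION VALUE = (2020, 6),
  PARTITION VALUE = (2020, 7),  PARTITION VALUE = (2020, 8),  PARTITION VALUE = (2020, 9),
  PARTITION VALUE = (2020, 10), PARTITION VALUE = (2020, 11), PARTITION VALUE = (2020, 12),
  -- 2021
  PARTITION VALUE = (2021, 1),  PARTITION VALUE = (2021, 2),  PARTITION VALUE = (2021, 3),
  PARTITION VALUE = (2021, 4),  PARTITION VALUE = (2021, 5),  PARTITION VALUE = (2021, 6),
  PARTITION VALUE = (2021, 7),  PARTITION VALUE = (2021, 8),  PARTITION VALUE = (2021, 9),
  PARTITION VALUE = (2021, 10), PARTITION VALUE = (2021, 11), PARTITION VALUE = (2021, 12)
)
COMMENT "Probando particion por rango"
STORED AS KUDU tblproperties ('kudu.master_addresses'='localhost:7051', 'kudu.num_tablet_replicas'='1')
```

With this layout there are 24 range partitions times 3 hash buckets, so 72 tablets in total, and rows whose (year, month) is not listed will be rejected on insert until a matching partition is added.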