I have CSV files organized by date and time as follows:
logs/YYYY/MM/DD/CSV files...
I have set up Apache Drill to execute SQL queries on top of these CSV files. Since there are many CSV files, the directory organization could be used to optimize performance. For example:
SELECT * from data where trans>='20170101' AND trans<'20170102';
For this query, only the directory logs/2017/01/01 needs to be scanned. Is there a way to let Apache Drill optimize based on this directory structure? Is it possible to do this in Hive, Impala, or any other tool?
Please note:
- SQL queries will almost always contain a time frame.
- The number of CSV files in a given directory is not huge, but combined across all years the data is huge.
- Every CSV file has a field called 'trans', which contains the date and time.
- Each CSV file is placed under the appropriate directory based on the value of its 'trans' field.
- The CSV files do not follow a fixed schema; columns may differ between files.
Querying on a column inside the data files will not help with partition pruning. In Drill, you can use the dir* variables (dir0, dir1, dir2) to refer to the partitions of a table, and you can query on tran_year, tran_month, and tran_date columns (defined as aliases of those directory levels) to get partition pruning.
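As a minimal sketch of filtering on the dir* variables, assuming the `dfs` storage plugin's root points at the parent of the `logs` directory (the workspace and path names are assumptions):

```sql
-- dir0/dir1/dir2 map to the logs/YYYY/MM/DD directory levels.
-- Filtering on them lets Drill scan only the matching directories
-- (partition pruning) instead of reading every CSV file.
SELECT *
FROM dfs.`logs`
WHERE dir0 = '2017' AND dir1 = '01' AND dir2 = '01';
```

Filtering directly on the directory variables like this is the most reliable way to get pruning, since no expression has to be pushed down.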
Also check whether a query filtering on concat(dir0, dir1, dir2) is pruned. If so, you can define a view that aliases

concat(dir0, dir1, dir2)

as a trans column and query against that. See the following for more details:
https://drill.apache.org/docs/how-to-partition-data/
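A minimal sketch of such a view, assuming the `dfs.tmp` workspace is writable and the view name `trans_view` (both names are assumptions, not from the original answer):

```sql
-- Expose the directory levels as a single 'trans' column.
-- Headerless CSV files appear in Drill as the `columns` array.
CREATE VIEW dfs.tmp.trans_view AS
SELECT
  CONCAT(dir0, dir1, dir2) AS trans,  -- 'YYYY' || 'MM' || 'DD' = 'YYYYMMDD'
  columns
FROM dfs.`logs`;

-- The original range query can then filter on the aliased column:
SELECT *
FROM dfs.tmp.trans_view
WHERE trans >= '20170101' AND trans < '20170102';
```

Note that pruning here depends on Drill pushing the filter on the CONCAT expression down to the underlying directory variables; if it does not, filtering on dir0, dir1, and dir2 directly is the safer option.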