SQL Query to filter TimescaleDB data based on other table

70 Views Asked by At

I am having an issue in creating the optimal query for filtering data based on an other table. My java function queries one table with variable data at a time for the variables that we want to query and exist in it. This is an example of a query without filtering which works well:

SELECT var_id, time_bucket('1 hour', data_time) AS bucket, 
       AVG(data_value) AS avg_value, 
       AVG(data_status) AS avg_status 
FROM data."table_1" 
WHERE var_id IN (1, 2, 3) 
      AND data_time BETWEEN 'start time' AND 'end time' 
GROUP BY bucket, var_id 
ORDER BY bucket, var_id

In some cases, I want to filter this data based on e.g. a variable X with var_id = 10 that is found in table_2 based on X > 0 and variable Y with var_id = 12 that is found in table_3 based on Y < 90. The amount of filters can be 0 to N, and the filtering variables can be in the same table as the non-filtering variables or not, e.g. variable X could be in table_1 as well and Y in table_1 as well or X and Y both in table_2.

The filtering data also has to be in the the 1 hour average format for the filtering to work correctly.

I have not found an optimal solution to this.

My java function creates the queries based on input parameters, but the missing part is that I cannot get the filtering statement optimal. As an example, the java function dynamically first creates the part:

SELECT var_id, time_bucket('1 hour', data_time) AS bucket, 
       AVG(data_value) AS avg_value, 
       AVG(data_status) AS avg_status 
FROM data."table_1" 
WHERE var_id IN (1, 2, 3) 
      AND data_time BETWEEN 'start time' AND 'end time'

And after this I should add my filtering subquery, e.g. var_id = 10 in table_2 > 0 and var_id = 12 in table_3 < 90 for the same time span and using the hourly averages. And then my java function already adds the part:

GROUP BY bucket, var_id 
ORDER BY bucket, var_id

To the end. The part that should create the subquery looks the following way:

if (!conditions.isEmpty()) {
for (Condition condition : conditions) {
String conditionTableName = 
conditionVariableTableMap.get(condition.variable);
Integer conditionVarId = 
conditionVariableIdMap.get(condition.variable);
queryBuilder.append(" AND EXISTS (SELECT 1 FROM data.\"")
.append(conditionTableName)
.append("\" WHERE var_id = ")
.append(conditionVarId)
.append(" AND time_bucket('")
.append(interval)
.append("', data_time) = time_bucket('")
.append(interval)
.append("', ")
.append(tableName)
.append(".data_time) AND data_value ")
.append(condition.operator)
.append(" ")
.append(condition.value)
.append(")");
}
}

Which does not currently work and results in a query such as:

SELECT var_id, time_bucket('1 hour', data_time) AS bucket,
AVG(data_value) AS avg_value, AVG(data_status) AS avg_status 
FROM data."table_1" WHERE var_id IN (1, 2, 3) 
AND data_time BETWEEN '2014-01-01T00:00' AND '2014-01-31T23:59:59' 
AND EXISTS (SELECT 1 FROM data."table_2" WHERE var_id = 4 AND 
time_bucket('1 hour', data_time) = time_bucket('1 hour', table_2.data_time) 
AND data_value > 500.0) GROUP BY bucket, var_id ORDER BY bucket, var_id

What kind of subquery should I create for cases when filtering is used?

1

There are 1 best solutions below

0
On

It seems like you are trying to dynamically generate SQL queries with filtering conditions based on certain variables in different tables. The generated subquery currently has a couple of issues that need to be addressed. Here's a modified version of your subquery creation logic:

if (!conditions.isEmpty()) {
queryBuilder.append(" AND (");
boolean isFirstCondition = true;

for (Condition condition : conditions) {
    String conditionTableName = conditionVariableTableMap.get(condition.variable);
    Integer conditionVarId = conditionVariableIdMap.get(condition.variable);
    
    if (!isFirstCondition) {
        queryBuilder.append(" OR ");
    } else {
        isFirstCondition = false;
    }
    
    queryBuilder.append("(EXISTS (SELECT 1 FROM data.\"")
                .append(conditionTableName)
                .append("\" WHERE var_id = ")
                .append(conditionVarId)
                .append(" AND time_bucket('")
                .append(interval)
                .append("', data_time) = time_bucket('")
                .append(interval)
                .append("', ")
                .append(tableName)
                .append(".data_time) AND data_value ")
                .append(condition.operator)
                .append(" ")
                .append(condition.value)
                .append("))");
}
queryBuilder.append(")");

}

Changes made:

I have made some modifications to the logic that should help you generate a subquery when filtering is needed. To properly group the OR conditions, I added an opening parenthesis before the conditions and a closing parenthesis after them. Additionally, I introduced a boolean variable called

isFirstCondition

to determine whether to add the OR before the condition. This ensures the correct SQL syntax is used.

The generated query will combine multiple conditions with OR, making sure that at least one of the conditions is satisfied. Remember to adjust the logic further based on your specific requirements and SQL dialect.