GemFire OQL Query - How do I use the count of a SELECT statement in the WHERE clause?

1.1k Views Asked by At

I am trying to query all the IDs of the records from /ExampleRegion. I want to retrieve records if the count of the ID is only 1, so there is only 1 record in the region with that ID.

SELECT COUNT(*), id from /ExampleRegion group by id --> Only if the count for that id is 1.

How can I use COUNT as a Condition in the WHERE Condition?

I have tried the following but it doesn't work:

SELECT * from /ExampleRegion a where (SELECT count(*) as c, b.id from /ExampleRegion b where b.id = a.id and c = 1)

SELECT * from /ExampleRegion a where (SELECT count(*) as c from /ExampleRegion b where b.id = a.id ) = 1

I would think GROUP BY would work, although I still can't seem to find the correct OQL.

Much appreciated.

1

There are 1 best solutions below

2
On BEST ANSWER

NOTE: To help address and (try to) resolve this question, I created a test class along with a simple User application domain model class to put this problem into context.

In short...

Regarding...

"How can I use COUNT as a Condition in the WHERE Condition?"

You cannot use an aggregate OQL query function like count in a predicate of the WHERE clause (as I suspect you already found out), such as:

SELECT x.id, count(*) AS cnt FROM /Users x WHERE count(*) = 1 GROUP BY x.id

This results in the following Exception:

Caused by: org.apache.geode.cache.query.QueryInvalidException: Aggregate functions can not be used as part of the WHERE clause.
    at org.apache.geode.cache.query.internal.QCompiler.checkWhereClauseForAggregates(QCompiler.java:204)
    at org.apache.geode.cache.query.internal.QCompiler.checkWhereClauseForAggregates(QCompiler.java:214)
    at org.apache.geode.cache.query.internal.QCompiler.select(QCompiler.java:260)
...

Additionally, and unfortunately, the following OQL query:

SELECT x.id, count(*) AS cnt FROM /Users x WHERE cnt = 1 GROUP BY x.id

Returns no results!

The opposite OQL query used to find duplicates also returns no results:

SELECT x.id, count(*) AS cnt FROM /Users x WHERE cnt = 1 GROUP BY x.id

Although, I am not entirely certain why, I suspect it is due to the same limitation as the first OQL query above where the count aggregate function was used in an OQL query predicate inside the WHERE clause, except this later form is less informative (e.g. like I suspect it might be eating an Exception somewhere since, according to GemFire, the OQL query is syntactically correct).

However, if you only care about the IDs then you can simply run a similar OQL query:

SELECT x.id, count(*) AS cnt FROM /Users x GROUP BY x.id

Of course, this OQL query is returning a projection (or GemFire Struct (Javadoc)) that returns a count of all User IDs (duplicate and unique). Clearly, if the count for a User ID is 1, then it is unique, and if it is greater than 1, a duplicate (i.e. not unique).

In detail...

Typically though, users want to get access to the actual object (e.g. User) when the User instance either has a unique ID (in your case) or a duplicate ID. Users do this to perform some operation on the Region entry value (e.g. User) returned by the OQL query, which is particularly common inside Functions used to process a PARTITION Regions in a parallel and distributed fashion.

But, I have to admit, I am bit dumbfounded by not being able to (completely) solve this problem.

I honestly thought this problem should have been solvable with the following GemFire OQL query:

SELECT u 
FROM /Users u, (SELECT DISTINCT x.id AS id, count(*) AS cnt FROM /Users x GROUP BY x.id) v
WHERE v.cnt = 1 
AND u.id = v.id 
ORDER BY u.name ASC

Essentially, this OQL query selects all Users where their ID is unique because they are 1 of a kind.

Strangely, this results in a GemFire QueryInvalidException:

org.springframework.data.gemfire.GemfireQueryException: ; nested exception is org.apache.geode.cache.query.QueryInvalidException: 

    at org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:303)
    at org.springframework.data.gemfire.GemfireCacheUtils.convertQueryExceptions(GemfireCacheUtils.java:325)
    at org.springframework.data.gemfire.GemfireAccessor.convertGemFireQueryException(GemfireAccessor.java:109)
    at org.springframework.data.gemfire.GemfireTemplate.find(GemfireTemplate.java:326)
    at org.springframework.data.gemfire.repository.query.StringBasedGemfireRepositoryQuery.execute(StringBasedGemfireRepositoryQuery.java:159)
    at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:135)
    at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:119)
    at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:151)
    at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:130)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
    at io.stackoverflow.questions.apache.geode.query.$Proxy45.findUsersWithDuplicateId(Unknown Source)
    at io.stackoverflow.questions.apache.geode.query.QueryCountEqualToOneIntegrationTests.duplicateCountQueryIsCorrect(QueryCountEqualToOneIntegrationTests.java:112)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
    at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: org.apache.geode.cache.query.QueryInvalidException: 
    at org.apache.geode.cache.query.internal.DefaultQuery.<init>(DefaultQuery.java:172)
    at org.apache.geode.cache.query.internal.DefaultQueryService.newQuery(DefaultQueryService.java:150)
    at org.springframework.data.gemfire.GemfireTemplate.find(GemfireTemplate.java:313)
    ... 43 more
Caused by: org.apache.geode.cache.query.TypeMismatchException: Exception in evaluating the Collection Expression in getRuntimeIterator() even though the Collection is independent of any RuntimeIterator
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.evaluateCollectionForIndependentIterator(CompiledIteratorDef.java:143)
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.getRuntimeIterator(CompiledIteratorDef.java:117)
    at org.apache.geode.cache.query.internal.CompiledSelect.computeDependencies(CompiledSelect.java:189)
    at org.apache.geode.cache.query.internal.DefaultQuery.<init>(DefaultQuery.java:170)
    ... 45 more
Caused by: java.lang.NullPointerException
    at org.apache.geode.cache.query.internal.CompiledSelect.applyProjectionAndAddToResultSet(CompiledSelect.java:1309)
    at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:800)
    at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:844)
    at org.apache.geode.cache.query.internal.CompiledSelect.doIterationEvaluate(CompiledSelect.java:703)
    at org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:426)
    at org.apache.geode.cache.query.internal.CompiledGroupBySelect.evaluate(CompiledGroupBySelect.java:157)
    at org.apache.geode.cache.query.internal.CompiledGroupBySelect.evaluate(CompiledGroupBySelect.java:42)
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.evaluateCollection(CompiledIteratorDef.java:184)
    at org.apache.geode.cache.query.internal.RuntimeIterator.evaluateCollection(RuntimeIterator.java:104)
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.evaluateCollectionForIndependentIterator(CompiledIteratorDef.java:128)
    ... 48 more

There is nothing more blatantly irritating in software to me than NPEs! They are a clear and present programmer error; not a user error!

Seemingly, GemFire is not happy with the nested OQL query declared in the FROM clause, which would in essence create a queryable collection, or intermediate result set used in the outer query (much like a RDBMS temporary table):

TypeMismatchException: Exception in evaluating the Collection Expression in getRuntimeIterator() even though the Collection is independent of any RuntimeIterator

And perhaps, GemFire/Geode is specifically not happy about the "projection" of this nested (temporary) collection, hence the NPE here:

Caused by: java.lang.NullPointerException
    at org.apache.geode.cache.query.internal.CompiledSelect.applyProjectionAndAddToResultSet(CompiledSelect.java:1309)
    at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:800)

When I look at the affected GemFire/Geode code, the exact condition really makes no sense to me since I tested with a ClientCache using a LOCAL (only) Region. #sigh

Nevertheless, I even tried to test with a peer Cache instance using a PARTITION Region (with PDX enabled (required for PRs actually)) and that led to the same result! #sigh

Given the GemFire query engine is seemingly having trouble in the projection of the nested OQL query (containing the count and GROUP BY clause) I decided to try to provide more information to the query engine in hopes of better inform the query engine about the projected values. So, I created the UserIdCount projection class type and used it in my OQL query like so:

IMPORT io.stackoverflow.questions.spring.geode.app.model.UserIdCount;
SELECT DISTINCT u 
FROM /Users u, (SELECT DISTINCT x.id AS id, count(*) AS cnt FROM /Users x GROUP BY x.id) v TYPE UserIdCount
WHERE v.cnt = 1 
AND u.id = v.id 
ORDER BY u.name ASC

Of course, and unfortunately, this also did not have the intended effect and only led to the following Exception:

java.lang.IllegalArgumentException: element type must be struct

    at org.apache.geode.cache.query.internal.StructSet.setElementType(StructSet.java:365)
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.prepareIteratorDef(CompiledIteratorDef.java:275)
    at org.apache.geode.cache.query.internal.CompiledIteratorDef.evaluateCollection(CompiledIteratorDef.java:200)
    at org.apache.geode.cache.query.internal.RuntimeIterator.evaluateCollection(RuntimeIterator.java:104)
    at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:813)
...

It seems I am stuck with a GemFire Struct, which you'd think GemFire would know how to process in a nested query when accessing the projection values in the outer query. But, whatever!

I really feel like the NPE is an unintended consequence from GemFire and that GemFire really ought to be able to (and, possibly can) handle this type of OQL query.

So, what are you left with.

Well, as I stated above, if all you care about is the IDs, then you can return all IDs with their counts and iterate the List of Struts to find the IDs with a count of 1.

Of course, if you are ultimately interested in the objects with unique (or perhaps, duplicate) IDs to perform additional processing, then you will need to break this into 2 individual and separate OQL queries, first to get the IDs of interest, and then use those IDs to get the objects/values (e.g. Users) in another query.

I have demonstrated this 2-phase query approach for your use case (i.e. unique IDs) in this test case.

Anyway, I hope this gives you a few options or things to think about.

Cheers!