We will use Oracle Big Data Spatial and Graph. We need to query our distributed graph using PGQL. (The default/given algorithms with PGX are not enough for us.) The graph will use HBase underneath.
The problem is that PGQL only works on a single node of the CDH cluster. You can query one node at a time, but you cannot use the memory of the entire cluster. We need a way to query all the nodes, combine the results from the nodes, and return them to the user.
Is there any way that Presto can help us tackle this problem?
At the moment, PGQL does not run in the PGX Distributed Engine (PGX.D). Supporting PGQL in PGX.D is on our roadmap.
Currently, if you need to run a distributed query across a cluster, one way is to use the Spark integration that Oracle Big Data Spatial and Graph supports.
Section 5 of the following dev guide is likely going to help. http://docs.oracle.com/cd/E86005_01/BDSPA/using-property-graphs-big-data.htm#BDSPA-GUID-EFECEBBB-6BD6-4A63-B962-DB5AD7EB4C03
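To make the "query every node, then combine" shape from the question concrete, here is a minimal sketch in plain Python. It is not the Oracle or Spark API: the partitions, the edge tuples, and the function names are all hypothetical stand-ins, and the "query" is a simple out-degree count. It only illustrates the scatter-gather pattern that a distributed job would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-node graph partitions: each partition is a
# list of (source, label, destination) edges. Not an Oracle data structure.
PARTITIONS = [
    [("a", "knows", "b"), ("a", "knows", "c")],
    [("b", "knows", "c"), ("c", "likes", "a")],
    [("c", "knows", "a")],
]

def count_out_edges(partition, label):
    """Run the 'query' on one partition: out-degree per vertex for one label."""
    return Counter(src for src, lbl, _dst in partition if lbl == label)

def scatter_gather(partitions, label):
    """Send the same query to every partition, then merge the partial counts."""
    with ThreadPoolExecutor() as pool:
        # Scatter: one task per partition; list() collects all partial results.
        partials = list(pool.map(lambda p: count_out_edges(p, label), partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Gather: counts are additive, so merging is a sum.
    return dict(total)

result = scatter_gather(PARTITIONS, "knows")
print(result)
```

Note that this only works directly for results that can be merged from independent partial answers, such as counts or sums. A pattern whose match spans edges stored on different nodes would not be found by querying each node in isolation, which is why a proper distributed engine is needed for general queries.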
Regarding Presto, it seems that it can consume data in Hive (and a few other data sources). So in theory, you can define in Hive a view (external table) that sits on top of the graph data stored in HBase, and then run Presto. This flow needs to be verified and tested though.
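As a rough illustration of the Hive step, the sketch below builds a `CREATE EXTERNAL TABLE` statement that uses Hive's HBase storage handler. The table names, column names, and the column family `e` are hypothetical placeholders, not the actual layout that Oracle Big Data Spatial and Graph writes to HBase. That layout, and whether its values are readable as plain Hive types, would need to be checked as part of the verification mentioned above.

```python
def hive_hbase_external_table(hive_table, hbase_table, columns):
    """Build a Hive DDL statement that maps an existing HBase table.

    columns: list of (hive_column, hive_type, hbase_mapping) tuples.
    The first entry must map the HBase row key, written as ":key".
    """
    if not columns or columns[0][2] != ":key":
        raise ValueError("first column must map the HBase row key (':key')")
    col_defs = ", ".join(f"{name} {typ}" for name, typ, _ in columns)
    # The mapping string is comma-separated with no spaces between entries.
    mapping = ",".join(m for _, _, m in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )

# All names below are hypothetical and for illustration only.
ddl = hive_hbase_external_table(
    "graph_edges",
    "my_graph_edge_table",
    [
        ("row_key", "string", ":key"),
        ("src", "string", "e:src"),
        ("dst", "string", "e:dst"),
    ],
)
print(ddl)
```

Once such a table exists in the Hive metastore, Presto's Hive connector would be the piece that has to read it. Whether it can read tables backed by a storage handler is exactly the part of the flow that needs testing.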