Elasticsearch: sum of operations over date_range bucket

147 Views Asked by At

there is something I have to do with ES (1.7) and I'm not sure how to do it. Let me see if I can show it with sql and a table, maybe it's easier to translate to ES query. Imagine a table points_by_date

DATE    PURCHASE_COUNT  ACCUMULATED_POINTS
day 1         5              548
day 2         8              498
day 3         9              623
day 4         9              635
day 5        13              620

if it was a table, I'd do a query like this to retrieve what I need:

SELECT SUM(q.AVG_POINT_PURCHASE) as POINT_BY_PURCHASE
FROM (
   SELECT (ACCUMULATED_POINTS / PURCHASE_COUNT) as AVG_POINT_PURCHASE
   FROM points_by_date
) q;

Is it possible to do this in ES? I need to add that these operations are done over buckets of date range, something like this:

{
  "timeout": 1500,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "from": "2016-12-01",
              "to": "2016-12-05",
              "include_lower": true,
              "include_upper": true
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "my_ranges": {
      "date_range": {
        "field": "date",
        "ranges": [
          {
            "from": "2016-12-01",
            "to": "2016-12-05"
          },
          {
            "from": "2016-12-06",
            "to": "2016-12-10"
          }
        ]
      },
      "aggregations": {
        "TOTAL_POINTS" : {
          "sum" : {
            "field" : "ACCUMULATED_POINTS"
          }
        },
        "PURCHASE_COUNT" : {
          "avg" : {
            "field" : "PURCHASE_COUNT"
          }
        }
      }
    }
  }
}

Each date range, would represent a set of N row in the table, from where I need to do that math.

The ES query has a previous approach, where I was getting the average purchase_count per bucket, and dividing later the sum of the accumulated_points of the bucket by that average. Now I need to divide first, then accumulate per bucket. Any idea if possible and how?

Thanks in advance!

1

There are 1 best solutions below

0
On

Just in case someone needs it or something similar, I managed to this:

{
  "timeout": 1500,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "from": "2016-12-01",
              "to": "2016-12-05",
              "include_lower": true,
              "include_upper": true
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "my_ranges": {
      "date_range": {
        "field": "date",
        "ranges": [
          {
            "from": "2016-12-01",
            "to": "2016-12-05"
          },
          {
            "from": "2016-12-06",
            "to": "2016-12-10"
          }
        ]
      },
      "aggregations": {
        "sum_avg_purchase_point": {
          "sum": {
            "script": {
              "inline": "doc[accumulatedPoints].value.intValue() / (doc[purchaseCount].value.intValue() == 0 ? 1 : doc[purchaseCount].value.intValue())",
              "params": {
                "accumulatedPoints": "ACCUMULATED_POINTS",
                "purchaseCount": "PURCHASE_COUNT"
              }
            }
          }
        }
      }
    }
  }
}

I first tried everything against ES, and once I had it working and doing the calculus I needed, and after verifying the result, I created that query with the Java API.

Hopes it helps some ones! Regards!!!