MongoDB aggregation: group on a timeframe that crosses a date boundary


I have scheduled tasks (parents) that start in the evening and each run a number of jobs (children), which can be successful or not. I want to know the success rate for each executed task.

Each document in the jobs_per_task_per_day collection should look like this:

{
    "_id" : {
        "task" : ObjectId("52b31753d925d5b057000001"),
        "year" : 2013,
        "month" : 12,
        "day" : 22
    },
    "successful" : 12,
    "unsuccessful" : 0,
    "task" : ObjectId("52b31753d925d5b057000001")
}

However, my current aggregation script groups per date. This is inaccurate, since the execution of a task can last the whole night: a job started at 2014-01-02 02:00 is part of the task started on 2014-01-01, not 2014-01-02.

If a job ran after midnight but before 12:00 noon the next day, it should be counted in the document of the previous day. The task itself does not have a date, since it is executed repeatedly via cron.
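To make the rule concrete, here is a minimal sketch in plain JavaScript. It assumes the cutoff is evaluated in UTC (the stored `ISODate` values are UTC) and uses a hypothetical helper name, `taskDay`. Shifting the timestamp back by 12 hours moves any job started before noon onto the previous calendar day, and it also handles month and year boundaries.

```javascript
// Assumption: the 12:00 cutoff is applied in UTC.
var TWELVE_HOURS_MS = 12 * 60 * 60 * 1000;

// Hypothetical helper: returns the calendar day of the task run
// that a job belongs to, given the job's start time.
function taskDay(startedAt) {
  // Shift back 12 hours: 00:00-11:59 lands on the previous day,
  // 12:00-23:59 stays on the same day.
  var shifted = new Date(startedAt.getTime() - TWELVE_HOURS_MS);
  return {
    year: shifted.getUTCFullYear(),
    month: shifted.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: shifted.getUTCDate()
  };
}
```

For a job started at 02:00 on the day after the task began, `taskDay` returns the task's own start day. A job started at 02:00 on the first of a month maps to the last day of the previous month, which is the case a plain "day minus one" cannot express.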

A Job:

{
    "_id" : ObjectId("52b3751dd925d507db013f0e"),
    "started_at" : ISODate("2013-12-19T22:37:17.900Z"),
    "successful" : true,
    "task_id" : ObjectId("52b31f8bd925d5b057000005")
}

A scheduled task:

{
    "_id" : ObjectId("52b31f8bd925d5b057000005"),
    "schedule" : {
        "_id" : ObjectId("52b31784d925d5b057000002"),
        "cron" : "30 21 * * *",
        "name" : "Foo Import",
        "worker_class" : "TaskWorker"
    }
}

My current script:

var scheduled_tasks = db.tasks.find({
  schedule: {
    $exists: true
  }
}).toArray()

scheduled_tasks.forEach(function(task) {
      var results_total = db.jobs.aggregate({
          $match: {
            task_id: task._id
          }
        },

        {
          $group: {
            _id: {
              year: {
                $year: "$started_at"
              },
              month: {
                $month: "$started_at"
              },
              day: {
                $dayOfMonth: "$started_at"
              },
              task: "$task_id"
            },

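            // Count 1 for each job whose "successful" flag is true
            successful: {
              $sum: {
                $cond: ["$successful", 1, 0]
              }
            },
            // Count 1 for each job whose "successful" flag is false
            unsuccessful: {
              $sum: {
                $cond: ["$successful", 0, 1]
              }
            },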
            task: {
              $first: "$task_id"
            },

          }
        }, {
          $project: {
            successful: 1,
            unsuccessful: 1,
            task: 1
          }
        }, {
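          // Sort on the grouped date parts; there is no "date" field at this stage
          $sort: {
            "_id.year": -1,
            "_id.month": -1,
            "_id.day": -1
          }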
        }
      );
      db.jobs_per_task_per_day.insert(results_total.result)
    });

To group on a timespan instead of a date, I tried changing the $group stage to:

 $group: { 

  _id: {
    task: "$task_id",
    year: { $year: "$started_at" },
    month: { $month: "$started_at" },
    day: { 
      $cond: [
        {$lt: [{$hour: "$started_at"}, 12]},
        {$dayOfMonth: "$started_at"} -1, 
        {$dayOfMonth: "$started_at"}
      ]
    }      
  }

However, this did not return the desired result, and I doubt whether this logic belongs in the $group operator at all.

Any help on this would really be appreciated!
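One possible direction, offered as a sketch rather than a confirmed fix: in the shell, `{$dayOfMonth: "$started_at"} - 1` is evaluated by JavaScript before the pipeline is sent, so it becomes `NaN` instead of an aggregation expression. Even a correct "day minus one" would also break on the first day of a month. Shifting the whole timestamp back 12 hours in a `$project` stage and then extracting year, month and day from the shifted value avoids both problems. This assumes a MongoDB version in which `$subtract` accepts a date and a number of milliseconds, and that the noon cutoff is meant in UTC. The names `buildPipeline` and `shifted` are hypothetical.

```javascript
var TWELVE_HOURS_MS = 12 * 60 * 60 * 1000;

// Hypothetical helper: builds the pipeline for one scheduled task.
// taskId is the _id of a document from the tasks collection.
function buildPipeline(taskId) {
  return [
    { $match: { task_id: taskId } },
    { $project: {
        task_id: 1,
        successful: 1,
        // A date minus a number of milliseconds yields a date
        shifted: { $subtract: ["$started_at", TWELVE_HOURS_MS] }
    } },
    { $group: {
        // Group on the date parts of the shifted timestamp
        _id: {
          task: "$task_id",
          year: { $year: "$shifted" },
          month: { $month: "$shifted" },
          day: { $dayOfMonth: "$shifted" }
        },
        successful: { $sum: { $cond: ["$successful", 1, 0] } },
        unsuccessful: { $sum: { $cond: ["$successful", 0, 1] } },
        task: { $first: "$task_id" }
    } },
    { $sort: { "_id.year": -1, "_id.month": -1, "_id.day": -1 } }
  ];
}

// In the mongo shell, inside the forEach over scheduled_tasks:
//   db.jobs.aggregate(buildPipeline(task._id))
```

Keeping the shift in `$project` leaves the `$group` stage identical in shape to the original script; only the field the date parts are read from changes.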
