How to use match_recognize to solve the N threshold requirement

88 Views Asked by At

I want to know if there is a way we can do the below task in SQL. Can we do this with match_ recognize?

enter image description here

Above is the input table. I want to group the role for a specific region where No. of employees is less than 6. So, In Africa, we have to combine Analyst, S.Analyst and manager (2+3+9) to overcome the threshold value of 6. Partner and S. Manager must be combined in India (2+7). S. Analyst can be left as it is. We have to drop the USA or add some flag that we can identify using a where clause. So the output should look like below.

enter image description here

3

There are 3 best solutions below

2
Thorsten Kettner On BEST ANSWER

So, I think I've got it now. As mentioned in my original answer, my new approach is to get all group combinations. I am working with IDs, and as your table doesn't seem to have them, my first step is to genrate them. In the same step I am numbering the rows per region, so as to start building combinations with only one row per region in the recursive part to save time. (I will get duplicate combinations, like (1,2,3) and (1,3,2), but at least avoid the additional (2,1,3), (2,3,1), (3,1,2), (3,2,1)).

I am using an ID string to get the combinations. Once a group is complete with at least 6 employees, I start a new group after a semicolon. So, with IDs 1, 2 and 3, I might end up with groups '1,2,3' (all IDs in one group), '1,2;3' (1 and 2 in one group, 3 in another) and many more, because with the recursive query I also generate incomplete groups like '1,2'. I will later find the complete groups by looking at the digits: '1,2,3' and '1,2;3' have three digits, while '1,2' only has two, so it cannot be complete. As a separator I am not using the comma, but a #, which helped me a bit when looking at the generated rows, as ',' and ';' look very similar. I also not only separate the IDs with #, but also have an # at the beginning and the end. This helps with finding an ID in the string. If I look for 1, I look for '#1#' actually, because I don't want to find the 1 in '12' or '21'. And I will find it in a string like '#1#2#;#3#', because the 1 is surrounded by #.

Once I have all combinations, I detect the best combination per region with a rather complicated ranking. At last I join the original rows to the groups and aggregate.

with
  data (id, region, role, employees, rn) as
  (
    select
      row_number() over (order by region, role, employees),
      region, role, employees,
      row_number() over (partition by region order by employees, role)
    from mytable
  ),
  combinations (region, ids, employees) as
  (
    select region, cast('#' || id || '#' as varchar2(400 byte)), employees
    from data
    where rn = 1
    union all
    select 
      g.region,
      case when g.employees >= 6 
          then g.ids || ';#' || d.id || '#'
          else g.ids || d.id || '#'
      end,
      case when g.employees >= 6 
           then d.employees
           else d.employees + g.employees
      end
    from combinations g
    join data d on d.region = g.region
               and g.ids not like '%#' || d.id || '#%'
  ),
  best_combinations (region, ids, employees) as
  (
    select region, ids, employees
    from combinations
    order by rank() over (partition by region -- per region
                          order by regexp_count(ids, '\d') desc, -- all IDs in combinations
                                   case when employees >= 6 then 1 else 2 end, -- no incomplete combinations left if possible 
                                   regexp_count(ids, ';') desc, -- as many combinations as possible
                                   ids) -- in order to pick only one row per region in case of ties
    fetch first row with ties
  ),
  final_groups(region, ids, rest) as
  (
    select region, regexp_substr(ids, '^[^;]+'), regexp_substr(ids, ';(.*$)', 1, 1, 'i', 1)
    from best_combinations
    union all
    select region, regexp_substr(rest, '^[^;]+'), regexp_substr(rest, ';(.*$)', 1, 1, 'i', 1)
    from final_groups
    where rest is not null
  )
select 
  bgs.region,
  listagg(d.role, ', ') within group (order by d.role) as roles,
  sum(d.employees) as employees
from data d
join final_groups bgs on bgs.ids like '%#' || d.id || '#%'
group by bgs.region, bgs.ids
order by bgs.region, bgs.ids;

Demo: https://dbfiddle.uk/a0x1Z9rd

2
MT0 On

You can use MATCH_RECOGNIZE but it does not support LISTAGG so you will have to aggregate afterwards as well:

SELECT region,
       LISTAGG(role, '_') WITHIN GROUP (ORDER BY total_employees) AS roles,
       SUM(employees) AS employees
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY region
  ORDER BY employees
  MEASURES
    MATCH_NUMBER() AS mn,
    SUM(employees) AS total_employees
  ALL ROWS PER MATCH
  PATTERN (less_than_6_employees*? at_least_6_employees)
  DEFINE
    less_than_6_employees AS SUM(employees) < 6,
    at_least_6_employees AS SUM(employees) >= 6
)
GROUP BY
       region,
       mn

Which, for the sample data:

CREATE TABLE table_name (region, role, employees) AS
  SELECT 'Africa', 'Role 1',  2 FROM DUAL UNION ALL
  SELECT 'Africa', 'Role 2',  3 FROM DUAL UNION ALL
  SELECT 'Africa', 'Role 3',  9 FROM DUAL UNION ALL
  SELECT 'India',  'Role 1',  2 FROM DUAL UNION ALL
  SELECT 'India',  'Role 2',  7 FROM DUAL UNION ALL
  SELECT 'India',  'Role 3', 10 FROM DUAL UNION ALL
  SELECT 'USA',    'Role 1',  1 FROM DUAL UNION ALL
  SELECT 'USA',    'Role 2',  2 FROM DUAL UNION ALL
  SELECT 'USA',    'Role 3',  2 FROM DUAL;

Outputs:

REGION ROLES EMPLOYEES
India Role 1_Role 2 9
India Role 3 10
Africa Role 1_Role 2_Role 3 14

For a more generic bin-packing solution, if you are looking at different algorithms for fitting employees into bins, you may need to look a something like my bin-packing answer here.

fiddle

0
Thorsten Kettner On

UPDATE AGAIN:

Please see my other answer for a complete solution. I am not sure, whether I will delete this answer later or not, because it shows a valid approach that simply turns out to be too short, but may still be worth the reading.


UPDATE:

I just noticed that the solution below is not complete. The task is more complicated than it seems. My solution (and MT0's solution, too, for that matter) fails on regions with 1, 1, 5, 5 employees, where obviously it must ideally be two groups of 1+5. See China in this demo: https://dbfiddle.uk/60W9QhCG

I don't have the time to think this through at the moment. Maybe just create all possible group combinations, then take the combinations with the fewest but complete groups.


ORIGINAL ANSWER:

This can be done wirh a recursive query. First number the rows, so you can easily move from one to the next. Then add up employees until you get at least six, so as to get the desired groups.

Here is a query that does that:

with 
  numbered (region, role, employees, rn) as
  (
    select
      region, role, employees,
      row_number() over (partition by region order by employees, role)
    from mytable
  ),
  grouped (region, roles, employees, grp, rn) as
  (
    select region, role, employees, 1, rn from numbered where rn = 1
    union all
    select 
      n.region,
      case when g.employees >= 6 
          then n.role
          else g.roles || ', ' || n.role
      end,
      case when g.employees >= 6 
           then n.employees
           else n.employees + g.employees
      end,
      case when g.employees >= 6 
           then g.grp + 1
           else g.grp
      end,
      n.rn
    from grouped g
    join numbered n on n.region = g.region and n.rn = g.rn + 1
  )
select
  region, max(roles), max(employees),
  case when max(employees) >= 6
       then 'complete'
       else 'incomplete'
  end as complete
from grouped
group by region, grp
order by region, grp;

Result:

REGION ROLES EMPLOYEES COMPLETE
Africa Analyst, S. Analyst, Manager 14 complete
India Partner, S. Manager 9 complete
India S.Analyst 10 complete
USA Analyst, Partner, S. Analyst 5 incomplete

Demo: https://dbfiddle.uk/5_qM3PXo