Optimization of scheduling

Question

Optimization of scheduling

253 Views Asked by HGNY At 01 May 2023 at 22:53

(I have coded a brute-force solution to the following, but am wondering if there is a more elegant path.)

Students in my high school were offered 8 electives. They chose their first, second, third and fourth choices. As there will be two elective periods, each will be able to go to two of their chosen electives.

It has not yet been determined which electives will be scheduled on which of the two periods, however. The scheduling problem is, knowing the preferences, to optimize the allocation of electives to the two days so that as many students as possible will be able to take both of their preferred electives.

For example, if I signed up for IT and Art, I can only take both if they are scheduled different periods. Of course, you might have signed up for Art and French, and someone else for French and IT.

As only two periods are available, it won’t be possible for everyone’s choices to be on different periods. How can we find the optimal allocation of electives to days that allows as many people as possible to get their two top choices?

My brute-force solution was to create a list of all possible allocations (using a combinatorics package) and test each. This works fine, as the data-set is small, but there surely must be a graph theory or matrix solution that is much more elegant—-and doesn’t increase resources in proportion to n! (n being the number of classes offered)!

The following imports a matrix structured as follows:

    Course1   Course2    Course3 ...

Course1 ___a_______b_________c
Course2
Course3

Each cell shows how many students chose that particular pair as their 1st and 2nd choices. For example, the value of b is the # of students who chose Course1 and Course2 (in any order) as their top choices. (The matrix is diagonally symmetrical.) The top row of names is in the file and used to get the course names. The first column of course names does not appear in the file, however. ...

from itertools import combinations
from math import floor, ceil
from pandas import read_excel


def takesuccesses(entry):   #Returns the number of successes for each recorded result
    return entry[0]
def complement(set1):       #Returns a list of all items not in set1
    set2 = []
    for i in Set:
        if i not in set1: set2.append(i)
    return set2
def courselist(set):        #Translates a list of numeric codes to a string of course names
    names = ""
    for i in set:
        if i is not StudyHall: names+=courses[i]+", "   #Exclude Study Hall
    return names[:-2]
def GenerateList(): #Creates a list of possible groupings of courses to meet first elective period
    SetList = []    
    for setsize in range(min,max+1):    #try various numbers of classes meeting per elective period
        for ComboSet in combinations(Set[:-1], setsize):        #Avoid duplicate solutions by always placing last course in second set
            if StudyHall not in ComboSet: SetList.append(ComboSet)  #Ensure Study Hall is not in set (avoids duplicate solutions)
    return SetList
def count2d(matrix):    #total all entries in a 2d matrix
    total = 0
    for row in matrix:
        for entry in row: total+=entry
    return total

for qn in range(1,5):   #Cycle through all four quarters
    quarter = "Q"+str(qn)
    #import matrix tracking how many students chose each combination of electives
    matr = read_excel("WithoutScience.xlsx", sheet_name=quarter)
    matrix=matr.values
    courses = list(matr.columns) 
    StudyHall=courses.index("Study Hall")     
    
    n= len(matrix)          #Pick up number of courses offered from matrix
    min = floor((n-2)/2)    #Set a min number of courses per elective period
    max = ceil((n+3)/2)     #Set a max number of courses per elective period
    Set = range(n)          #list of all possible courses by number
    
    SetList=GenerateList()  #Create a list of plausible groupings of offerings 
    results = []    #List to record the total successes for each possible grouping
    
    for set1 in SetList:    #Check,for each combination, how many people can take both their 1st and 2nd choice
        successes = matrix[StudyHall][StudyHall]    #Start by counting anyone who chose two Study Halls as a success
        
        #Cycle through cells in one half of the symmetrical matrix
        for i in range(n-1):
            for j in range(i+1,n):
                #Check if these courses meet on different days or if one is study hall
                if i==StudyHall or j==StudyHall or (i in set1 and j not in set1) or (j in set1 and i not in set1):
                    successes += matrix[i][j]       #Students can take both these courses
        results.append((successes, set1))
    
    StudentCount = int(count2d(matrix)/2)
    #Put the best combinations up front and print all optimal results
    results.sort(key=takesuccesses, reverse=True)       
    besttotal, bestset = results[0][0],results[0][1]    #The first optimal combination
    firstbest=besttotal     #Any solutions with this same value are also optimal
    i=0
    print("START OF QUARTER", qn)
    while firstbest==besttotal:     #Search for more optimal combinations
        print("In", quarter, besttotal,"of", StudentCount, "students can take both their elective choices by grouping\n", courselist(bestset), "\n and\n", courselist(complement(bestset)), "\n")
        i+=1
        besttotal, bestset = results[i][0],results[i][1]    #Load next possible optimal solution
    print("END OF QUARTER", qn, "\n")

Original Q&A

There are 2 best solutions below

ravenspoint On 02 May 2023 at 00:46

There is no need to test every possible combination of courses in the two slots. You only need test combinations that exist in a students selections. Each students selection of 4 courses creates 6 pairs that would allow the student to take two courses

A,B,C,D => AD,AB,AD,BC,BD,CD

So, enumerate the course pairings that would satisfy at least one student, and for each pairing count how many students include that pair. The pair with the most students is the answer.

- Construct empty map M, keyed by course pair, value count of students
- LOOP S over students
  - LOOP P over the pairs that let S take two courses
     - IF P in M 
          -increment value of P in map
     - ELSE 
         - add P to M with value 1
- SET PMAX to first pair in M
_ SET VMAX to value of first pair in M
- LOOP P over pairs in M
   - SET V to value of P in M
   - IF V > VMAX
        SET PMAX to P
        SET VMAX to V
- OUTPUT PMAX

**Reinderien** · Accepted Answer · 2023-05-03T02:26:45.937000

There are many variations on this problem. It's easy enough to do as a linear program with binary assignment variables for student-class-period triples. This variation places higher weight on classes that a student has ranked as being more preferred.

import pandas as pd
import pulp

classes = pd.Index(
    name='class',
    data=('math', 'science', 'civics', 'biology', 'chemistry', 'shop', 'gym', 'art'),
)
students = pd.Index(
    name='student',
    data=('Roberta', 'Superfly', 'Gallahad', 'Steve', 'Jill'),
)
# higher rank is more preferred
votes = pd.DataFrame(
    index=classes, columns=students,
    data=(
        (0,0,4,1,3),
        (2,0,0,3,0),
        (1,0,2,0,0),
        (0,2,1,0,1),
        (3,0,3,0,2),
        (0,4,0,2,0),
        (4,1,0,4,0),
        (0,3,0,0,4),
    ),
)


def make_student_assignments(cell: pd.Series) -> pulp.LpVariable:
    prefix, = cell
    period, class_, student = cell.name
    asn = pulp.LpVariable(
        name=f'{prefix}p{period}_{class_}_{student}', cat=pulp.LpBinary)

    global preference_value
    preference_value += asn*votes.loc[class_, student]

    return asn


def make_period_assignments(cell: pd.Series) -> pulp.LpVariable:
    prefix, = cell
    period, class_ = cell.name
    period_asn = pulp.LpVariable(
        name=f'{prefix}p{period}_{class_}', cat=pulp.LpBinary)

    # For all students in this period's class, if this
    # period has not been scheduled then they cannot attend
    group = student_assignments[(period, class_, slice(None))]
    for s in group:
        prob.addConstraint(
            name=f'{s.name}_schedule',
            constraint=s <= period_asn)

    return period_asn


prob = pulp.LpProblem(name='class_schedule', sense=pulp.LpMaximize)
preference_value = pulp.LpAffineExpression()

periods = pd.RangeIndex(name='period', start=0, stop=2)
triple_idx = pd.MultiIndex.from_product((periods, classes, students))
student_assignments = pd.DataFrame(
    index=triple_idx, data={'col': 's_'},
).apply(make_student_assignments, axis=1)
student_assignments.name = 'student_asn'

period_assignments = pd.DataFrame(
    index=pd.MultiIndex.from_product((periods, classes)),
    data={'col': 'p_'},
).apply(make_period_assignments, axis=1)
period_assignments.name = 'period_asn'

# For each period and student, they must take exactly one class
for (period, student), classes in student_assignments.groupby(['period', 'student']):
    prob.addConstraint(
        name=f'excl_{student}_p{period}',
        constraint=pulp.lpSum(classes) == 1)

# For each class, it must be assigned to exactly one period
for class_, periods in period_assignments.groupby(level='class'):
    prob.addConstraint(
        name=f'excl_{class_}', constraint=1 == pulp.lpSum(periods))

prob.objective = preference_value

print(votes)
print(period_assignments)
print(student_assignments)
print(prob)
prob.solve()
assert prob.status == pulp.LpStatusOptimal

for period, student_classes in student_assignments.groupby('period'):
    print(f'PERIOD {period}')

    for class_, students in student_classes.groupby('class'):
        selected = students.apply(pulp.LpVariable.value).astype(bool)
        if selected.any():
            print(f'  {class_}')
            print('   ', ', '.join(students[selected].index.get_level_values('student')))
    print()

Result - Optimal solution found

Objective value:                35.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.01
Time (Wallclock seconds):       0.01

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.01

PERIOD 0
  gym
    Roberta, Steve
  math
    Gallahad, Jill
  shop
    Superfly

PERIOD 1
  art
    Superfly, Jill
  chemistry
    Roberta, Gallahad
  science
    Steve

(4 + 3)*5 == 35 so all students have their top choices. A more realistic scenario where some compromises need to be made due to the number of students and diversity of votes:

import numpy as np
import pandas as pd
import pulp
from numpy.random import default_rng

classes = pd.Index(
    name='class',
    data=('math', 'science', 'civics', 'biology', 'chemistry', 'shop', 'gym', 'art'),
)
students = pd.RangeIndex(name='student', start=0, stop=100)

# higher rank is more preferred
ranks = np.clip(
    a_min=0, a_max=None,
    a=np.tile(np.arange(-3, 5), (len(students), 1)),
).T
rand = default_rng(seed=0)
votes = pd.DataFrame(
    index=classes, columns=students,
    data=rand.permuted(ranks, axis=0),
)


def make_student_assignments(cell: pd.Series) -> pulp.LpVariable:
    prefix, = cell
    period, class_, student = cell.name
    asn = pulp.LpVariable(
        name=f'{prefix}p{period}_{class_}_{student}', cat=pulp.LpBinary)

    global preference_value
    preference_value += asn*votes.loc[class_, student]

    return asn


def make_period_assignments(cell: pd.Series) -> pulp.LpVariable:
    prefix, = cell
    period, class_ = cell.name
    period_asn = pulp.LpVariable(
        name=f'{prefix}p{period}_{class_}', cat=pulp.LpBinary)

    # For all students in this period's class, if this
    # period has not been scheduled then they cannot attend
    group = student_assignments[(period, class_, slice(None))]
    for s in group:
        prob.addConstraint(
            name=f'{s.name}_schedule',
            constraint=s <= period_asn)

    return period_asn


prob = pulp.LpProblem(name='class_schedule', sense=pulp.LpMaximize)
preference_value = pulp.LpAffineExpression()

periods = pd.RangeIndex(name='period', start=0, stop=2)
triple_idx = pd.MultiIndex.from_product((periods, classes, students))
student_assignments = pd.DataFrame(
    index=triple_idx, data={'col': 's_'},
).apply(make_student_assignments, axis=1)
student_assignments.name = 'student_asn'

period_assignments = pd.DataFrame(
    index=pd.MultiIndex.from_product((periods, classes)),
    data={'col': 'p_'},
).apply(make_period_assignments, axis=1)
period_assignments.name = 'period_asn'

# For each period and student, they must take exactly one class
for (period, student), classes in student_assignments.groupby(['period', 'student']):
    prob.addConstraint(
        name=f'excl_{student}_p{period}',
        constraint=pulp.lpSum(classes) == 1)

# For each class, it must be assigned to exactly one period
for class_, periods in period_assignments.groupby(level='class'):
    prob.addConstraint(
        name=f'excl_{class_}', constraint=1 == pulp.lpSum(periods))

period_assignments = period_assignments.unstack(level='period')

prob.objective = preference_value
print(prob)

prob.solve()
assert prob.status == pulp.LpStatusOptimal

print(f'Solution preference quality: {preference_value.value()/(3+4)/len(students):.1%}')
print()

print('Period schedule:')
print(period_assignments.applymap(pulp.LpVariable.value).astype(int))
print()

print('Student schedule:')
for period, student_classes in student_assignments.groupby('period'):
    print(f'PERIOD {period}')

    for class_, students in student_classes.groupby('class'):
        selected = students.apply(pulp.LpVariable.value).astype(bool)
        if selected.any():
            print(f'  {class_}')
            print('   ', ', '.join(
                students[selected].index.get_level_values('student').astype(str)
            ))
    print()

class_schedule:
MAXIMIZE
4*s_p0_art_0 + 4*s_p0_art_11 + ... + 1*s_p1_shop_99 + 0
SUBJECT TO
s_p0_math_0_schedule: - p_p0_math + s_p0_math_0 <= 0

s_p0_math_1_schedule: - p_p0_math + s_p0_math_1 <= 0

s_p0_math_2_schedule: - p_p0_math + s_p0_math_2 <= 0

...

excl_0_p0: s_p0_art_0 + s_p0_biology_0 + s_p0_chemistry_0 + s_p0_civics_0
 + s_p0_gym_0 + s_p0_math_0 + s_p0_science_0 + s_p0_shop_0 = 1

...

excl_art: p_p0_art + p_p1_art = 1

excl_biology: p_p0_biology + p_p1_biology = 1

excl_chemistry: p_p0_chemistry + p_p1_chemistry = 1

excl_civics: p_p0_civics + p_p1_civics = 1

excl_gym: p_p0_gym + p_p1_gym = 1

excl_math: p_p0_math + p_p1_math = 1

excl_science: p_p0_science + p_p1_science = 1

excl_shop: p_p0_shop + p_p1_shop = 1

VARIABLES
0 <= p_p0_art <= 1 Integer
0 <= p_p0_biology <= 1 Integer
0 <= p_p0_chemistry <= 1 Integer
0 <= p_p0_civics <= 1 Integer
0 <= p_p0_gym <= 1 Integer
0 <= p_p0_math <= 1 Integer
0 <= p_p0_science <= 1 Integer
0 <= p_p0_shop <= 1 Integer
0 <= p_p1_art <= 1 Integer
0 <= p_p1_biology <= 1 Integer
0 <= p_p1_chemistry <= 1 Integer
0 <= p_p1_civics <= 1 Integer
0 <= p_p1_gym <= 1 Integer
0 <= p_p1_math <= 1 Integer
0 <= p_p1_science <= 1 Integer
0 <= p_p1_shop <= 1 Integer
0 <= s_p0_art_0 <= 1 Integer
0 <= s_p0_art_1 <= 1 Integer
...
0 <= s_p1_shop_99 <= 1 Integer

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

At line 2 NAME          MODEL
At line 3 ROWS
At line 1813 COLUMNS
At line 10662 RHS
At line 12471 BOUNDS
At line 14088 ENDATA
Problem MODEL has 1808 rows, 1616 columns and 4816 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 700 - 0.01 seconds
Cgl0004I processed model has 1800 rows, 1608 columns (1608 integer (1608 of which binary)) and 4800 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 408 integers unsatisfied sum - 204
Cbc0038I Pass   1: suminf.  147.16667 (571) obj. -260 iterations 821
Cbc0038I Pass   2: suminf.    0.00000 (0) obj. -248 iterations 260
Cbc0038I Solution found of -248
Cbc0038I Before mini branch and bound, 734 integers at bound fixed and 0 continuous
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 1055 rows 863 columns - 64 fixed gives 0, 0 - ok now
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 0 rows 0 columns
Cbc0038I Mini branch and bound improved solution from -248 to -411 (0.12 seconds)
Cbc0038I Round again with cutoff of -440.8
Cbc0038I Pass   3: suminf.  151.00000 (303) obj. -440.8 iterations 413
Cbc0038I Pass   4: suminf.  151.00000 (303) obj. -440.8 iterations 8
Cbc0038I Pass   5: suminf.    0.20009 (2) obj. -440.8 iterations 232
Cbc0038I Solution found of -441
Cbc0038I Before mini branch and bound, 995 integers at bound fixed and 0 continuous
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 754 rows 562 columns
Cbc0038I Mini branch and bound improved solution from -441 to -630 (0.20 seconds)
Cbc0038I Round again with cutoff of -644.8
Cbc0038I Pass   6: suminf.  175.02961 (480) obj. -644.8 iterations 567
...
Cbc0038I Pass  35: suminf.   10.85424 (474) obj. -644.8 iterations 18
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 1049 integers at bound fixed and 0 continuous
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 661 rows 469 columns
Cbc0038I Mini branch and bound improved solution from -630 to -642 (0.38 seconds)
Cbc0038I Round again with cutoff of -661.36
Cbc0038I Pass  35: suminf.  170.48019 (477) obj. -661.36 iterations 33
Cbc0038I Pass  36: suminf.  106.57149 (478) obj. -661.36 iterations 174
...
Cbc0038I Pass  93: suminf.   98.91716 (460) obj. -672.952 iterations 71
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 1012 integers at bound fixed and 0 continuous
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 719 rows 527 columns
Cbc0038I Mini branch and bound did not improve solution (0.75 seconds)
Cbc0038I After 0.75 seconds - Feasibility pump exiting with objective of -654 - took 0.69 seconds
Cbc0012I Integer solution of -654 found by feasibility pump after 0 iterations and 0 nodes (0.75 seconds)
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 470 rows 278 columns
Cbc0031I 9 added rows had average density of 50.111111
Cbc0013I At root node, 9 cuts changed objective from -700 to -695 in 13 passes
Cbc0014I Cut generator 0 (Probing) - 0 row cuts average 0.0 elements, 0 column cuts (3 active)  in 0.142 seconds - new frequency is -100
Cbc0014I Cut generator 1 (Gomory) - 73 row cuts average 71.9 elements, 0 column cuts (0 active)  in 0.074 seconds - new frequency is -100
Cbc0014I Cut generator 2 (Knapsack) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.015 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.023 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.008 seconds - new frequency is -100
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 1366 row cuts average 31.2 elements, 0 column cuts (0 active)  in 0.091 seconds - new frequency is 1
Cbc0010I After 0 nodes, 1 on tree, -654 best solution, best possible -695 (1.28 seconds)
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 530 rows 338 columns
Cbc0016I Integer solution of -655 found by strong branching after 1752 iterations and 54 nodes (2.28 seconds)
Cbc0038I Full problem 1800 rows 1608 columns, reduced to 434 rows 276 columns
Cbc0010I After 100 nodes, 2 on tree, -655 best solution, best possible -695 (2.61 seconds)
Cbc0001I Search completed - best objective -655, took 4015 iterations and 110 nodes (2.95 seconds)
Cbc0032I Strong branching done 1370 times (18808 iterations), fathomed 15 nodes and fixed 28 variables
Cbc0035I Maximum depth 20, 8295 variables fixed on reduced cost
Cuts at root node changed objective from -700 to -695
Probing was tried 13 times and created 0 cuts of which 3 were active after adding rounds of cuts (0.142 seconds)
Gomory was tried 13 times and created 73 cuts of which 0 were active after adding rounds of cuts (0.074 seconds)
Knapsack was tried 13 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.015 seconds)
Clique was tried 13 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.023 seconds)
MixedIntegerRounding2 was tried 13 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.008 seconds)
FlowCover was tried 13 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 50 times and created 2788 cuts of which 0 were active after adding rounds of cuts (0.269 seconds)
ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.016 seconds)

Result - Optimal solution found

Objective value:                655.00000000
Enumerated nodes:               110
Total iterations:               4015
Time (CPU seconds):             2.98
Time (Wallclock seconds):       2.98

Option for printingOptions changed from normal to all
Total time (CPU seconds):       2.99   (Wallclock seconds):       2.99

Solution preference quality: 93.6%

Period schedule:
period     0  1
class          
art        1  0
biology    0  1
chemistry  1  0
civics     0  1
gym        1  0
math       1  0
science    0  1
shop       0  1

Student schedule:
PERIOD 0
  art
    0, 4, 6, 11, 13, 20, 23, 31, 34, 46, 50, 52, 58, 60, 65, 67, 70, 73, 78, 79, 83, 88, 93, 99
  chemistry
    2, 3, 10, 16, 18, 26, 27, 28, 32, 35, 36, 40, 41, 42, 45, 48, 51, 54, 59, 61, 66, 69, 72, 75, 76, 80, 89, 95
  gym
    9, 17, 19, 21, 25, 30, 33, 39, 49, 53, 56, 63, 64, 68, 71, 85, 97, 98
  math
    1, 5, 7, 8, 12, 14, 15, 22, 24, 29, 37, 38, 43, 44, 47, 55, 57, 62, 74, 77, 81, 82, 84, 86, 87, 90, 91, 92, 94, 96

PERIOD 1
  biology
    0, 2, 4, 15, 21, 25, 28, 31, 33, 34, 37, 38, 39, 46, 47, 50, 54, 60, 61, 62, 66, 68, 73, 76, 77, 78, 81, 83, 87, 93
  civics
    1, 11, 16, 17, 18, 22, 24, 29, 30, 41, 44, 45, 53, 55, 58, 67, 71, 74, 75, 79, 80, 85, 86, 90, 92, 96, 98
  science
    3, 5, 7, 8, 9, 10, 13, 20, 23, 32, 35, 36, 51, 57, 59, 63, 88, 89, 94
  shop
    6, 12, 14, 19, 26, 27, 40, 42, 43, 48, 49, 52, 56, 64, 65, 69, 70, 72, 82, 84, 91, 95, 97, 99

In this scenario, all eight classes are scheduled, and the student preference metric is 93.6% where 100% would mean everyone got their top two choices.

The LP approach is quite flexible - you can apply other constraints such as minimum or maximum class sizes, number of classes scheduled per day, student class eligibility, etc.

Optimization of scheduling

There are 2 best solutions below

Related Questions in OPTIMIZATION

Related Questions in GRAPH-THEORY

Related Questions in RESOURCE-SCHEDULING

Trending Questions

Popular # Hahtags

Popular Questions