I cannot for the life of me figure out how to get a weighted ranking for scores across X categories. For example, the student needs to answer 10 questions across 3 categories (both # of questions and # of categories will be variable eventually). To get a total score the top 1 score in each of the X (3) categories will be added to whatever is left to add up to 10 total question scores.
Here is the data. I used a CASE WHEN Row_Number() to get the TopInCat
http://sqlfiddle.com/#!6/e6e9f/1
The fiddle has more students.
| Question | Student | Category | Score | TopInCat | |----------|---------|----------|-------|----------| | 120149 | 125 | 6 | 1 | 1 | | 120127 | 125 | 6 | 0.9 | 0 | | 120124 | 125 | 6 | 0.8 | 0 | | 120125 | 125 | 6 | 0.7 | 0 | | 120130 | 125 | 6 | 0.6 | 0 | | 120166 | 125 | 6 | 0.5 | 0 | | 120161 | 125 | 6 | 0.4 | 0 | | 120138 | 125 | 4 | 0.15 | 1 | | 120069 | 125 | 4 | 0.15 | 0 | | 120022 | 125 | 4 | 0.15 | 0 | | 120002 | 125 | 4 | 0.15 | 0 | | 120068 | 125 | 2 | 0.01 | 1 | | 120050 | 125 | 3 | 0.05 | 1 | | 120139 | 125 | 2 | 0 | 0 | | 120156 | 125 | 2 | 0 | 0 |
This is how I envision it needs to look, but it doesn't have to be exactly this. I just need to have 10 questions by 3 categories detail data in a way that would allow me to sum and average the Sort 1-10 column below. The 999's could be null or whatever as long as I can sum whats important and present the details.
| Question | Student | Category | Score | TopInCat | Sort | |----------|---------|----------|-------|----------|------| | 120149 | 125 | 6 | 1 | 1 | 1 | | 120138 | 125 | 4 | 0.15 | 1 | 2 | | 120068 | 125 | 2 | 0.01 | 1 | 3 | | 120127 | 125 | 6 | 0.9 | 0 | 4 | | 120124 | 125 | 6 | 0.8 | 0 | 5 | | 120125 | 125 | 6 | 0.7 | 0 | 6 | | 120130 | 125 | 6 | 0.6 | 0 | 7 | | 120166 | 125 | 6 | 0.5 | 0 | 8 | | 120161 | 125 | 6 | 0.4 | 0 | 9 | | 120069 | 125 | 4 | 0.15 | 0 | 10 | | 120022 | 125 | 4 | 0.15 | 0 | 999 | | 120002 | 125 | 4 | 0.15 | 0 | 999 | | 120050 | 125 | 3 | 0.05 | 1 | 999 | | 120139 | 125 | 2 | 0 | 0 | 999 | | 120156 | 125 | 2 | 0 | 0 | 999 |
One last thing, the category no longer matters once the X (3) threshold is met. So a 4th category would just sort normally.
| Question | Student | Category | Score | TopInCat | Sort | |----------|---------|----------|-------|----------|------| | 120149 | 126 | 6 | 1 | 1 | 1 | | 120138 | 126 | 4 | 0.75 | 1 | 2 | | 120068 | 126 | 2 | 0.50 | 1 | 3 | | 120127 | 126 | 6 | 0.9 | 0 | 4 | | 120124 | 126 | 6 | 0.8 | 0 | 5 | | 120125 | 126 | 6 | 0.7 | 0 | 6 | | 120130 | 126 | 6 | 0.6 | 0 | 7 | | 120166 | 126 | 6 | 0.5 | 0 | 8 | | 120050 | 126 | 3 | 0.45 | 1 | 9 |******** | 120161 | 126 | 6 | 0.4 | 0 | 10 | | 120069 | 126 | 4 | 0.15 | 0 | 999 | | 120022 | 126 | 4 | 0.15 | 0 | 999 | | 120002 | 126 | 4 | 0.15 | 0 | 999 | | 120139 | 126 | 2 | 0 | 0 | 999 | | 120156 | 126 | 2 | 0 | 0 | 999 |
I really appreciate any help. Been banging my head on this for a few days.
With such matters I like to proceed with a 'building blocks' approach. Following the maxim of first make it work, then if you need to make it fast, this first step is often enough.
So, given
and your sample data
let's proceed.
The complicated part here is identifying the top three top-in-category questions; the others of the ten questions of interest per student are simply sorted by score, which is easy. So let's start with identifying the top three top-in-category questions.
First, assign to each row a row number giving the ordering of that score within the category, for the student:
Now we are only interested in rows where
SeqInStudentCategory
is1
. Considering only such rows, let's order them by score within student, and number those rows:Now we are only interested in rows where
SeqInStudent
is at most3
. Let's pull them out, so that we know to include it (and exclude it from the simple sort by score, that we will use to make up the remaining seven rows):Now we have the three top-in-category questions for each student. We now need to identify and order by score the not top-in-category questions for each student:
Finally we combine
TopInCat
withNotTopInCat
, applying an appropriate offset and restriction toNotTopInCat.SeqInStudent
- we need to add3
to the raw value, and take the top7
(which is10 - 3
):To get our final results:
You can see the results on sqlfiddle.
Note that here I have assumed that every student will always have answers from at least three categories. Also, the final output doesn't have a
TopInCat
column, but hopefully you will see how to regain that if you want it.Also, "(both # of questions and # of categories will be variable eventually)" should be relatively straightforward to deal with here. But watch out for my assumption that (in this case) 3 categories will definitely be present in the answers of each student.