I have a problem with sql query, maybe someone will be able to help me figure it out.
My tables:
Transaction Table:
| Field Name | Data Type |
|---|---|
| transaction_id | INT |
| amount | FLOAT |
| date | DATETIME |
Each transaction line can have some dimensions assigned. This is a many to many relation, with additional "percentage" field that determines what part of the transaction is marked with dimension
Dimension Item to Transaction Table (Intermediate Table):
| Field Name | Data Type |
|---|---|
| dimension_item_id | INT |
| transaction_id | INT |
| percentage | FLOAT |
I want to get aggregated data, where value for each combination of dimensions will be a initial amount multiplied by the percentages of the dimensions.
So assuming I have data:
- Transaction (only one for this example)
| id | amount |
|---|---|
| 2885776 | 10000 |
- Dimension
| id | name |
|---|---|
| 1 | Dimension 1 |
| 2 | Dimension 2 |
| 3 | Dimension 3 |
- Dimension Item
| id | dimension_id | name |
|---|---|---|
| 57112 | 1 | Item 57112 of dimension 1 |
| 56093 | 1 | Item 56093 of dimension 1 |
| 54232 | 2 | Item 54232 of dimension 2 |
| 54340 | 2 | Item 54340 of dimension 2 |
| 101 | 3 | Item 101 of dimension 3 |
- DimensionItemToTransaction (M2M intermediate)
| transaction_id | dimension_item_id | percentage |
|---|---|---|
| 2885776 | 57112 | 50 |
| 2885776 | 57093 | 50 |
| 2885776 | 54232 | 60 |
| 2885776 | 54340 | 40 |
| 2885776 | 101 | 100 |
So in case I have a
- 50-50 split in dimension 1
- 60-40 split in dimension 2
- no split in dimension 3
I meant to achieve this by
- First getting table like below in a subquery:
| transaction_id | d1 | d1_percentage | d2 | d2_percentage | d3 | d3_percentage | amount |
|---|---|---|---|---|---|---|---|
| 2885776 | 57112 | 50.00 | 54232 | 60.00 | 101 | 100.00 | 10000 |
| 2885776 | 57112 | 50.00 | 54340 | 40.00 | 101 | 100.00 | 10000 |
| 2885776 | 57093 | 50.00 | 54232 | 60.00 | 101 | 100.00 | 10000 |
| 2885776 | 57093 | 50.00 | 54340 | 40.00 | 101 | 100.00 | 10000 |
- Then in the main query I can calculate proper amount for each case.
| transaction_id | d1 | d2 | d3 | amount |
|---|---|---|---|---|
| 2885776 | 57112 | 54232 | 101 | 3000 |
| 2885776 | 57112 | 54340 | 101 | 2000 |
| 2885776 | 57093 | 54232 | 101 | 3000 |
| 2885776 | 57093 | 54340 | 101 | 2000 |
- Then I can do the rest in Python (grouping by dimensions with proper amount per percentage)
However I am struggling a bit with figuring out the proper way to do that.
I have a query currently that works fine, but it is very slow on larger datasets. In my query for each dimension I want to display I make a separate join on a subquery. This makes this query slow and even slower when I'll add more dimensions to the mix.
First table:
(
SELECT distinct
tl.id transaction_id,
dimension_1.id as d1,
dimension_1.percentage as d1_percentage,
dimension_2.id as d2,
dimension_2.percentage as d2_percentage,
dimension_3.id as d3,
dimension_3.percentage as d3_percentage,
tl.amount
FROM transaction tl
left join (
select dtt.transaction_id, dtt.percentage percentage, dtt.dimension_item_id, od.id as id
from dimensionitemtotransaction dtt
join dimensionitem od on dtt.dimension_item_id = od.id
where od.dimension_id = 1
) dimension_1 on tl.id = dimension_1.transaction_id
left join (
select dtt.transaction_id, dtt.percentage percentage, dtt.dimension_item_id, od.id as id
from dimensionitemtotransaction dtt
join dimensionitem od on dtt.dimension_item_id = od.id
where od.dimension_id = 2
) dimension_2 on tl.id = dimension_2.transaction_id
left join (
select dtt.transaction_id, dtt.percentage percentage, dtt.dimension_item_id, od.id as id
from dimensionitemtotransaction dtt
join dimensionitem od on dtt.dimension_item_id = od.id
where od.dimension_id = 3
) dimension_3 on tl.id = dimension_3.transaction_id
) sq
Second table:
select distinct
transaction_id, d1, d2, d3,
coalesce(sum(
amount * coalesce(d1_percentage, 100) / 100 * coalesce(d2_percentage, 100) / 100 * coalesce(d3_percentage, 100) / 100
), 0.0) as amount
from
( above subquery here ) sq
group by transaction_id, d1, d2, d3;
All my foreign keys are indexed.
I feel that this is a bad way to do that, but I'm having difficulties figuring any other way and I would appreciate suggestions.
Thanks!