Aggregating data with dimensions in an M2M relation


I have a problem with a SQL query; maybe someone can help me figure it out.

My tables:

Transaction Table:

| Field Name     | Data Type |
|----------------|-----------|
| transaction_id | INT       |
| amount         | FLOAT     |
| date           | DATETIME  |

Each transaction line can have some dimension items assigned. This is a many-to-many relation, with an additional `percentage` field that determines what part of the transaction is marked with a given dimension item.

Dimension Item to Transaction Table (Intermediate Table):

| Field Name        | Data Type |
|-------------------|-----------|
| dimension_item_id | INT       |
| transaction_id    | INT       |
| percentage        | FLOAT     |

I want to get aggregated data, where the value for each combination of dimension items is the initial amount multiplied by the percentages of those items.

So assuming I have this data:

1. Transaction (only one for this example)

   | id      | amount |
   |---------|--------|
   | 2885776 | 10000  |

2. Dimension

   | id | name        |
   |----|-------------|
   | 1  | Dimension 1 |
   | 2  | Dimension 2 |
   | 3  | Dimension 3 |

3. Dimension Item

   | id    | dimension_id | name                      |
   |-------|--------------|---------------------------|
   | 57112 | 1            | Item 57112 of dimension 1 |
   | 57093 | 1            | Item 57093 of dimension 1 |
   | 54232 | 2            | Item 54232 of dimension 2 |
   | 54340 | 2            | Item 54340 of dimension 2 |
   | 101   | 3            | Item 101 of dimension 3   |

4. DimensionItemToTransaction (M2M intermediate)

   | transaction_id | dimension_item_id | percentage |
   |----------------|-------------------|------------|
   | 2885776        | 57112             | 50         |
   | 2885776        | 57093             | 50         |
   | 2885776        | 54232             | 60         |
   | 2885776        | 54340             | 40         |
   | 2885776        | 101               | 100        |

So in this case I have:

- a 50-50 split in dimension 1
- a 60-40 split in dimension 2
- no split in dimension 3
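To make the target numbers concrete, the value for one combination can be checked by hand (a minimal sketch; the figures come from the sample data above):

```python
# Expected value for the combination (57112, 54232, 101) from the sample
# data: the transaction amount multiplied by each dimension's percentage.
amount = 10000
d1_percentage = 50   # item 57112 in dimension 1
d2_percentage = 60   # item 54232 in dimension 2
d3_percentage = 100  # item 101 in dimension 3 (no split)

value = amount * d1_percentage / 100 * d2_percentage / 100 * d3_percentage / 100
print(value)  # 3000.0
```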

I intended to achieve this as follows:

1. First, get a table like the one below from a subquery:

   | transaction_id | d1    | d1_percentage | d2    | d2_percentage | d3  | d3_percentage | amount |
   |----------------|-------|---------------|-------|---------------|-----|---------------|--------|
   | 2885776        | 57112 | 50.00         | 54232 | 60.00         | 101 | 100.00        | 10000  |
   | 2885776        | 57112 | 50.00         | 54340 | 40.00         | 101 | 100.00        | 10000  |
   | 2885776        | 57093 | 50.00         | 54232 | 60.00         | 101 | 100.00        | 10000  |
   | 2885776        | 57093 | 50.00         | 54340 | 40.00         | 101 | 100.00        | 10000  |

2. Then, in the main query, calculate the proper amount for each case:

   | transaction_id | d1    | d2    | d3  | amount |
   |----------------|-------|-------|-----|--------|
   | 2885776        | 57112 | 54232 | 101 | 3000   |
   | 2885776        | 57112 | 54340 | 101 | 2000   |
   | 2885776        | 57093 | 54232 | 101 | 3000   |
   | 2885776        | 57093 | 54340 | 101 | 2000   |

3. Then do the rest in Python (grouping by dimensions with the proper amount per percentage).
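The steps above can be sketched end to end with an in-memory SQLite database (a sketch only; table and column names follow the question, and this reproduces the intended result, not the performance characteristics of the real database):

```python
import sqlite3

# Build the sample tables from the question in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE dimensionitem (id INTEGER PRIMARY KEY, dimension_id INTEGER);
    CREATE TABLE dimensionitemtotransaction (
        transaction_id INTEGER, dimension_item_id INTEGER, percentage REAL);

    INSERT INTO "transaction" VALUES (2885776, 10000);
    INSERT INTO dimensionitem VALUES
        (57112, 1), (57093, 1), (54232, 2), (54340, 2), (101, 3);
    INSERT INTO dimensionitemtotransaction VALUES
        (2885776, 57112, 50), (2885776, 57093, 50),
        (2885776, 54232, 60), (2885776, 54340, 40),
        (2885776, 101, 100);
""")

# One LEFT JOIN per dimension builds the step-1 table; the outer query
# multiplies the percentages together to get the step-2 amounts.
rows = con.execute("""
    SELECT sq.transaction_id, sq.d1, sq.d2, sq.d3,
           SUM(sq.amount
               * COALESCE(sq.d1_percentage, 100) / 100
               * COALESCE(sq.d2_percentage, 100) / 100
               * COALESCE(sq.d3_percentage, 100) / 100) AS amount
    FROM (
        SELECT t.id AS transaction_id, t.amount,
               d1.dimension_item_id AS d1, d1.percentage AS d1_percentage,
               d2.dimension_item_id AS d2, d2.percentage AS d2_percentage,
               d3.dimension_item_id AS d3, d3.percentage AS d3_percentage
        FROM "transaction" t
        LEFT JOIN (SELECT dtt.* FROM dimensionitemtotransaction dtt
                   JOIN dimensionitem di ON di.id = dtt.dimension_item_id
                   WHERE di.dimension_id = 1) d1 ON d1.transaction_id = t.id
        LEFT JOIN (SELECT dtt.* FROM dimensionitemtotransaction dtt
                   JOIN dimensionitem di ON di.id = dtt.dimension_item_id
                   WHERE di.dimension_id = 2) d2 ON d2.transaction_id = t.id
        LEFT JOIN (SELECT dtt.* FROM dimensionitemtotransaction dtt
                   JOIN dimensionitem di ON di.id = dtt.dimension_item_id
                   WHERE di.dimension_id = 3) d3 ON d3.transaction_id = t.id
    ) sq
    GROUP BY sq.transaction_id, sq.d1, sq.d2, sq.d3
    ORDER BY sq.d1, sq.d2
""").fetchall()

for row in rows:
    print(row)
```

This prints the four (d1, d2, d3) combinations with the split amounts 3000/2000/3000/2000 from step 2.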

However, I am struggling to figure out the proper way to do that.

I currently have a query that produces correct results, but it is very slow on larger datasets. For each dimension I want to display, I make a separate join on a subquery. This makes the query slow, and it will get even slower as I add more dimensions to the mix.

The subquery that builds the first table:

(
    SELECT DISTINCT
        tl.id AS transaction_id,

        dimension_1.id AS d1,
        dimension_1.percentage AS d1_percentage,

        dimension_2.id AS d2,
        dimension_2.percentage AS d2_percentage,

        dimension_3.id AS d3,
        dimension_3.percentage AS d3_percentage,

        tl.amount

    FROM transaction tl

    LEFT JOIN (
        SELECT dtt.transaction_id, dtt.percentage AS percentage,
               dtt.dimension_item_id, od.id AS id
        FROM dimensionitemtotransaction dtt
        JOIN dimensionitem od ON dtt.dimension_item_id = od.id
        WHERE od.dimension_id = 1
    ) dimension_1 ON tl.id = dimension_1.transaction_id

    LEFT JOIN (
        SELECT dtt.transaction_id, dtt.percentage AS percentage,
               dtt.dimension_item_id, od.id AS id
        FROM dimensionitemtotransaction dtt
        JOIN dimensionitem od ON dtt.dimension_item_id = od.id
        WHERE od.dimension_id = 2
    ) dimension_2 ON tl.id = dimension_2.transaction_id

    LEFT JOIN (
        SELECT dtt.transaction_id, dtt.percentage AS percentage,
               dtt.dimension_item_id, od.id AS id
        FROM dimensionitemtotransaction dtt
        JOIN dimensionitem od ON dtt.dimension_item_id = od.id
        WHERE od.dimension_id = 3
    ) dimension_3 ON tl.id = dimension_3.transaction_id
) sq

The aggregation that builds the second table:

SELECT
    transaction_id, d1, d2, d3,

    COALESCE(SUM(
        amount
        * COALESCE(d1_percentage, 100) / 100
        * COALESCE(d2_percentage, 100) / 100
        * COALESCE(d3_percentage, 100) / 100
    ), 0.0) AS amount

FROM
    ( /* above subquery here */ ) sq

GROUP BY transaction_id, d1, d2, d3;

All my foreign keys are indexed.

I feel this is a bad way to do it, but I'm having difficulty finding any other way, and I would appreciate suggestions.

Thanks!
