Relational Algebra Join Optimizing

452 Views Asked by At

I've seen in my textbook for school that many of the join operations never seemed to optimize the table on the right of the join, and only on the left. For example, to find the name of the Employee managing the Database Department, you would do this:
name( Mgr_ssn ( Dname = 'Database' ( Department ) ) ⨝Mgr_ssn = ssn Employee )

So I'm wondering if it would be equally correct to do something like:
name( Mgr_ssn ( Dname = 'Database' ( Department ) ) ⨝Mgr_ssn = ssn ( ssn, name Employee ) )

This is assuming of course that Employee has many other attributes. In doing so I would think that the system would be saving time by not having to worry about joining all the other attributes of Employee when in the end they will be projected out anyway. I never have seen projection like this on the right side of a join before and am wondering if it is acceptable and/or unnecessary.

2

There are 2 best solutions below

1
On

Most optimizers use system R optimizer which considers only left deep joins. That is why you never see joins on the right.

The search space of all the options is exponential so optimizers want to find reasonably acceptable solutions quickly (optimizers don't find the best solution, they try to avoid the worst ones).

P.S. The reason for using left deep joins is that it allows pipelining the results without the need to write them to disk, which saves I/Os.

0
On

Any decent query optimizer will push down appropriate restrictions, and sometimes projections too, to minimize the data to be processed. And because the optimizer would do it automatically, and the result is identical, there's no particular need to optimize the expression in the relational algebra.

In a two-table join sequence like this, it is not clear that there would be a benefit to forming the projection before joining with department; the likely processing sequence is find the (probably single) department with Dname = 'Database' and then find the single row in Employee with the E.SSN = D.Mgr_SSN. However, if the sub-expression were used multiple times, it might well be worth doing.

I also note that the design is awful - you should never use anything as sensitive as an SSN as the joining field in a database design. The PCI team would have fits! But maybe the names are a hangover from gentler times now long past, but the content is a generated surrogate and the real SSN is stored in Employee.RealSSN (which might even be encrypted to ensure that the unauthorized do not see it - though setting the permissions correctly on the column so that only the authorized can select it is also effective).