Is this BCNF? If not, what is a decomposition and is it FD preserving?

Consider relation schema R = (A, B, C, D, E) with functional dependencies BCE -> D, BDE -> C, ABE -> D and AB -> C.

Is the relation schema in BCNF? If not: What is a BCNF decomposition? Is this decomposition dependency preserving?

ABE is the candidate key and thus a superkey. So, for the requirements of BCNF, the third functional dependency ABE -> D is satisfied, but the other functional dependencies are not satisfied and need decomposition.

Answer by Lifepillar:

I take it that the FDs form a cover (that is, all and only the FDs that hold are those that can be derived from the stated FDs), and that the attributes in R are all the attributes.

To determine whether a schema is in BCNF, first of all you have to find all the keys. In this case, there is only one key (ABE), as you have correctly determined. But keep in mind that, in general, a schema may have many keys.
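
For instance, keys can be checked mechanically by computing attribute closures. Here is a minimal Python sketch (illustrative code of mine, with made-up names like FDS and closure): since A, B and E never appear on the right-hand side of any FD, every key must contain them, and the closure computation confirms that ABE is a superkey, hence the only key.

# Each FD is a pair (lhs, rhs), with a single attribute on the right-hand side.
FDS = [({'B', 'C', 'E'}, 'D'), ({'B', 'D', 'E'}, 'C'),
       ({'A', 'B', 'E'}, 'D'), ({'A', 'B'}, 'C')]

def closure(attrs, fds):
    """Return the closure X+ of the attribute set attrs under fds."""
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

print(closure({'A', 'B', 'E'}, FDS))  # all of ABCDE: ABE is a superkey
print(closure({'A', 'B'}, FDS))       # only ABC: AB is not a superkey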

It's not the dependencies that you decompose, but the schemas! The functional dependencies denote constraints, and you want to ideally preserve all of them when decomposing.

The schema is not in BCNF. (It is not even in 3NF, for that matter.) There is a simple-to-describe (though inefficient) algorithm to decompose R into BCNF. Some preliminaries first:

  • Without loss of generality, we assume that the right-hand side of each functional dependency (FD) consists of a single attribute.
  • attr(R) denotes the set of all the attributes of schema R.
  • The closure of a set of FDs is the set of all the logical consequences of that set of FDs. Note that such a set is always finite. More on that later.

Naive BCNF Decomposition Algorithm

With those preliminaries, the decomposition goes as follows:

  1. Take any functional dependency X -> A in which X is not a superkey. If there is no such dependency, the procedure is over. Otherwise, decompose R into two schemas R1 on attr(R) - {A} and R2 on X ∪ {A}, respectively.
  2. Project the closure of the set of FDs associated with R onto R1 and R2, respectively.
  3. Recursively apply step 1 and 2 to R1 and R2 (each with the corresponding set of FDs as determined in step 2).

Note that R1 and R2 have strictly fewer attributes than R (otherwise X in X -> A would be a superkey). So, the recursive application is on smaller schemas, and hence the procedure must eventually stop.

Since you create a schema on X ∪ {A}, the FD X -> A is projected onto that schema, and X is a superkey of that schema. So, you have removed one violation of BCNF (it's less obvious that you have not created new ones in the other subschema, but indeed you are not creating any new violation of BCNF).

Note also that it is crucial that the closure of the given FDs is projected. It is not enough, in general, to project just the explicitly given FDs. This is the inefficient part of the algorithm, because computing the logical consequences of a set of FDs takes exponential time in the size of the FDs in the worst case. There is a variant of this algorithm that requires computing the closure only once at the end of the decomposition rather than at each step, but I won't discuss it. You may find it, IIRC, in Ullman's Principles of Database and Knowledge-Base Systems (and certainly elsewhere).
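
To make the procedure concrete, here is a naive Python sketch of the algorithm (illustrative code of mine, deliberately exponential): it projects the closure of the FDs onto each subschema by computing the closure of every subset of the subschema's attributes, which is exactly the expensive step discussed above.

from itertools import combinations

def closure(attrs, fds):
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

def project(fds, sub):
    # Project the closure of fds onto sub: for every subset X of sub,
    # X -> A holds in the projection iff A is in X+ and A is in sub.
    out = []
    for r in range(1, len(sub)):
        for xs in combinations(sorted(sub), r):
            x = set(xs)
            for a in closure(x, fds) & (sub - x):
                out.append((x, a))
    return out

def bcnf(attrs, fds):
    # Step 1: pick any FD X -> A whose lhs is not a superkey and split.
    for lhs, a in fds:
        if not closure(lhs, fds) >= attrs:
            r1, r2 = attrs - {a}, lhs | {a}
            # Steps 2 and 3: project the closure and recurse on both halves.
            return bcnf(r1, project(fds, r1)) + bcnf(r2, project(fds, r2))
    return [(attrs, fds)]  # no violating FD: the schema is in BCNF

FDS = [({'B', 'C', 'E'}, 'D'), ({'B', 'D', 'E'}, 'C'),
       ({'A', 'B', 'E'}, 'D'), ({'A', 'B'}, 'C')]
for schema, _ in bcnf(set('ABCDE'), FDS):
    print(sorted(schema))  # ['A','B','E'], ['A','B','C'], ['B','C','D','E']

With the FDs listed in the order given, this run happens to split on BCE -> D first and reproduces the second decomposition shown below; pick the FDs in a different order and you obtain the other decompositions.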

Atomic Closure of a Set of FDs

In fact, it is not necessary to compute all the logical consequences. It is known that a much smaller subset (but still exponential in the worst case), known as the atomic closure, already contains all the FDs that matter for a correct decomposition. The algorithm is simple and elegant, and you may find it, for instance, in H. Koehler's 2007 PhD thesis, On Fast and Space-Efficient Database Normalization.

You start with a minimal cover M.C. of the original set of FDs (I assume you know what a minimal cover is), and you iteratively extend it. Let F* be the atomic closure you want to compute. Then:

F* <- M.C.
for each AY -> B in F* do
  for each X -> A in M.C. such that B is not in X do
    a. Remove redundant attributes wrt M.C. from the lhs of XY -> B
    b. Add the resulting FD to F*
return F*

Step (a) is performed as in the algorithm for computing a minimal cover. The outer for each considers each FD in, or added to, F* exactly once (note that F* may be updated at each iteration).
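
For concreteness, a direct Python transcription of this pseudocode might look as follows (illustrative code of mine; reduce_lhs implements step (a) greedily, as in the minimal cover algorithm):

def closure(attrs, fds):
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

def reduce_lhs(lhs, b, mc):
    # Step (a): greedily drop attributes that are redundant w.r.t. M.C.
    lhs = set(lhs)
    for attr in sorted(lhs):
        if len(lhs) > 1 and b in closure(lhs - {attr}, mc):
            lhs.discard(attr)
    return frozenset(lhs)

def atomic_closure(mc):
    mc = [(frozenset(lhs), b) for lhs, b in mc]
    fstar = list(mc)
    for lhs, b in fstar:                # also visits FDs appended below
        for x, a in mc:                 # X -> A in M.C. with A in AY, B not in X
            if a in lhs and b not in x:
                y = lhs - {a}
                fd = (reduce_lhs(x | y, b, mc), b)
                if fd not in fstar:
                    fstar.append(fd)    # step (b)
    return fstar

MC = [({'B', 'D', 'E'}, 'C'), ({'B', 'C', 'E'}, 'D'), ({'A', 'B'}, 'C')]
for lhs, b in atomic_closure(MC):
    print(''.join(sorted(lhs)), '->', b)  # BDE -> C, BCE -> D, AB -> C, ABE -> D

Run on the minimal cover of your example, this returns {BDE -> C, BCE -> D, AB -> C, ABE -> D}, consistent with the observation below.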

The Algorithm Applied to Your Example

In your example, the atomic closure of {BCE -> D, BDE -> C, ABE -> D, AB -> C} coincides with the set itself. So, in this specific case, you may just project the given set of FDs.

If we choose AB -> C at the first step of the decomposition, we get:

R1(A, B, C)     {AB -> C}
R2(A, B, D, E)  {ABE -> D}

Each of R1 and R2 is in BCNF, so we are done. (Note, however, that this decomposition loses BCE -> D and BDE -> C: neither is implied by {AB -> C, ABE -> D}.)

If we choose BCE -> D as the first violating FD, we get:

R1(B, C, D, E)  {BCE -> D, BDE -> C}
R2(A, B, C, E)  {AB -> C}

In this case, R1 is in BCNF, but R2 is not, because R2's (only) key is ABE. Therefore, we need to decompose R2 recursively based on AB -> C. We get:

R21(A, B, C)    {AB -> C}
R22(A, B, E)    {}

These schemas are now in BCNF, and the final decomposition is:

R1(B, C, D, E)  {BCE -> D, BDE -> C}
R21(A, B, C)    {AB -> C}
R22(A, B, E)    {}

Note that, unlike the previous one, this decomposition preserves all the FDs.

Finally, if we take BDE -> C at the first step, we get:

R1(B, C, D, E)  {BCE -> D, BDE -> C}
R2(A, B, D, E)  {ABE -> D}

Again, both R1 and R2 are in BCNF. But AB -> C is lost (it is not a logical consequence of the FDs of the decomposition).
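
This is easy to verify mechanically: take the union of the FDs of the two schemas and check whether C belongs to the closure of AB under that union (a small illustrative sketch of mine):

def closure(attrs, fds):
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

# Union of the FDs of R1(B, C, D, E) and R2(A, B, D, E)
G = [({'B', 'C', 'E'}, 'D'), ({'B', 'D', 'E'}, 'C'), ({'A', 'B', 'E'}, 'D')]

print(closure({'A', 'B'}, G))  # {'A', 'B'}: C is not derivable, so AB -> C is lost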

So, this algorithm will always find a BCNF decomposition, but it does not guarantee that a decomposition preserving all FDs will be found, even if one exists. As you can see, that depends in particular on the order in which you select the FDs.

Faithful Decomposition in BCNF Using Classical 3NF Decomposition

You may also apply the “classical” algorithm for 3NF decomposition, which decomposes based on a minimal cover. Depending on the initial minimal cover, that algorithm may produce a decomposition that is also in BCNF. If a faithful BCNF decomposition exists (“faithful” meaning that no FD is lost in the process), starting from a minimal cover that is not “critical” (as defined below) guarantees that the 3NF decomposition algorithm will find a BCNF decomposition.

A minimal cover is critical if it contains a FD such that, when R is restricted to the attributes of that FD, the resulting schema is not in BCNF (again, the subschema remains associated with the projection of the atomic closure of R's FDs).

Your example has only one minimal cover: {BDE -> C, BCE -> D, AB -> C}, which is clearly non-critical.
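
Both facts can be checked programmatically. The following illustrative sketch of mine first computes a minimal cover of the stated FDs (note how ABE -> D is dropped as redundant; in general, minimal covers are not unique, but here there is only one) and then verifies non-criticality by testing, for each FD of the cover, whether the subschema on its attributes is in BCNF under the projected FDs:

from itertools import combinations

def closure(attrs, fds):
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

def minimal_cover(fds):
    fds = [(set(lhs), b) for lhs, b in fds]
    for lhs, b in fds:                  # drop extraneous lhs attributes
        for attr in sorted(lhs):
            if len(lhs) > 1 and b in closure(lhs - {attr}, fds):
                lhs.discard(attr)
    cover = list(fds)
    for fd in list(cover):              # drop redundant FDs
        rest = [g for g in cover if g is not fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover

def is_bcnf(sub, fds):
    # sub is in BCNF iff every non-trivial FD X -> A in the projection has
    # X as a superkey of sub; as before, project via subset closures.
    for r in range(1, len(sub)):
        for xs in combinations(sorted(sub), r):
            implied = closure(set(xs), fds) & sub
            if implied > set(xs) and not implied >= sub:
                return False
    return True

FDS = [({'B', 'C', 'E'}, 'D'), ({'B', 'D', 'E'}, 'C'),
       ({'A', 'B', 'E'}, 'D'), ({'A', 'B'}, 'C')]
MC = minimal_cover(FDS)                 # ABE -> D is removed as redundant
for lhs, b in MC:                       # every check prints True: non-critical
    print(sorted(lhs | {b}), is_bcnf(lhs | {b}, FDS))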

The 3NF decomposition algorithm is very simple. It starts from a minimal cover and performs three steps:

  1. Create one or more separate schemas with the attributes that do not appear in the minimal cover, each with an associated empty set of FDs.
  2. Create one schema for each FD in the minimal cover, and project the atomic closure on each schema. (Schemas on the same set of attributes are of course the same schema.)
  3. If the resulting decomposition does not contain a superkey of R, add another schema on a superkey of R, with an associated empty set of FDs.

(Note that many textbooks—sigh!—omit the part about projecting FDs.)
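
Here is an illustrative Python sketch of the three steps (code of mine; it records each schema simply as a set of attributes and, for brevity, omits the projection of the atomic closure onto each schema, which is safe for this example, as discussed above):

def closure(attrs, fds):
    res = set(attrs)
    while True:
        new = {b for lhs, b in fds if lhs <= res} - res
        if not new:
            return res
        res |= new

def find_key(attrs, fds):
    # Shrink attrs to a key by dropping attributes as long as what is left
    # remains a superkey.
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= attrs:
            key -= {a}
    return key

def synthesize_3nf(attrs, mc):
    used = set().union(*(lhs | {b} for lhs, b in mc))
    schemas = [attrs - used] if attrs - used else []       # step 1
    for lhs, b in mc:                                      # step 2: one schema per FD
        s = lhs | {b}
        if s not in schemas:
            schemas.append(s)
    if not any(closure(s, mc) >= attrs for s in schemas):  # step 3: add a key
        schemas.append(find_key(attrs, mc))
    return schemas

MC = [({'B', 'C', 'E'}, 'D'), ({'B', 'D', 'E'}, 'C'), ({'A', 'B'}, 'C')]
for s in synthesize_3nf(set('ABCDE'), MC):
    print(sorted(s))  # ['B','C','D','E'], ['A','B','C'], ['A','B','E']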

Let's apply it to your example:

  1. Nothing to do: every attribute appears in the minimal cover.
  2. We get:
R1(B, C, D, E)  {BCE -> D, BDE -> C}
R2(A, B, C)     {AB -> C}
  3. Neither of the above schemas contains the (only) key of R, ABE. So, we add another schema on that key:
R3(A, B, E)     {}

Since we started from a non-critical minimal cover, each schema in this decomposition is not only in 3NF, but also in BCNF! In fact, this is the same as the second decomposition of the previous section. In general, however, this algorithm may produce a decomposition that you cannot obtain with the naive BCNF decomposition. On the other hand, this algorithm does not always produce a BCNF decomposition, whereas the naive BCNF algorithm does (possibly losing FDs).

Note that many textbooks stress the fact that 3NF decomposition is an efficient (polynomial-time) algorithm: but you still have to compute the atomic closure to project the FDs! In general, you cannot just project the explicitly given FDs: you may lose some constraints.

A clarification about this last point: since 3NF decomposition projects all the FDs of a minimal cover (a set logically equivalent to the input FDs), by construction you do not lose any FD by projecting just the minimal cover, in the sense that the union of the FDs projected onto the decomposition is still equivalent to the input FDs. But it may still be the case that some schema of the decomposition is missing some constraint, thereby allowing tuples that could never appear in any projection of the original instance. That's why I insist on projecting the atomic closure.

A final note: normalization theory, as described here, has absolutely nothing to say about inter-relational constraints. But, of course, the decomposed schemas are related to each other. To consider a decomposition complete, one should add the relevant foreign keys (or, in some cases, inclusion dependencies). There is very little research about that, and, afaik, almost no textbook discusses such aspects.