Section 6.5.  Preliminaries for Normalization.

 

Idea in normalization, start with data item names (to be columns in some tables) together with a list of rules of relatedness.  Then start with all data items in one table (universal table).

 

Rules of relatedness and a desire to avoid certain types of bad behavior (anomalies) causes us to factor this big table into smaller tables, achiev­ing more and more restrictive forms (Normal Forms).  1NF, 2NF, 3NF, BCNF.

 

The idea is that from the factored tables can always get back all the original data by joins; this is called a "lossless decomposition".

 

A second desire we have is that the database system will be able to check the rules of relatedness as simply as possible (generally by simply enforc­ing uniqueness of a column in a table).

 

Section 6.6. Functional Dependencies.

 

Def 6.6.1.  Given a table T with at least two attributes A and B, we say that A -> B (A functionally determines B, or B is functionally dependent on A)  iff it is the intent of the designer that for any set of rows that might exist in the table, two rows in T cannot agree on A and disagree on B.

 

More formally, given two rows r1 and r2 in T, if r1(A) = r2(A) then r1(B) = r2(B).

 

Def. 6.6.2.  Now sets of attributes:  X = A1 A2 . . . Ak, and Y = B1 B2 . . . Bm.  Say X -> Y iff it is the intention of the designer that two rows cannot si­mul­ta­ne­ously agree on X and disagree on Y.

 

Same wording as Def. 6.6.1, but note that for a set X, agrees if and only if agrees on ALL column value, disagrees if disagrees on ANY column value.

 

Ex. 6.6.3.  We claim what follows is all FDs of emp_info. 

 

        (1)  emp_id -> emp_name emp_phone dept_name

        (2)  dept_name -> dept_phone dept_mgrname

        (3)  skill_id -> skill_name

        (4)  emp_id skill_id -> skill_date skill_lvl

 

Note that we can conclude from above that designer does not intend that skill_name should be unique for a particular skill.  skill_name -> skill_id is not there, nor is it implied by this set (no skill_name on left).

 

Note that a set of FDs has logical implications to derive OTHER FDs. 

 

Theorem 6.6.3. Inclusion rule.  Given T with Head(T).  If X and Y are sets in Head(T) and Y Í  X, then X -> Y (for ANY content for T). (Venn Diagram.)

 

Proof.  By def, need only demonstrate that if two rows u and v agree on X they must agree on Y.  But Y is a subset of X, so seems obvious.

 

Def. 6.6.4.  A trivial dependency is an FD of the form X -> Y that holds for any table T where X UNION Y Í  Head(T). (Assume ANY content for T.)

 

Theorem 6.6.5.  Given a trivial dependency X -> Y in T, it must be the case that Y Í X.

 

Def. 6.6.6.  Armstrong's Axioms.  From the following small set of basic rules of implication among FDs, we can derive all others that are true.

 

[1]  Inclusion rule:              if Y Í  X, then X -> Y

[2]  Transitivity rule:           if X -> Y and Y -> Z, then X -> Z

[3]  Augmentation rule:       if X -> Y, then X Z -> Y Z

Note X Z for sets is the same as X UNION Z.

 

Everything we can say about FDs is implied by these axioms.

 

Let's prove one of these axioms.  We will be appealing to definitions of Functional Dependency here to prove an Armstrong Axiom.

 

Prove Augmentation Rule.  If X -> Y then X Z -> Y Z.  Consider two rows u and v that agree on X Z, that is they agree on all attributes in X UNION Z.  Then since u and v agree on all attributes of X they agree on all attributes of Y (since X -> Y).  Similarly, since it is given that they agree on all attributes of Z and we have just shown they agree on all attributes of Y, they agree on all attributes of Y UNION Z.  Thus, from the fact that u and v agree on all attributes of X UNION Z we can conclude that they also agree on all attributes of Y UNION Z, and by definition we have X Z -> Y Z.

 

Some implications of Armstrong's Axioms. 

    

(1)   Union Rule:                       If  X -> Y  and  X -> Z  then  X -> Y Z

(2)   Decomposition Rule:         If  X -> Y Z  then  X -> Y  and  X -> Z

(3)   Pseudotransitivity Rule:    If  X -> Y  and  W Y -> Z  then  X W -> Z

(4)   Accumulation Rule:           If  X -> Y Z  and  Z -> B W  then  X -> Y Z B

 

One Example:

 

[1]  Union Rule:        If X -> Y and X -> Z, then X -> Y Z.

 

Proof:  We are given  (a) X -> Y and [3]  Augmentation rule: if X -> Y, then X Z -> Y Z, hence we have (c) X X -> X Y.  But X X is X UNION X = X, so (c) can be rewritten
(d) X -> X Y. 

Similarly we are given (b) X -> Z so by Armstrong's Augmentation rule we have
(e) X Y -> Y Z. 

Finally, by (d) and (e) and transitivity, we have X -> Y Z, the desired result.

The idea is that we can use these new Rules as if they were axioms, as facts to prove yet other rules or special cases.  Like proving theorem in Geometry, can then use to prove other theorems.

 

In what follows, when we list a set of FDs, we normally try to list a MINIMAL set, so that a smaller set doesn't exist that will imply these.  It will turn out that finding a minimal set of FDs is very important in finding the right relational design by Normalization.

 

Example 6.6.4.  Pg. 338.  Consider the table T below fixed in content for all time so the intended FDs can be read off (VERY UNUSUAL).  Let's try to list a minimal set of FDs.

 

             Table T

row #

A

B

C

D

    1

a1

b1

c1

d1

    2

a1

b1

c2

d2

    3

a2

b1

c1

d3

    4

a2

b1

c3

d4

 

Analysis.  Start by considering FDs with a single attribute on the left. 

 

Always have the trivial FDs, A -> A, B -> B, C -> C, and D -> D, but don't list trivial FDs in a minimal set.

 

(a)  All values of the B attribute are the same, so it can never happen for any other at­tribute P (i.e., where P represents A, C, or D) that r1(P) = r2(P) while r1(B) ≠ r2(B);  thus we see that A -> B, C -> B, and D -> B.  At the same time no other attribute P is functionally dependent on B since they all have at least two distinct values, and so there are al­ways two rows r1 and r2 such that r1(P) ≠ r2(P) while r1(B) = r2(B);  thus:  B -/-> A,   B -/-> C, and  B -/-> D.

 

(b)  Because the D values are all different, in addition to D -> B  of part (a), we also have D -> A and D -> C.  We state: a KEY (D) functionally determines everything else, which will turn out to be the point.  At the same time D is not func­tionally dependent on anything else since all other attributes have at least two duplicate values, so in addition to B -/-> D  of part (a), we have  A -/-> D, and C -/-> D.

 

(c)  We have A -/-> C (because of rows 1 and 2) and C -/-> A (because of rows 1 and 3).  Therefore, we can list all FDs (and failed FDs) with a single attribute on the left (we provide a letter in parentheses keyed to the paragraph above that give us each fact).

 

        (a)  A -> B         (a)  B -/-> A      (c)  C -/-> A      (b)  D -> A

       (c)  A -/-> C       (a)  B -/-> C      (a)  C -> B        (a)  D -> B

        (b)  A -/-> D       (a)  B -/-> D      (b)  C -/-> D      (b)  D -> C

 

By the union rule, whenever a single attribute on the left functionally determines several other at­tributes, as with D above, we can combine the attributes on the right: D -> A B C. From the analysis so far, we have the following set of FDs (which we believe to be minimal):

 

    (1) A -> B, (2) C -> B, (3) D -> A B C

 

(What is the key for this table?  Note, really D -> A B C D and
D->A B, D->B C, D->A C)

 

Now consider FDs with pairs of attributes on the left

(d)  Any pair containing D determines all other attributes, by FD (3) above and the augmentation rule, so there is no new FD with D on the left that is not already implied.

 

(e) The attribute B combined with any other at­tribute P on the left, still functionally determines only those attributes already determined by P, as we see by the following argument.  If  P -/-> Q  this means there are rows r1 and r2 such that r1(Q) ≠ r2(Q) while r1(P) = r2(P).  But be­cause B has equal values on all rows, we know that r1(B P) = r2(B P) as well, so  B P -/-> Q.  Thus we get no new FDs with B on the left.

 

(f) Now the only pair of attributes that does not contain B or D is A C, and since A C has distinct values on each row (examine table T again!), we know that A C -> A B C D.  This is new, but is it minimal?

 

It is trivial that A C -> A, and A C -> C, and we already knew that A -> B, so it is easy to show that  A C -> B.  (A C -> B C by augmentation,  B C -> B by inclusion, so  A C -> B by transitiv­ity.)  Thus the only new fact we get from seeing that A C -> A B C D is that A C -> D.

 

Now consider looking for FDs with triples of attributes on the left.

Any triple that does not contain D (which would assure that it functionally determine all other attributes) must con­tain A C (and therefore func­tionally determines all other attributes).  So everything with three attributes on the left is already handled.  Clearly the same holds for any set of four attributes on the left.

 

The complete set of FDs implicit in the table T is therefore the follow­ing:

 

(1)  A -> B, (2) C -> B, (3) D -> A B C, (4) A C -> D.

------------------------------------------begin 2/3/03---------------------------------------------------

NOTE:  We have examined above the concept of a FD, and derived some rules that apply to a SET of FDs.   We have used as an example a single table:            Table T

row #

A

B

C

D

    1

a1

b1

c1

d1

    2

a1

b1

c2

d2

    3

a2

b1

c1

d3

    4

a2

b1

c3

d4

That we assume will never change, and reasoned about all the possible FDs that could exist between the columns of this table (one column DETERMINES (is the domain of a function) the value of another if whenever two rows in the domain column happen to have the same value, then the values in the determined column will also be the same.  

 

The first three FDs come from the earlier list of FDs with single at­tributes on the left, while the last FD, A C -> D, is the new one gener­ated with two attributes listed on the left.  It will turn out that this set of FDs is not quite minimal, despite all our efforts to derive a minimal set.  We will see this after we have had a chance to better de­fine what we mean by a min­imal set of FDs.  This is why we need rigor.

 

(1) A -> B, (2) C -> B, (3) D -> A C, (4) A C -> D is minimal!

 

The original set can be derived as follows:  (3) D -> A C is equivalent (by decomposition and union) to (a) D -> A and (b) D -> C.  Furthermore, since (a) D -> A and (1) A -> B, by transitivity (c) D -> B.  Now by (3) and union, D -> A B C.

 

Def 6.6.9.  Closure of a set of FDs.  Given a set F of FDs on attributes of a table T, we define the CLOSURE of F, symbolized by F+, to be the set of all FDs implied by F.

 

E.g., in above, D -> A B C was in the closure of the reduced set of FDs.  The MINIMAL set of FDs is what we want, so take out implied ones!

 

There can be a LOT of FDs in a closure.  Ex. 6.6.5, From F = {A -> B, B -> C, C -> D, D -> E, E -> F, F -> G, G -> H}, we can add trivial dependencies A -> A, B -> B, etc., and by transitivity and union get A -> A B, A -> A B C, . . . In fact any letter -> any subset of letters equal and to it's right in the al­phabet.  C -> C D E F , C -> D F, B -> C D F G.  In fact, if we name any two sets of letters, one of them has an earliest letter in the alphabet and -> the other (maybe they both have that letter and -> each other).  So the number of implied FDs is the number of subsets of let­ters squared, about (2n)2 = 22n.  E.g., from A -> B, B -> C, . . . , J -> K, 10 FDs, get 220 FDs, about a million.  If go up to . . . , X -> Y, Y -> Z, get 252 FDs, about 4,000,000,000,000,000 FDs (four quadrillion).

 

Really only want to arrive at a MINIMAL set of FDs, so we try to avoid this kind of explosion.  Still we need all these concepts.

 

Def. 6.6.10.  FD Set Cover.  A set F of FDs on a table T is said to COVER another set G of FDs on T if the set G can be derived by implication rules from the set F, i.e., if G Í  F+.  If F covers G and G covers F, we say the two sets of FDs are equivalent, F ~ G.

 

Ex. 6.6.6.  F = {B -> C D, A D -> E, B -> A} and  G = {B -> C D E, B -> A B C, A D -> E}.

        Demostrated in book how F covers G.  But also G covers

 

Def.  6.6.11.  Closure of a set of attributes.  Given a set X of attributes in a table T and a set F of FDs on T, we define the CLOSURE of the set X (under F), denoted by X+, as the largest set of attributes Y such that X -> Y is in F+.

 

We will study closure of set of ATTRIBUTES, not closure of set of FDs.

 

 

Pg. 368. Algorithm 6.6.12. Set Closure. Algorithm to determine X+, the clo­sure of a given set of attributes X, under a given set F of FDs.

 

  I = 0;  X[0] = X;            /* integer I, attr. set X[0]      */

  REPEAT                       /* loop to find larger X[I]       */

    I = I + 1;                 /* new I                          */

    X[I] = X[I-1];             /* initialize new X[I]            */

    FOR ALL Z -> W in F        /* loop on all FDs Z -> W in F     */

      IF Z Í X[I]              /* if Z contained in X[I]         */

        THEN X[I]= X[I]UNION W;/* add attributes in W to X[I]    */

    END FOR                    /* end loop on FDs                */

  UNTIL X[I] = X[I-1];         /* loop till no new attributes    */

  RETURN X+ = X[I];                         /* return closure of X                              */

 

Note that the step in Algorithm 6.6.12 that adds attributes to X[I] is based on a simple infer­ence rule that we call the Set Accumulation Rule, stated thus:  If  X -> Y Z  and  Z -> W  then  X -> Y Z W.

 

In our algorithm we are saying that since X -> X[I] (our inductive as­sumption) and  X[I] can be represented as Y Z (because Z ÍX[I]), we can write X -> X[I] as X -> Y Z, and since F contains the FD  Z -> W, we con­clude by the set ac­cumulation rule that X -> Y Z W or in other words X -> X[I] UNION W.

 

Example 6.6.7.  In Example 6.6.6, we were given the set F of FDs:

               F = {B -> C D,  A D -> E,  B -> A}

Given X = B, we determine that X+ = A B C D E.  In terms of Algorithm 6.6.12, we start with X[0] = B.  Then X[1] = B, and we begin to loop through the FDs. Because of B -> C D, X[1] = B C D.  The next FD, A D -> E, does not apply at this time, since A D is not a subset of  X[1].  Next be­cause of B -> A, we get X[1] -> A B C D.

 

Now X[0] is strictly contained in X[1] (i.e., X[I-1] Ì X[I]) so X[I-1] ≠ X[I].  Thus we have made progress in this last pass of the loop and go on to a new pass, setting X[2] = X[1] = A B C D.  Looping through the FDs again, we see that all of them can be applied (we could skip the ones that have been applied before since they will have no new effect), with the only new FD, A D -> E, giving us X[2] = A B C D E.

 

At the end of this loop, the al­gorithm notes that X[1] Ì  X[2], progress has been made, so we go on to create X[3] and loop though the FDs again, ending up this pass with X[3] = X[2].

 

Finding a Minimal Cover of a set of FDs.

 

Algorithm 6.6.13, pg. 369.  BASIC TO NORMALIZATION!  Given a set F of FDs, construct a set M of FDs that is minimal and covers F.

 

Let's apply this to the (non-reduced) set of FDs above.

 

  F:   (1) A -> B, (2) C -> B, (3) D -> A B C, (4) A C -> D

                                                                     ^ Remember, this can come out!!

Step 1,  From the set F of FDs, create an equivalent set H with only single FDs on the right.  Use decomposition rule. 

 

 H:   (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> B, (5) D -> C, (6) A C -> D

 

Step 2.  Remove inessential FDs from the set H to get the set J.  Determine inessential X -> A if A is in X+ under FDs without X -> A.

 

Try removing (1) A -> B, leaving only (2) C -> B, (3) D -> A, (4) D -> B, (5) D -> C, (6) A C -> D.  Take X = A in closure algorithm, clearly get only X+ = A, because no other FD has its left side contained in X.  So need (1).

 

Try removing others.  (2) stays, since no other C on left.  Therefore if set X = C couldn't get X+ to contain more than C. (That reasoning is OK.)

 

How about (3)?  Would be left with only:  (1) A -> B, (2) C -> B, (4) D -> B, (5) D -> C, (6) A C -> D.  Consider X = D.  Get X+ = D B (4) C (5).  Then stop. In fact A not on right of any FDs if take out (3), so (3) needed.

 

Now try removing (4).  Keep:  (1) A -> B, (2) C -> B, (3) D -> A, (5) D -> C, (6) A C -> D.  Can we derive D -> B?  X = D.  Get: D A (3) C (5) B (2).  Therefore, got D -> B.  So can leave (4) out.

 

Can't leave out (5) because C not on right of any other FD.  Can't leave out (6) because D not on right of any other.  Therefore can only reduce set to:

 

J = (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> C, (5) A C -> D (Renumber)

 

Step 3.  Successively replace FDs in J with FDs that have a smaller num­ber of FDs on the left-hand side so long as J+ remains the same.

 

Test this by successively removing single attributes from multi-attribute left hand sides of FDs, changing X -> A to Y -> A, then checking if Y+ under new FD set is unchanged.  (Clearly if we assume Y -> A, and Y Ã X, can de­rive everything used to be able to:  still true that X -> A.  ONLY RISK is that Y -> A might imply TOO MUCH. I.e., might have Y+ is LARGER than before!)

 

Only one to try is (5).  If we change this to A -> D, does D change?  used to be A+ = A B, now, A+ = A B D C.  No good.  How about changing A C -> D to   C -> D?  Does C+ change?  Used to be C+ = C B.  Now C+ = C B D A.  So no good, can't reduce.  (NO NEED TO TRY STEP 2 AGAIN.)

 

IF WE DID REDUCE and created a new FD, let us say A -> D to replace A C -> D, we would need to apply Step 2 again to test if A -> D could be removed!!!

 

Step 4.  Apply Union rules to bring things back together on the right for common sets of attributes on the left of FDs, renamed M.

 

J:  (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> C, (5) A C -> D

 

M:  (1) A -> B, (2) C -> B, (3) D -> A C, (4) A C -> D

 

This is the reduced set, above.

 

Section 6.7. Lossy and Lossless decomposition.  We're going to be factoring tables into smaller tables (projecting onto two subsets of columns that cover all columns and have some columns in common), but it doesn't al­ways work when join back that keep all information of original table.

 

Always get ALL rows back, but might get MORE.  Lose Information.  See Example 6.7.1 in text, Pg. 374, a Lossy decomposition.

 

Ex 6.7.1.  A Lossy Decomposition.  Consider table, ABC:

 

   Table ABC

A

B

C

a1

100

c1

a2

200

c2

a3

300

c3

a4

200

c4

 

If we factor this table into two parts, AB and BC, we get the following table contents:

 

   Table AB          Table BC

A

B

 

B

C

a1

100

 

100

c1

a2

200

 

200

c2

a3

300

 

300

c3

a4

200

 

200

c4

 

However, the result of joining these two tables is

 

    AB JOIN BC

A

B

C

a1

100

c1

a2

200

c2

a2

200

c4

a3

300

c3

a4

200

c2

a4

200

c4

 

This is NOT the original table content for ABC!  Note that the same decomposed tables AB and BC would have arisen if the table we had started with was ABCX, with content equal to AB JOIN BC above, or either of two other tables, ABCY or ABCZ:

 

    ABCY                 ABCZ

A

B

C

 

A

B

C

a1

100

c1

 

a1

100

c1

a2

200

c2

 

a2

200

c2

a2

200

c4

 

a3

300

c3

a3

300

c3

 

a4

200

c2

a4

200

c4

 

a4

200

c4

 

Since we can't tell what table content we started from, information has been lost by this de­composition and the subsequent join.

 

This is known as a Lossy Decomposition, or sometimes a Lossy-Join Decomposition.

 

Reason we lose information is many to many matching on join columns.  Lose which one on the left was with which one on the right.  E.g. a2, 200 on left and a4, 200 on left EACH match with 200, c2  and 200, c4 on right in a join, but all for combinations were not part of the original table.   Would be OK if always had N-1 relationships on join columns (or 1-1).  E.g., CAP, orders can have multiple rows with same pid, but pid is unique for products.  Must ALWAYS be unique on one side, not an accident, so need join columns to be SUPERKEY on at least ONE SIDE of join.

 

Theorem 6.7.3.  Given a table T with a set of FDs F and a set of attributes X in Head(T) then X is a superkey of T iff X functionally determines all attributes in T, i.e., X -> Head(T) is in F+.

 

Theorem 6.4.7.  Given a table T with a set F of FDs valid on T, then a de­composition of T into two tables {T1, T2} is a lossless decomposition if one of the following functional dependencies is implied by F:

 

(1)  Head(T1) Ç Head(T2) -> Head(T1), or

 

(2)  Head(T1) Ç Head(T2) -> Head(T2)

 

Note that in the first example Head(T1) Ç Head(T2) = B, but B-/-> Head(T1), and B-/-> Head(T2),

 

 

Section 6.8.  Normal Forms

 

We start with a Universal table T and a set of FDs that hold on T.  We then create a lossless decompostion:

 

 T = T1 JOIN T2 JOIN . . . JOIN Tn

 

so that in each of the tables Ti the anomalies we studied earlier not pre­sent.  Consider the following Universal table called emp_info:

 

emp_id,  emp_name,  emp_phone,   dept_name,   dept_phone,   dept_mgrname,   skill_id,   skill_name, skill_date,   skill_lvl

1234        abcdefgh     12345               pqrstu            45678              tuvwxyz                jkl246      mnopqur     012302        5

 

(1) emp_id -> emp_name   emp_phone   dept_name    [in every row with emp_id, name, phone, and dept will have the same value]

(2) dept_name -> dept_phone   dept_mgrname  [in every row with dept_name, phone and dept_mgrname will have the same value]

(3) skill_id -> skill_name

(4) emp_id skill_id -> skill_date   skill_lvl

 

In what follows, we will factor emp_info into smaller tables which form a lossless join.  A set of table headings (decomposition tables) together with a set of FDs on those heading attributes is called a Database Schema.

 

In emp_info, if delete record of only employee with given skill_id, lose information about that skill (in this case, the name of the skill).  So have to have sepa­rate table for skills. 

What we are really saying here is that emp_id skill_id is the key for emp_info, but a SUBSET of this key functionally determines some skills attributes.  This is to be avoided because of the delete anomaly.  Similarly, bunch of attributes are functionally determined by emp_id (including a bunch of dept information by transitivity).  These attributes also don't belong in emp_info with key of two attributes. 

 

There's a cute way of saying what FDs should appear in a normalized table:  every attribute must be determined by "the key, the whole key, and nothing but the key."  Need separate table for skills because of "the whole key".

 

Now have (pg. 385):  emps table (emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname),

skills table (skill_id, skill_name), and

emp_skills table (emp_id, skill_id, skill_date, skill_lvl).

 

But, if delete only employee in a department (during reor­ganization), lose info about the department, such as departmental phone number.

 

Problem here is that dept_phone, for example, is dependent on dept_name.  So fail on "nothing but the key".

 

This is what is known as a "transitive dependency" or "transitive FD", and cannot exist in a well-behave table with no delete anomaly.

 

OK, now Figure 6.26, pg. 386: emps table, depts table, emp_skills table, skills table.  Three entities and a relationship.  Note that there is one table for each FD.  Not an accident.

 

BCNF, 3NF, 2NF

 

2NF is too weak, no DBA would stop there.  Essentially, means stopping with depts in emps table.  Define near end of section as note.

 

BCNF definition is more intuitive than 3NF, but 3NF and BCNF are equiva­lent for the schema we have just been dealing with.  Need special situa­tion for difference to arise.

 

Def. 6.8.4.  Boyce-Codd Normal Form, or BCNF.  A table T in a database schema with FD set F is in BCNF iff, for any FD X -> A in F+ that lies in T (all attributes of X and A in T), A is a single attribute not in X, then X must be a superkey for T.

 

OR: For a non-trivial FD X -> Y where X is minimal, X must be the key for T.

 

The minimal cover for the FDs whose attributes lie in F are all of the form X -> A, where X is a key for T.  Note NOT saying it is a PRIMARY key — might be more than one key.  Every attribute in T is determined by a key (definition), the whole key (no subset of it) and noth­ing but the key (any FD determining A turns out to contain a key).

 

OK, now assume add attributes to emps table attributes to keep track of the street address (emp_straddr), the city and state (emp_cityst) and ZIPCODE (emp_zip).  These are all determined by emp_id as now:

 

(1) emp_id -> emp_name emp_phone dept_name emp_straddr emp_cityst emp_zip  (every employee has only one mailing address).

 

Now the way the PO has assigned ZIPCODES, turn out have new FDs:

 

(5)  emp_cityst emp_straddr -> emp_zip

(6)  emp_zip -> emp_cityst                              

 

Consider the extended emps table with these added attributes:

 

emps table:  emp_id   emp_name   emp_phone   dept_name   emp_straddr   emp_cityst   emp_zip 

 

This still has key emp_id, but now there are FDs that are NOT determined solely by the key so emps must be factored while all other tables remain the same:

 

emps table:  emp_id   emp_name   emp_phone   dept_name   emp_straddr   emp_cityst 

 

empadds table:  emp_straddr   emp_cityst   emp_zip

 

Note emp_cityst, emp_straddr is key for empadds and in intersection of factored tables (it is a foreign key in emps that determines the address held in empadds), so lossless.

 

But FD (6) still isn't derived from key in empadds.  So for BCNF, empadds must factor further (emps same)

 

empadds table:  emp_zip    emp_straddr

 

zip table:  emp_zip   emp_cityst

 

Now empadds has a key consisting of both columns (there is no FD whose left-hand side is in empadds).  zip table has emp_zip as key (FD (6)). 

 

PROBLEM:  FD 5 doesn't lie in any table.  But we always try to "preserve" each FD X -> Y, meaning that X union Y lies in the head of some Ti.  This allows us (for example) to update the street address, city, & state for an employee and test in one table that ZIP was entered correctly.

 

Thus BCNF has a bad property, and we want to fall back to design of previous database schema.  Need a new normal form.

 

Def. 6.8.5.  A PRIME ATTRIBUTE of a table T is any attribute that is part of a key for that table (not necessarily a primary key).

 

Def. 6.8.6.  Third Normal Form (3NF).  A table T in a database schema with FD set F is in 3NF iff, for any FD X -> A implied by F that lies in T, if A is a single non-prime attribute not in X, then X must be a superkey for T (BCNF with an escape clause, that can allow the implied condition, X must be a superkey of T, to fail if A is a prime attribute.)

 

OK, now with weakened 3NF and two FDs:

 

(5)  emp_cityst emp_straddr -> emp_zip

(6)  emp_zip -> emp_cityst

 

Can have all three attributes in a single empadds table, since emp_cityst emp_straddr is the key, and FD (6) doesn't break the rule for X -> A be­cause A = emp_cityst is a prime attribute.

 

Now have: every non-prime attribute is determined by the key, the whole key, and nothing but the key.

 

OK, Algorithm 6.8.8 to achieved well behaved 3NF decomposition, pg 393

 

Given a universal table T and a set of FDs F, generate a set S of headings for a database schema in 3NF that is a lossless decomposition and pre­serves all FDs in F.

 

Start by finding minimal cover (Algorithm 6.6.13).  Then loop on all FDs, and for every FD, if there is not already a table heading that contains the FD, create a new heading (new table).  Finally, if there is a key K for T that is not in any table, create a new table that contains it.

 

Def. 6.8.8.  Second Normal Form (2NF).  A table T with FD set F is in 2NF iff:  for any X -> A implied by F that lies in T, where A is a single non-prime attribute not in X, X is not properly contained in any key of T.

 

A database schema is 2NF iff all of its tables are 2NF.

 

 

EXAMPLE Assume we wish to construct a database from a set of data items, {A, B, C, D, E, F, G, H}  (which will become attributes in tables), and a set F of FDs given by: 

 

F:   (1)  A -> B C ,  (2) A B E -> C D G H, (3) C -> G D,  (4) D -> G, (5) E -> F

 

(a) Find the minimal cover for this set of FDs, and name this set G.

 

Step 1.  Decompose rhs

H = (1) A->B, (2) A->C, (3) ABE->C, (4) ABE->D, (5) ABE->G, (6) ABE->H, (7) C->G, (8) C->D, (9) D->G, (10) E->F

 

Step 2.  Test extraneous dependences

(1) A->B: J = rest, X+ = AC(2)G(5)D(8), not containing B, so keep.

(2) A->C, J = rest, X+ = A, keep.

(3) ABE->C, X+ = ABEC(2)D(4)G(5)F(6), containing C, drop.

J: (1) A->B, (2) A->C, (4) ABE->D, (5) ABE->G, (6) ABE->H, (7) C->G, (8) C->D, (9) D->G, (10) E->F

(4) ABE->D: X+ = ABEC(2)G(5)D(8)F(10), containing D, drop.

J: (1) A->B, (2) A->C, (5) ABE->G, (6) ABE->H, (7) C->G, (8) C->D, (9) D->G, (10) E->F

(5) ABE->G: X+ = ABEC(2)G(7)D(8)F(10)G, containing G, drop. 

J: (1) A->B, (2) A->C, (6) ABE->H, (7) C->G, (8) C->D, (9) D->G, (10) E->F

(6) ABE->H: X+ = ABEC(2)G(7)D(8)F(10), not containing  H, keep.

(7) C->G: X+ = CD(8)G(9) containing G, drop.

J: (1) A->B, (2) A->C, (6) ABE->H, (8) C->D, (9) D->G, (10) E->F

(8) C->D: X+ = C, keep.

(9) D->G: X+ = D, keep.

(10) E->F: x+ = E, keep.

 

Result J: (1) A->B, (2) A->C, (3) ABE->H, (4) C->D, (5) D->G, (6) E->F

 

Step 3. Try to drop something from lhs of (3) ABE->H:

AB-> H instead in J':  AB+ under J' = ABC(2)H(3)D(4)G(5)

AB+  under J = ABC(2)D(4)G(5)  Don't get H. Not the same.

AE->H instead in J': AE+ under J' = AEB(1)C(2)H(3)D(4)G(5)F(6),

AE+ under J = AEB(1)C(2)H(3)D(4)G(5)F(10).  Same.

 

Rename J new H: (1) A->B, (2) A->C, (3) AE->H, (4) C->D, (5) D->G, (6) E->F.

Must repeat Step 2.

(1) A->B.  Without (1) no B on rhs.

(2) A->C.  Without (2) no C on rhs.

(3) AE->H.  Without (3) no H on rhs.

(4) C->D.  Without (4) no D on rhs.

(5) D->G.  Without (5) no G on rhs.

(6) E->F.  Without (6) no F on rhs.

Step 3.  cannot replace AE->H  with A->H or E->H

Step 4.  Union rhs:  M: (1) A->BC, (2) AE->H, (3) C->D, (4) D->G, (6) E->F

 

(b) Start with the table T containing all these at­tributes, and perform a lossless decomposition into a 2NF but not 3NF schema.

 

    M: (1) A->BC, (2) AE->H, (3) C->D, (4) D->G, (5) E->F

 

T = (A, B, C, D, E, F, G, H).  Since A and E only occur on lhs of FDs in M, AE must be in any key, and AE+ = AEBC(1)H(2)D(3)G(4)F(5) = ABCDEFGH, so AE is the key for T. 

FD (5), E->F, constitutes a key-proper-subset dependency, and thus needs decomposition for 2NF.  E+ = EF(5), so let T1 = (E,F) and  T2 = (A,B,C,D,E,G,H) and  Then E is the key for T1 and AE is the key for T2.  (Note we leave Key in old table for intersection, but take out dependent attribute that would cause bad FD to exist, E->F.)

 

Similarly (1) A->BC is also a key-proper-subset dependency, and A+ = ABC(1)D(3)G(4), so we further decompose: T1= (E,F), T2 = (A,E,H), T3 = (A,B,C,D,G). 

 

Now FD E->F lies in T1, FD AE->H lies in T2, and FDs A->BC, C->D and D->G lie in T3.   C->D and D->G are transitive dependencies, allowed in 2NF, so we go no further here.  These are lossless decompositions by Theorem 6.7.4.  In the first, we note that head(T1) INTERSECT head(T2) = E, the key for T1. In the second, head(T2) INTERSECT head(T3) = A, the key for T3.  The idea here is that A and E are "dimensions" on which other attributes are dependent: need AE, A, and E all separate if there are dependencies in all cases.

 

(c)  Continue decomposition to bring this database to 3NF.  Is this table also BCNF?

 

To complete the decomposition to 3NF, we need to factor out the transitive dependencies.  Have T3 = (A,B,C,D,G), with FDs: (3) C->D and (4) D->G.  Change T3 to (A,B,C) and create new tables T4 = (C,D) and T5=(D,G).  Now we have one FD contained in each table.  In each case the lhs is a key for its table, so the schema is in BCNF.  (Don't have to worry about FDs with non-prime lhs.)

 

(d)  Use Algorithm 6.8.8 and the set G of FDs to achieve a lossless 3NF de­composition that preserves FDs of G. I

 

S = nullset.  Loop through all the FDs of M:

M = { A->BC, AE->H, C->D, D->G, E->F}

(Loop through FDs, and if not already contained in table of S, create new)

A->BC: S = {ABC}.

AE->H:  AEH not contained in ABC, so add it:  S={ABC,AEH}

C->D:  CD not contained in either yet, so add it: S= {ABC,AEH,CD}

D->G:  DG not contained in any yet, so add it: S = {ABC,AEH,CD,DG}

E->F:  EF not contained in any yet, so add it: S = {ABC,AEH,CD,DG,EF}

AE is the only candidate key, and it is contained in AEH, so done.

Is this is the same decomposition as in c?