[8.1](c) 20480 + (1 + 2 + 4 + 8 + 16 + 32 + 64) * 20480 = 2,621,440 bytes in 8 extents.  The initial and first next were both set at the same size, and the 6 following next extents doubled.

 

[8.4] (b) On all index nodes we have 0.80 * 2048 bytes/page = 1638 usable bytes/page. 6 byte rowid + 1 byte overhead + 4 bytes eid = 11 bytes/entry. FLOOR(1600bytes/page / 11 bytes/entry) = 145 entries/page. CEIL(200,000 entries / 145 entries/page) = 1380 leaf pages.

(c) At next level up, we need 1380 entries to distinguish the leaves.It is acceptable to assume the same size of entry at this level, although in actuality the np pointing to nodes below will probably be shorter than a ROWID at the leaf level. With this assumption there will still be 145 entries per node page and CEIL(1380 / 145) gives us 10 nodes at the next level up. The level above that will be a root.

(e) Since the rows will be clustered, we can fit 10000 rows at about 15 rows per page (see part (a)) on about CEIL(10,000/15) = 667 pages. Recalculating part (d), we get the number of disk accesses as: (1 + 1 + 68 + 667) = 69 index accesses = 737 accesses. At 80 accesses/second, this will require about 737/80 = 9.2 seconds.

[8.7) (a) 80% of 2000 is 1600 bytes usable, 1600/200 = 8 rows/page. 400,000/8 = 50,000 pages.

(c) In the higher levels, we would have separator keyvals, here 7 bytes, and node pointers, disk pointers just like rowids, so these entries are basically the same size as leaf-level entries. (If you calculated np length at 4 bytes, that's OK too.) At 108 entries/page, we have 3704/108 = 35 pages. The next level up would be the root page, with 35 entries.

(d) We will access the root page, plus 1/20 of the 35 directory level pages, so about 2 of them, and 1/20 of the leaf level index pages, 3704/20 = 186 pages. Then 1/20 of the data records, 400000/20 = 20000 rows out of 50000 pages. Without buffering, it doesn't help that some of these rows will be on the same page, since almost always other accesses will come between the colliding accesses. Thus there are 20000 disk I/Os to data pages. The total is 20189 disk page accesses.

This calculation assumes that the scan progresses along the directory level index pages as well as the leaf-level, but in fact with sibling pointers at leaf level, this is not necessary. The chapter text does not state that Oracle has sibling pointers, but in fact it does. Thus only one directory page at each level needs to be accessed, and the total count is 185 pages.

(f) This is an index-only query. Thus only 189 page I/Os are needed to bring in the relevant index entries.

(8.8) (a) 20 slots/page means SIZE = 2000/20 = 100, assuming the usual assumed page header size for Oracle's 2K page. The hash table is HASHKEYS*SIZE in size, held in the usable parts of various pages, so HASHKEYS*100 = 100,000*2000, or HASHKEYS = 100,000*20 = 2,000,000 = number of slots in the hash table. In fact, the assumed usable storage on a page (2000 bytes) cancels out in this calculation. We may simply say that each of the 100,000 pages has 20 slots, so that the whole hash table has 100,000*20 slots.

(b) (Requires probability theory.) Each row has a probability of p = 1/100000 of landing in a particular page. The total number of hits among n=1,000,000 inserts has a binomial distribution with an expected value of n*p = 10, and a standard deviation of sqrt(n*p*(1-p)) = 3.1, so 21 = 10 + 3.5*3.1 represents about 3.5 standard deviations away from the mean. F(3.5) = .9998, so there is only a .02% chance that an individual slot would have a collision list with 21 or more entries.

(c) (i) Yes, since the page cannot hold 21 rows.

(ii) No, a page may overflow because of the total number of rows for all its slots. For example with 20 slots/page and 1 row/slot, there might be 10 rows for each of 3 hash values, a total of 30 rows for the page.

 

[8.9] (b) Here N = M = 16384 = 214.

E(S) = 16384 * (1 - .999939016384)= 16384 * (1-.3681) = 10,353.

Using 8.6.4,

16384*(1 - exp(-1)) = 16384 * (1-.3679) = 10357.

Yes, the numbers are very close, within .1%. Yes, the fraction hit is (1-exp(-1)) to a very good approximation.

  1. Here N=30,000, M=10,000.

E(slots not hit) = 10000 * exp(-30000/10000) = 10000 * exp(-3) = 10000*.0498 = 498.