B-trees are balanced search trees designed to work well on magnetic disks or other direct-access secondary storage devices. B-trees are similar to red-black trees (Chapter 14), but they are better at minimizing disk I/O operations.

Data structures on secondary storage

There are many different technologies available for providing memory capacity in a computer system. The * primary memory* (or

Figure 19.2 shows a typical disk drive. The disk surface is covered with a magnetizable material. The read/write head can read or write data magnetically on the rotating disk surface. The read/write arm can position the head at different distances from the center of the disk. When the head is stationary, the surface that passes underneath it is called a **track****.** Theinformation stored on each track is often divided into a fixed number of equal-sized * pages*; for a typical disk, a page might be 2048 bytes in length.The basic unit of information storage and retrieval is usually a page of information--that is, disk reads and writes are typically of entire pages.The

the number of disk accesses, and

We model disk operations in our pseudocode as follows. Let *x* be a pointer to an object. If the object is currently in the computer's main memory, then we can refer to the fields of the object as usual: *key*[*x*], for example. If the object referred to by *x* resides on disk, however, then we must perform the operation DISK-READ(*x*) to read object *x* into main memory before its fields can be referred to. (We assume that if *x* is already in main memory, then DISK-READ(*x*) requires no disk accesses; it is a "noop.") Similarly, the operation DISK-WRITE(*x*) is used to save any changes that have been made to the fields of object *x*. That is, the typical pattern for working with an object is as follows.

1 . . .

2xa pointer to some object

3 DISK-READ(x)

4 operations that access and/or modify the fields ofx

5 DISK-WRITE(x) Omitted if no fields ofxwere changed.

6 other operations that access but do not modify fields ofx

7 ...

To keep things simple, we assume, as we have for binary search trees and red-black trees, that any "satellite information" associated with a key is stored in the same node as the key. In practice, one might actually store with each key just a pointer to another disk page containing the satellite information for that key. The pseudocode in this chapter implicitly assumes that the satellite information associated with a key, or the pointer to such satellite information, travels with the key whenever the key is moved from node to node. Another commonly used B-tree organization stores all the satellite information in the leaves and only stores keys and child pointers in the internal nodes, thus maximizing the branching factor of the internal nodes.

A **B-tree***T* is a rooted tree (with root *root*[*T*]) having the following properties.

1. Every node *x* has the following fields:

a. *n*[*x*], the number of keys currently stored in node *x*,

b. the *n*[*x*] keys themselves, stored in nondecreasing order: *key*_{1}[*x*] *key _{2}*[

c. *leaf *[*x*], a boolean value that is TRUE if *x* is a leaf and FALSE if *x* is an internal node.

k_{1}key_{1}[x]k_{2}key_{2}[x] key_{n[x]}[x]k[_{n}x]+1 .

*4. *Every leaf has the same depth, which is the tree's height* h*.

5. There are lower and upper bounds on the number of keys a node can contain. These bounds can be expressed in terms of a fixed integer *t* 2 called the * minimum degree* of the B-tree:

b. Every node can contain at most 2*t* - 1 keys. Therefore, an internal node can have at most 2*t* children. We say that a node is * full* if it contains exactly 2

The simplest B-tree occurs when *t* = 2. Every internal node then has either 2, 3, or 4 children, and we have a **2-3-4*** tree.* In practice, however, much larger values of

The number of disk accesses required for most operations on a B-tree is proportional to the height of the B-tree. We now analyze the worst-case height of a B-tree.

If *n* 1, then for any *n*-key B-tree *T* of height *h* and minimum degree *t* 2,

Here we see the power of B-trees, as compared to red-black trees. Although the height of the tree grows as *O*(lg *n*) in both cases (recall that *t* is a constant), for B-trees the base of the logarithm can be many times larger. Thus, B-trees save a factor of about lg *t* over red-black trees in the number of nodes examined for most tree operations. Since examining an arbitrary node in a tree usually requires a disk access, the number of disk accesses is substantially reduced.

Why don't we allow a minimum degree of *t* = 1?

For what values of *t* is the tree of Figure 19.1 a legal B-tree?

Show all legal B-trees of minimum degree 2 that represent {1, 2, 3, 4, 5}.

Describe the data structure that would result if each black node in a red-black tree were to absorb its red children, incorporating their children with its own.

Searching a B-tree is much like searching a binary search tree, except that instead of making a binary, or "two-way," branching decision at each node, we make a multiway branching decision according to the number of the node's children. More precisely, at each internal node *x*, we make an (*n*[*x*] + 1)-way branching decision.

B-TREE-SEARCH(x, k)

1i1

2whilein[x] andkkey[_{i}x]

3doii+ 1

4ifin[x] andk = key[_{i}x]

5then return(x, i)

6ifleaf[x]

7then returnNIL

8elseDISK-READ(c[_{i}x])

9returnB-TREE-SEARCH(c[_{i}x], k)

As in the TREE-SEARCH procedure for binary search trees, the nodes encountered during the recursion form a path downward from the root of the tree. The number of disk pages accessed by B-TREE-SEARCH is therefore (*h*) = (log_{t}*n*), where *h* is the height of the B-tree and *n* is the number of keys in the B-tree. Since *n*[*x*] < 2*t*, the time taken by the **while** loop of lines 2-3 within each node is *O*(*t*), and the total CPU time* *is* O*(*th*) = *O*(*t* log* _{t} n*).

To build a B-tree *T*, we first use B-TREE-CREATE to create an empty root node and then call B-TREE-INSERT to add new keys. Both of these procedures use an auxiliary procedure ALLOCATE-NODE, which allocates one disk page to be used as a new node in *O*(1) time. We can assume that a node created by ALLOCATE-NODE requires no DISK-READ, since there is as yet no useful information stored on the disk for that node.

B-TREE-CREATE(T)

1xALLOCATE-NODE()

2leaf[x] TRUE

3n[x] 0

4 DISK-WRITE(x)

5root[T]x

B-TREE-CREATE requires *O*(1) disk operations and *O*(1) CPU time.

Inserting a key into a B-tree is significantly more complicated than inserting a key into a binary search tree. A fundamental operation used during insertion is the * splitting* of a full node

B-TREE-SPLIT-CHILD(x,i,y)

1 z ALLOCATE-NODE()

2 leaf[z] leaf[y]

3n[z] t -1

4forj1tot-1

5dokey[^{j}z]key^{j}_{+}[^{t}y]

6ifnotleaf[y]

7then forj1tot

8doc_{j}[z] c_{j+t}[y]

9n[y] t -1

10forjn[x] + 1idownto+ 1

11doc_{j+1}[x] c_{j}[x]

12c+1[_{i}x] z

13forjn[x]downtoi

14dokey_{j+1}[x] key_{j}[x]

15key[_{i}x] key_{t}[y]

16n[x]n[x] + 1

17 DISK-WRITE(y)

18 DISK-WRITE(z)

19 DISK-WRITE(x)

Inserting a key *k* into a B-tree* T* of height *h* is done in a single pass down the tree, requiring *O(h*) disk accesses. The CPU time required is *O*(*th*)* = O*(*t *log* _{t} n*). The B-TREE-INSERT procedure uses B-TREE-SPLIT-CHILD to guarantee that the recursion never descends to a full node.

B-TREE-INSERT(T,k)

1rroot[T]

2ifn[r] = 2t- 1

3thensALLOCATE-NODE()

4root[T]s

5leaf[s] FALSE

6n[s] 0

7c^{1}[s]r

8 B-TREE-SPLIT-CHILD(s,1,r)

9 B-TREE-INSERT-NONFULL(s,k)

10elseB-TREE-INSERT-NONFULL(r,k)

B-TREE-INSERT-NONFULL(x,k)

1in[x]

2ifleaf[x]

3then whilei1 andk<key[^{i}x]

4dokey+1 [_{i}x]key[^{i}x]

5ii- 1

6key+1[_{i}x]k

7n[x] n[x] + 1

8 DISK-WRITE(x)

9else whilei1 andk<key[^{i}x]

10doii- 1

11ii+ 1

12 DISK-READ(c[^{i}x])

13ifn[c[^{i}x]] = 2t- 1

14thenB-TREE-SPLIT-CHILD(x,i,c[^{i}x])

15ifk>key[^{i}x]

16thenii+ 1

17 B-TREE-INSERT-NONFULL(c[^{i}x],k)

Show the results of inserting the keys

F, S, Q, K, C, L, H, T, V, W, M, R, N, P, A, B, X, Y, D, Z, E

Explain how to find the minimum key stored in a B-tree and how to find the predecessor of a given key stored in a B-tree.

Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. Show that this makes the CPU time required *O*(lg *n*), independently of how *t* might be chosen as a function of *n*.

Deletion from a B-tree is analogous to insertion but a little more complicated. We sketch how it works instead of presenting the complete pseudocode.

Assume that procedure B-TREE-DELETE is asked to delete the key *k *from the subtree rooted at *x*. This procedure is structured to guarantee that whenever B-TREE-DELETE is called recursively on a node *x*, the number of keys in *x* is at least the minimum degree *t*. Note that this condition requires one more key than the minimum required by the usual B-tree conditions, so that sometimes a key may have to be moved into a child node before recursion descends to that child. This strengthened condition allows us to delete a key from the tree in one downward pass without having to "back up" (with one exception, which we'll explain). The following specification for deletion from a B-tree should be interpreted with the understanding that if it ever happens that the root node *x* becomes an internal node having no keys, then *x* is deleted and *x'*s only child *c*_{1}[*x*] becomes the new root of the tree, decreasing the height of the tree by one and preserving the property that the root of the tree contains at least one key (unless the tree is empty).

Figure 19.8 illustrates the various cases of deleting keys from a B-tree.

1. If the key *k* is in node *x* and *x* is a leaf, delete the key *k* from *x*.

2. If the key *k* is in node *x* and *x* is an internal node, do the following.

Show the results of deleting *C, P*, and *V*, in order, from the tree of Figure 19.8(f).

Write pseudocode for B-TREE-DELETE.

19-1 Stacks on secondary storage

Consider implementing a stack in a computer that has a relatively small amount of fast primary memory and a relatively large amount of slower disk storage. The operations PUSH and POP are supported on single-word values. The stack we wish to support can grow to be much larger than can fit in memory, and thus most of it must be stored on disk.

A simple, but inefficient, stack implementation keeps the entire stack on disk. We maintain in memory a stack pointer, which is the disk address of the top element on the stack. If the pointer has value *p*, the top element is the (*p* mod *m*)th word on page *p/m* of the disk, where *m* is the number of words per page.

* d. *Describe how to manage the stack pages so that the amortized number of disk accesses for any stack operation is

19-2 Joining and splitting 2-3-4 trees

The * join* operation takes two dynamic sets

In 1970, J. E. Hopcroft invented 2-3 trees, a precursor to B-trees and 2-3-4 trees, in which every internal node has either two or three children. B-trees were introduced by Bayer and McCreight in 1972 [18]; they did not explain their choice of name.