Submitting:

  • All homeworks should be turned in electronically to myCourses.
  • Paper submissions may be scanned in if legible.
  • If you take a photo of your paper solution, please take the photo outside or close to a bright window, for legibility.

Due dates:

  • Homeworks are due by 10pm on the due date (Thu).
  • No late submissions are accepted.
  • If there are extenuating circumstances, contact the instructor as soon as possible to make arrangements.

References:

  • If you find a resource helpful (from internet search, blog post, Stack Overflow, online courses, our lecture notes, etc), be sure to cite it. Citation format doesn’t matter, but make it easy for the TA to find the source: e.g., a clickable link.
  • Code / pseudocode should be your own work. If you find code online, understand how it works, then go write up your solution on your own.
  • The basic machine model was discussed in Lecture 1. Don’t use library functions which obviate the bulk of a question. E.g., if you’re asked to implement a sort, don’t use Python’s sorted().

Programming assignments:

  • Please submit all your code, either separately uploaded or in a ZIP (or tar) file. Up to 20 files (20MB total) may be uploaded to myCourses.
  • Make it easy for the TA to recompile and run it if needed.
  • Include a short, informal write-up describing your code. Make it easy for the TA to understand your code.
  • Include extensive test cases (valid and invalid input, simple and complex) demonstrating each aspect of your code. Document the purpose of each test case, and include a demo with actual output.
  • You are encouraged but not required to use revision control like Git and Github.

HW6 (20pts) (solutions)

  1. Consider an empty B-tree of t=2.
    • (a) (3pts) Demonstrate insertion of the following character keys in order:
      D H T L S M V X K Z Q
    • (b) (1pt) How many splits were needed in part (a)?
    • (c) (1pt) Show the tree after deleting M.
  2. (2pts) Derive the maximum number of keys that can be stored in a B-tree (including internal nodes, not a B+-tree), in terms of t and the height h. Show your work.
  3. In the Linux BTRFS filesystem (default configuration), each node is 4KB = 4096 bytes (the block size). Each key takes 20 bytes, and each pointer to a child node is 8 bytes.
    • (a) (1pt) What is the largest value of t for a B-tree with such nodes? (In reality, the header also takes up some space, but you can ignore that.)
    • (b) (1pt) A 3-bit field tracks the height of each node, which means the B-tree can have at most 8 levels. Approximately how many keys can be stored in such a B-tree?
    • (c) (1pt) Compute the maximum number of key comparisons that need to be done in order to find a key in such a B-tree. How many disk accesses (fetching nodes/blocks from disk)?
  4. Programming Project: Implement a B-tree for integer keys. Use pre-emptive split/merge as described in lecture. You may limit yourself to t ≤ 10 if it makes things easier. The policies stated above for programming assignments apply. Your code should have:
    • (a) (2pts) Class definitions and constructors for a single node and for the tree as a whole. t should be a parameter the user can set when instantiating the tree.
    • (b) (3pts) insert( key ): the return value should be the number of disk accesses needed (excluding the root node). If the key already exists, don’t modify the tree.
    • (c) (1pt) size(): return the number of keys currently stored in the tree.
    • (d) (3pts) Pre-order and in-order tree traversals to print out the tree. (In-order traversal should print the keys in order.) You may use these to verify insertion works correctly.
    • (e) (2pts) Extra credit (optional): delete( key ), returning the number of disk accesses. Handle non-existent key gracefully.