HARVARD CS124
DATA STRUCTURES AND ALGORITHMS
MIDTERM EXAMINATION
Spring 1997
Exam Date: March 17, 1997
Suggested Solutions

This is a CLOSED BOOK exam. You have 80 minutes to answer the following questions in any order. The number in brackets is the estimated time in minutes to answer the question and is the maximum score for that question. The highest grade for the exam is 80.

GIVE REASONS FOR ALL OF YOUR ANSWERS. CORRECT ANSWERS WITHOUT REASONS WILL NOT RECEIVE FULL CREDIT. FOR PROBLEMS WHICH ASK YOU TO GIVE TWO OR MORE ANSWERS, EACH ADDITIONAL CORRECT ANSWER YOU PROVIDE WILL EARN CREDIT OF INCREASING WEIGHT.


1. [12] An objective of CS124 has been to learn algorithm design principles, each applicable to a wide class of algorithms. For example, the divide-and-conquer principle applies to merge sort, Strassen's fast matrix multiplication, and many other algorithms. List three additional principles that you have learned from the class so far. You need to name two algorithm examples for each of these principles.

  1. Randomization: randomized quicksort, universal hashing
  2. Dynamic programming: LCS and matrix chain multiplication (see the sketch after this list)
  3. Approximation: approximately balanced trees (red-black trees), approximate median in the linear-time selection algorithm
  4. Change of representation: decision tree for sorting, array representation of a heap
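To make the dynamic-programming principle in item 2 concrete, here is a minimal LCS-length sketch in Python (the language and the function name are ours, not part of the exam):

    def lcs_length(x, y):
        # c[i][j] = length of an LCS of the prefixes x[:i] and y[:j];
        # the table is filled bottom-up in O(len(x) * len(y)) time.
        m, n = len(x), len(y)
        c = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                else:
                    c[i][j] = max(c[i - 1][j], c[i][j - 1])
        return c[m][n]

    assert lcs_length("ABCBDAB", "BDCABA") == 4   # e.g. "BCBA"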


2. [10] The order statistics problem is to find the ith smallest of n elements for any i. The median finding problem is a special case of the order statistics problem when i = n/2 or (n+1)/2. Show that any fast algorithm for the median finding problem can actually lead to a fast algorithm for the order statistics problem. In particular, show that if there is an algorithm which can find the median in time c*n, then there must be an algorithm which, for any i, can find the ith smallest element in time 2(c+d)*n, where d*n is the time to partition n elements given any pivot element.

Find the median p in c*n time and partition around p in d*n time. The ith smallest element must lie in one of the two halves, each of size at most n/2, so we recurse on that half (adjusting i when it falls in the upper half). This gives the recurrence T(n) = T(n/2) + (c+d)*n, which unrolls to T(n) <= (c+d)*n * (1 + 1/2 + 1/4 + ...) = 2(c+d)*n.
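A sketch of this reduction in Python (the language and the name find_median are ours; find_median stands for the assumed c*n-time median routine):

    def select(a, i, find_median):
        # Returns the i-th smallest (1-indexed) element of a.
        # find_median is the assumed c*n-time routine; the list
        # comprehensions below are the d*n-time partition step.
        if len(a) == 1:
            return a[0]
        p = find_median(a)
        less = [x for x in a if x < p]
        greater = [x for x in a if x > p]
        equal = len(a) - len(less) - len(greater)   # copies of p
        if i <= len(less):
            return select(less, i, find_median)
        if i <= len(less) + equal:
            return p
        return select(greater, i - len(less) - equal, find_median)

Since p is a true median, both sublists have at most n/2 elements, which is exactly where the T(n) = T(n/2) + (c+d)*n recurrence comes from.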


3. [6] Heaps and hash tables are two well-known data structures for maintaining a dynamic set of elements. Describe an application scenario where you should use a heap instead of hashing, and another where you should use hashing instead of a heap.

If the elements obey an ordering relationship and we repeatedly need only the minimum or maximum (e.g., a priority queue in a scheduler), we should use a heap. If we need to search for arbitrary elements by key, not just the min or max (e.g., a symbol table), we should use hashing.
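A minimal Python illustration of the two scenarios (the job data is invented):

    import heapq

    # Heap: a scheduler that repeatedly needs the job with the
    # earliest deadline -- O(n) build, O(log n) extract-min.
    jobs = [(30, "backup"), (10, "email"), (20, "report")]
    heapq.heapify(jobs)
    deadline, name = heapq.heappop(jobs)
    assert name == "email"

    # Hashing: a symbol table searched by arbitrary key --
    # expected O(1) lookup, which a heap cannot offer.
    table = {"email": 10, "report": 20, "backup": 30}
    assert table["report"] == 20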


4. [10] Encouraged by the nice amortized cost analysis showing that a heap of n elements can be built in O(n) time, your friend, Bob, has designed an algorithm for building a binary search tree from any given set of n elements, and claims that his algorithm takes only O(n) comparisons. Your other friend, John, says that Bob's claim must be flawed because any comparison-based sorting algorithm takes at least Omega(n lg n) comparisons. Is John correct?
Answer: Yes, John is correct. If Bob's claim were true, then his tree-building algorithm followed by an inorder tree walk (which takes O(n) time and makes no further comparisons) would sort n elements with only O(n) comparisons, contradicting the Omega(n lg n) lower bound.
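The inorder-walk half of this reduction is easy to make concrete; only Bob's claimed O(n)-comparison tree builder is hypothetical. A sketch in Python:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        key: int
        left: "Optional[Node]" = None
        right: "Optional[Node]" = None

    def inorder(node, out):
        # Visits each node once, so the walk is O(n) and makes zero
        # key comparisons; combined with an O(n)-comparison BST
        # builder it would sort in O(n) comparisons overall.
        if node is not None:
            inorder(node.left, out)
            out.append(node.key)
            inorder(node.right, out)
        return out

    # inorder(root, []) returns the keys in sorted order.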


5. [10] Your friend Bob does not sleep and strikes again. He now claims that he has another algorithm that can print out n keys stored in an n-node heap in sorted order, using only O(n) comparisons. Prove that Bob's algorithm must be wrong.
Answer: Suppose Bob's algorithm were right. Then we could sort n keys with O(n) comparisons:

(i) Given n keys, build a heap of the n keys in O(n) comparisons.

(ii) Use Bob's algorithm to print out the n keys in sorted order in O(n) comparisons.

This contradicts the fact that Omega(n lg n) is a lower bound on the number of comparisons for any comparison-based sorting algorithm.
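One half of this reduction is real in Python: heapq.heapify builds a heap in O(n). Only Bob's claimed printer, called bobs_print_sorted below, is hypothetical:

    import heapq

    def sort_via_bob(keys, bobs_print_sorted):
        heap = list(keys)
        heapq.heapify(heap)         # step (i): O(n) comparisons
        bobs_print_sorted(heap)     # step (ii): O(n) claimed --
                                    # together an impossible O(n) sort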


6. [10] Give an algorithm that, given an array of n elements, finds the k smallest elements in sorted order in O(n + k log k) time.

Select the kth smallest element p in O(n) time, partition the array around p in O(n) time, and then sort the low side of the partition (the k smallest elements, including p) in O(k log k) time.
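A Python sketch of this algorithm, substituting a randomized quickselect for the deterministic linear-time select, so the O(n) term here holds in expectation:

    import random

    def quickselect(a, k):
        # k-th smallest (1-indexed) element of a; expected O(n).
        p = random.choice(a)
        less = [x for x in a if x < p]
        greater = [x for x in a if x > p]
        equal = len(a) - len(less) - len(greater)
        if k <= len(less):
            return quickselect(less, k)
        if k <= len(less) + equal:
            return p
        return quickselect(greater, k - len(less) - equal)

    def k_smallest_sorted(a, k):
        p = quickselect(a, k)                # expected O(n)
        small = [x for x in a if x < p]      # O(n) partition
        small += [p] * (k - len(small))      # pad with copies of p
        small.sort()                         # O(k log k)
        return small

    assert k_smallest_sorted([9, 4, 7, 1, 8, 2], 3) == [1, 2, 4]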


7. Counting Sort and Radix Sort are two sorting algorithms that use no comparisons and run in linear time, provided that the range of the integers to be sorted is not too large. Moreover, Radix Sort usually uses Counting Sort as its intermediate sorting algorithm. Suppose you want to sort an array of n integers in [0..n^3].

(7a) [4] How much time does Counting Sort take on this array?

Answer: O(n + n^3) = O(n^3).

(7b) [8] Using the fact that we can represent integers in [0..n^3] with 3 lg n bits, show that you can use Radix Sort to sort this array in O(n) time using 3 passes of Counting Sort.

Answer: Break each integer into 3 blocks of lg n bits each, and run 3 passes of Counting Sort, one per block. The time is therefore 3 * O(n + 2^(lg n)) = 3 * O(n + n) = 3 * O(n) = O(n).
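A Python sketch of the scheme, reading each integer in [0..n^3) as three base-n digits (for n a power of two this is exactly the 3-blocks-of-lg-n-bits view) and running one stable Counting Sort pass per digit:

    def radix_sort_cubed(a):
        # Sorts n integers drawn from [0..n^3) in O(n) time via three
        # stable counting-sort passes, least significant digit first.
        n = len(a)
        for d in range(3):
            digit = lambda x: (x // n**d) % n
            count = [0] * n
            for x in a:                      # histogram of digit d
                count[digit(x)] += 1
            for i in range(1, n):            # prefix sums -> final slots
                count[i] += count[i - 1]
            out = [0] * n
            for x in reversed(a):            # reversed keeps the pass stable
                count[digit(x)] -= 1
                out[count[digit(x)]] = x
            a = out
        return a

    assert radix_sort_cubed([7, 0, 63, 5, 12, 41, 3, 29]) == [0, 3, 5, 7, 12, 29, 41, 63]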


8. In class, you learned two methods for resolving collisions in a hash table, namely, chaining and open addressing. Although chaining is very popular and is faster, one disadvantage of chaining is its space requirement: when a collision occurs, extra memory has to be allocated for the chain even when the hash table is not full, whereas with open addressing all the keys are stored in the table itself as long as the table is not full. Suppose that you store n records in a hash table of size m by chaining, where m > n, and suppose that you have a good hash function, so that the probability that a key is hashed into any of the m slots is 1/m.

(8a) [2] For a particular slot in the hash table, what is the probability that a particular key does not hash into that slot?

Answer: 1 - 1/m

(8b) [4] For a particular slot in the hash table, what is the probability that this slot is empty, that is, none of the n keys hashes into this slot?

Answer: (1 - 1/m)^n

(8c) [4] If open addressing is used, what is the probability that a slot is empty?

Answer: (m-n)/m, since with open addressing exactly n of the m slots are occupied.
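A quick empirical check of (8b), assuming an ideal uniform hash function (the values of m, n, and the trial count are invented):

    import random

    m, n, trials = 50, 30, 20000
    # Fraction of trials in which slot 0 stays empty after n keys
    # hash uniformly into m slots:
    empty = sum(all(random.randrange(m) != 0 for _ in range(n))
                for _ in range(trials))
    print(empty / trials)       # ~0.545 in simulation
    print((1 - 1 / m) ** n)     # 0.5455..., the (8b) formula
    # (8c) needs no simulation: open addressing keeps exactly n of
    # the m slots occupied, so a fixed slot is empty with probability
    # (m - n) / m = 0.4 exactly.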