HARVARD CS 124
DATA STRUCTURES AND ALGORITHMS
MIDTERM EXAMINATION
Spring 1996 (Exam Date: March 18, 1996)

This is a CLOSED BOOK exam. You have 80 minutes to answer the following
questions, in any order. The number in brackets is the estimated time in
minutes needed to answer the question, and is also the maximum score for that
question. The highest possible grade is therefore 80.

GIVE REASONS FOR ALL YOUR ANSWERS. CORRECT ANSWERS WITHOUT REASONS WILL NOT
RECEIVE FULL CREDIT. FOR PROBLEMS WHICH ASK YOU TO GIVE TWO OR THREE ANSWERS,
EACH ADDITIONAL CORRECT ANSWER YOU PROVIDE WILL EARN CREDIT OF INCREASING
WEIGHT.


(1) [15] Given two strings (X[1], X[2], ..., X[m]) and (Y[1], Y[2], ..., Y[n]),
we have learned in class that, by using dynamic programming, the length C(m,n)
of their longest common subsequence can be computed in O(m*n) time. Now, given
three strings (X[1], X[2], ..., X[l]), (Y[1], Y[2], ..., Y[m]) and
(Z[1], Z[2], ..., Z[n]), you will use the same dynamic programming technique to
compute the length C(l,m,n) of their longest common subsequence.

(1a) [10] Write down the recurrence relation on C(i,j,k) for the three-string
algorithm. (Hint: Recall the recurrence relation for the two-string case:

    C(i,j) = C(i-1,j-1) + 1,               if X[i] = Y[j], or
             max(C(i-1,j), C(i,j-1)),      otherwise.)

(1b) [5] Give the running time of the three-string algorithm.

*************************
Solution:

(1a) C(i,j,k) = C(i-1,j-1,k-1) + 1,                        if X[i] = Y[j] = Z[k], or
                max(C(i-1,j,k), C(i,j-1,k), C(i,j,k-1)),   otherwise.

(1b) O(l*m*n)
*************************
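
As a concrete illustration of the recurrence in (1a), here is a minimal
bottom-up sketch in Python. The function name lcs3_length and the table
layout are illustrative choices, not anything prescribed by the problem;
the strings are 0-indexed in the code, while the recurrence is stated with
1-indexed prefixes.

def lcs3_length(X, Y, Z):
    """Length of a longest common subsequence of three sequences,
    computed bottom-up from the recurrence in (1a)."""
    l, m, n = len(X), len(Y), len(Z)
    # C[i][j][k] = LCS length of the prefixes X[1..i], Y[1..j], Z[1..k].
    C = [[[0] * (n + 1) for _ in range(m + 1)] for _ in range(l + 1)]
    for i in range(1, l + 1):
        for j in range(1, m + 1):
            for k in range(1, n + 1):
                if X[i - 1] == Y[j - 1] == Z[k - 1]:
                    C[i][j][k] = C[i - 1][j - 1][k - 1] + 1
                else:
                    C[i][j][k] = max(C[i - 1][j][k],
                                     C[i][j - 1][k],
                                     C[i][j][k - 1])
    return C[l][m][n]

# Example: lcs3_length("xaby", "abz", "aab") returns 2 (the subsequence "ab").

The three nested loops fill an (l+1) x (m+1) x (n+1) table, so both the time
and the space are O(l*m*n), matching (1b).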

(2) [8] Consider the following binary search tree, which has five internal
nodes 1, 2, 3, 4 and 5, and six leaves a, b, c, d, e and f:

    1
   / \
  a   2
     / \
    b   3
       / \
      c   4
         / \
        d   5
           / \
          e   f

Note that the tree has height 5 and is not balanced. Demonstrate two
consecutive tree rotation operations that will reduce the tree height from
5 to 3.

*************************
Solution:

Rotation 1 (rotate the edge between 1 and 2, making 2 the root):

            2
          /   \
        1       3
       / \     / \
      a   b   c   4
                 / \
                d   5
                   / \
                  e   f

Rotation 2 (rotate the edge between 2 and 3, making 3 the root):

            3
          /   \
        2       4
       / \     / \
      1   c   d   5
     / \         / \
    a   b       e   f

*************************

(3) [13] Suppose that we are interested in efficient searching of an N-element
set. Hash tables and binary search trees are two data structures that could be
used.

(3a) [8] For the purpose of searching, describe two scenarios under which
hashing is more desirable than binary search.

(3b) [5] For the purpose of searching, describe one scenario under which
binary search is more desirable than hashing.

*************************
Solution:

(3a) Scenario 1: The given elements are not ordered.
     Scenario 2: N is so large that the average Theta(lg N) time for binary
     search can be significant.

(3b) Scenario 1: With a balanced binary search tree, the worst-case search
     time is guaranteed not to exceed O(lg N), while the worst-case time for
     hashing can be O(N).

*************************

(4) [10] Consider a radix sort of N numbers of b bits each. The radix sort
will view each number as having b/r digits of r bits each, and will perform
b/r passes of counting sort.

(4a) [5] What is the running time of this radix sort?

(4b) [5] What is the required storage, in bits, for this radix sort?

*************************
Solution:

(4a) O((b/r) * (N + 2**r))

(4b) N*b + (lg N)*2**r bits (an output array of N b-bit numbers, plus 2**r
     counters of lg N bits each for the counting sort).

*************************

(5) [12] We have studied several algorithms in class based on the
divide-and-conquer principle. For each of these algorithms, either the
dividing phase or the combining phase must be computationally nontrivial;
otherwise the whole algorithm would be trivial.

(5a) [6] List two algorithms for which the combining phase is more expensive
than the dividing phase.

(5b) [6] List two algorithms for which the dividing phase is more expensive
than the combining phase.

*************************
Solution:

(5a) Strassen's fast matrix multiplication and merge sort.

(5b) Quicksort and the linear-time median algorithm.

*************************

(6) [9] John Smith has learned the divide-and-conquer principle, and wants to
apply it to searching. To search for a key in a set of N keys, his algorithm
will divide the set into two subsets of N/2 keys each and then search each
subset recursively. What is the running time of his algorithm? Show your
analysis.

*************************
Solution:

T(N) = 2T(N/2) + O(1). By the Master method (or by noting that the recursion
tree has about 2N nodes, each doing O(1) work), T(N) = O(N).

*************************

(7) [13] Suppose that a cousin of V. Strassen comes up with an ingenious 3x3
matrix multiplication algorithm which takes 16 scalar multiplications and 50
scalar additions. Moreover, suppose that this algorithm works even if the
entries in the 3x3 matrices are themselves matrices.

(7a) [3] Outline how you could use this 3x3 result to devise a fast NxN matrix
multiplication algorithm for large N.

(7b) [10] What is the running time of your NxN matrix multiplication
algorithm?

*************************
Solution:

(7a) Divide each input matrix into a 3x3 matrix of (N/3)x(N/3) blocks. Apply
the 3x3 algorithm to these 3x3 block matrices, computing the 16 block products
recursively.

(7b) T(N) = 16*T(N/3) + O(N**2). Since N**2 / N**(log3 16) = O(N**(-epsilon))
where epsilon ~ 0.5, the Master method (case 1) gives
T(N) = Theta(N**(log3 16)), which is approximately O(N**2.52).

*************************
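
To make the divide step of (7a) concrete, here is a Python sketch of the
block-recursive structure, assuming N is a power of 3. Because the
hypothetical 16-multiplication 3x3 formula is not spelled out in the problem,
the combining rule shown is the straightforward one with 27 recursive block
products; substituting a 16-product, 50-addition rule at the marked spot is
what would give the recurrence T(N) = 16*T(N/3) + O(N**2) solved in (7b).
The names mat_mul, block and add are illustrative only.

def mat_mul(A, B):
    """Multiply two N x N matrices (lists of lists), where N is a power of 3,
    by partitioning each into a 3 x 3 grid of (N/3) x (N/3) blocks."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    t = n // 3

    def block(M, i, j):
        # The (i, j) block of the 3 x 3 partition of M.
        return [row[j * t:(j + 1) * t] for row in M[i * t:(i + 1) * t]]

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    # Combining rule: the straightforward 27 recursive block products.
    # The hypothetical algorithm of problem (7) would use only 16 block
    # products here (plus 50 block additions).
    C = [[None] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            acc = [[0] * t for _ in range(t)]
            for k in range(3):
                acc = add(acc, mat_mul(block(A, i, k), block(B, k, j)))
            C[i][j] = acc

    # Stitch the 3 x 3 grid of result blocks back into one N x N matrix.
    return [C[i][0][r] + C[i][1][r] + C[i][2][r]
            for i in range(3) for r in range(t)]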
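
Going back to problem (2), the two rotations in the solution are both "left"
rotations at the current root. A minimal sketch of a single rotation, with an
illustrative Node class (the problem itself does not fix any representation):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Rotate the edge between x and its right child y: y becomes the root
    of the subtree, x becomes y's left child, and y's old left subtree
    becomes x's new right subtree.  Returns the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

Starting from the tree in problem (2), two calls of root = rotate_left(root)
first promote 2 and then 3 to the root, producing exactly the trees shown in
the solution and reducing the height from 5 to 3.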
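
Going back to problem (4), here is a minimal sketch of the LSD radix sort
being analyzed, assuming the keys are non-negative b-bit integers and that r
divides b evenly; the name radix_sort and the list-based representation are
illustrative choices. Each pass uses a counter array of size 2**r and an
output array of N keys, the two terms counted in the storage bound of (4b).

def radix_sort(keys, b, r):
    """Sort non-negative b-bit integers using b/r passes of counting sort,
    one pass per r-bit digit, least-significant digit first."""
    mask = (1 << r) - 1
    for shift in range(0, b, r):
        # Stable counting sort on the digit (key >> shift) & mask.
        count = [0] * (1 << r)              # 2**r counters of up to lg N bits
        for key in keys:
            count[(key >> shift) & mask] += 1
        for d in range(1, 1 << r):          # prefix sums give final positions
            count[d] += count[d - 1]
        out = [0] * len(keys)               # output array of N b-bit keys
        for key in reversed(keys):          # reversed scan keeps the sort stable
            d = (key >> shift) & mask
            count[d] -= 1
            out[count[d]] = key
        keys = out
    return keys

# Example: radix_sort([7, 1, 5, 2], b=4, r=2) returns [1, 2, 5, 7].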