HARVARD CS124 DATA STRUCTURES AND ALGORITHMS FINAL EXAMINATION Spring 1996 Exam Date: May 24, 1996 This is a CLOSED BOOK exam. You have 180 minutes to answer the following questions in any order. The number in brackets is the estimated time in minutes to answer the question and is the maximum score for that question. The highest grade for the exam is 180. GIVE REASONS FOR ALL YOUR ANSWERS. CORRECT ANSWERS WITHOUT REASONS WILL NOT RECEIVE FULL CREDIT. FOR PROBLEMS WHICH ASK YOU TO GIVE TWO OR THREE ANSWERS, EACH ADDITIONAL CORRECT ANSWER YOU PROVIDE WILL EARN CREDIT OF INCREASING WEIGHT. (1) [16] Consider performing breath-first search and depth-first search on an n-node tree of height H and of maximal branch factor B. (a) [8] Give an upper bound on the size of the queue that breath-first search uses. (b) [8] Give an upper bound on the size of the stack that depth-first search uses. Answer: (a) O(N) (b) O(H) (2) [10] Prove that if every edge of a flow network has capacity of 17, then the Edmonds-Karp algorithm runs in time O(EV). (Hint: Use the fact that the Edmonds-Karp algorithm runs in time O(E|f*|), where f* is a maximum flow and |f*| is the value of f*.) Answer: Note that |f*| = O(V), since there are at most |V|-1 edges going out of the source s, each with capacity O(1). (3) [10] For each of the following statements, indicate whether it is true or false. To get credit, you must give reasons for your answer. (3a) [3] The (lg n)th smallest element of n unsorted numbers can be determined in O(n) worst-case time. Answer: True. In fact, any i-th smallest element can be found in O(n) using the linear-time order statistics algorithm covered in the class. (3b) [3] Insertion sort can be used as the intermediate sorting algorithm for radix sort. Answer: False. Insertion is not stable. (3c) [4] By carefully selecting a fixed set of n keys to be stored in a hash table of size m>=n with universal hashing, an adversary can make the retrieval of each key take Omega(n) time each time after these n keys have been hashed into the table. Answer: False. With universal hashing, the hash function will change each time when the n keys are hashed. Thus, it is impossible for the adversary to pick up this fixed set of n keys which will always behave badly between different runs. (4) [6] Name four algorithms you learned in the class that use dynamic programming. Answer: Matrix-chain multiplication LCS All-pair shortest path algorithm with the matrix multiplication formulation Floyd-Warshall all-pair shortest path algorithm (5) [15] Your friend claims that to sort an array of numbers, counting sort is always as fast as the version of radix sort presented in the text because the radix sort uses counting sort as its intermediate sorting algorithm. Is your friend right? Give an example scenario to illustrate your reasons. Answer: No, my friend is wrong. Here is an example in which radix sort is faster than counting sort. Suppose we are sorting n integers in the range [0, n^c] where c is any constant greater than 1. Counting sort takes time O(n + n^c) = O(n^c). (Remember that counting sort takes time O(n+k) where [0, k] is the range of the inputs.) With radix sort, we can decompose each input into c blocks of lgn bits each, and run c passes of counting sort on the lgn-bit blocks. Each pass takes time O(n + 2^(lgn)) = O(n). Therefore, the entire radix sort algorithm takes time c*O(n) = O(n). (6) [7] Dijkstra's algorithm and Prim's algorithm both use the adjacency list representation for graphs. Your friend A claims that with the adjacency matrix representation one can speed up both algorithms. Your friend B claims that using the adjacency matrix representation in fact slows down both algorithms. Who is right? Answer: Friend B is right, as the adjacency matrix representation does not readily identify neighboring vertices of a given vertex quickly. (7) [15] In the class, you learned an O(n^3 lgn) algorithm for the all-pair shortest path problem using matrix multiplication, where n is the number of nodes in the given graph. The algorithm raises the adjacency matrix to the n-th power, by lgn repeated squaring, each of which takes O(n^3) time with the usual matrix multiplication algorithm. Thus the entire algorithm takes O(n^3 lgn) time. Your friend claims that by using Strassen's fast matrix multiplication algorithm for each of this lgn matrix multiplications, he can improve the running time of this algorithm to O(n^2.81 lgn), which beats the O(n^3) bound of the Floyd-Warshall algorithm. Is your friend right? Why or why not? Answer: No, my friend is not right. The Strassen's equations which allow the 2x2 matrix multiplication to be performed in 7 scalar multiplications no longer hold, when scalar multiplication and addition are interpreted as addition and minimum, respectively. In particular, the new interpretation allows no cancellation. (8) [20] Suppose you want to multiply 2 n-bit integers. The usual algorithm you learned from primary school runs in O(n^2) time or O(n^2) bit operations. Using a divide-and-conquer method, one can in fact do the multiplication in only O(n^1.59) time. This divide-and-conquer method computes x*y where x and y are n-bit integers, with n being a power of 2, as follows: Divide x and y into two blocks of n/2 bits each. Let x = x1x0 and y = y1y0, where x0 and y0 are the least significant n/2 bits of x and y respectively, and x1 and y1 are the most significant n/2 bits of x and y respectively. - Recursively compute C = x1*y1, D= x0*y0, and E = (x1 + x0)*(y1 + y0). - Then compute x*y = C* 2^n + (E-C-D)* 2^(n/2) + D. (Note that multiplying an integer by a power of 2 involving only some shift of the bits). (8a) [10] Prove that the above divide-and-conquer algorithm correctly multiplies 2 n-bit integers. Answer: x = x1*2^(n/2) + x0 y = y1*2^(n/2) + y0 x*y = x1*y1*2^n + (x1*y0 + y1*x0)*2^(n/2) + x0*y0 = C*2^n + (E-C-D)*2^(n/2) + D (8b) [4] Let T(n) be the running time of this algorithm on 2 n-bit inputs. Give the recurrence equation for T(n). Answer: T(n) = 3*T(n/2) + O(n) (8c) [6] Give the running time of this algorithm by solving the above recurrence equation. (Hint: lg 3 is approximately 1.584962501.) Answer: By the Master method, we have T(n) = O(n^{lg3}) where lg is the base-2 log. Thus T(n) = O(n^1.584962501) (Corresponding to the Master formula, T(n) = aT(n/b) + f(n), we have a=3, b=2 and f(n)=n. Thus, f(n)/n^(logb a)= n/n^1.58.. = O(n^-.58). Thus, T(n) = O(n^(logb a)) or O(lg3). (9) [12] A monotonically increasing subsequence of a given sequence, X = , is a subsequence of X such that xi1 <= xi2 <= ... <= xik. For example, <2,6,7,9> is a monotonically increasing subsequence of <2,8,6,7,9,1>. Design an algorithm that given a sequence of length of n, finds its longest monotonically increasing subsequence in time O(n^2). (Hint: Use the longest common subsequence algorithm you learned in the class). Answer: Sort X to produce a sorted sequence Y in O(n lg n) time, and then find LCS between X and Y in O(n*n) time. For example, from X = <2,8,6,7,9,1>, we obtain Y =<1,2,6,7,8,9>. Then we use the LCS algorithm on X and Y to find <2,6,7,9>. (10) [12] Let G=(V,E) be a flow network with non-negative integral weights.. Suppose that after a maximum flow for G has been computed, a new edge with capacity 3 is added to E. Design an algorithm that updates the maximum flow after the new edge is added in time O(E). Answer: In the residual network, do a BFS to see if we can find a path from S to T. If so, the new edge must be on the path, and the new maximum flow is the old one augmented by the maximum flow over the path. The number of such augmentation is bounded above by 3, as each will increase the flow by at least one and after at most 3 no more augmenting path can be found. (11) [17] Suppose a university has m departments and n dormitories. The ith department has xi students and the jth dormitory can hold up to yj students. Let dij be the distance between the ith department and the jth dormitory. We are interested in determining, for any given value of d, whether it is possible to assign the students to the dormitories so that for each student, the distance between his/her department and dormitory is no more than d. It turns out that this assignment problem can be reformulated as a max flow problem, so that efficient max flow algorithms such as the Edmonds-Karp algorithm can be used. Below is the reformulation. Construct a flow network G with a source s and a sink t. Introduce m intermediate nodes u1, ..., um each representing a department, and n intermediate nodes v1, ..., vn each representing a dormitory. Add an edge from s to ui for every i, and let the capacity c(s, ui) be xi. Add an edge from vj to t for every j, and let the capacity c(vj, t) be yj. For every pair (i,j), add an edge from ui to vj IF AND ONLY IF dij <= d, and for every edge (ui, vj) added to the flow network, let c(ui, vj) = infinity. Such an edge is depicted in the following diagram: xi s ---> ui \ if dij <= d \ capacity(ui, vj) = infinity \ \ yj -> vj ---> t (11a) [2] Prove that the maximum flow on G has value <= x1 + x2 + ... + xm. Answer: The capacity from s to its neighbors x1 + x2 + ... + xm. (11b) [15] Prove that if the maximum flow on G has value exactly equal to x1 + x2 + ... + xm, then this assignment problem has an affirmative answer, that is, the students can be assigned to the dormitories so that for each student, the distance between his/her department and dormitory is no more than d. Answer: Consider a maximum flow f on G that has value x1 + x2 + ... + xm. Note that f(ui, vj) is the number of students from department ui who are assigned to dormitory vj. If f(ui, vj) > 0, then dij must not be greater than d. Thus, these students does not travel a distance larger than d. Since the maximum flow f has value x1 + x2 + ... + xm, we know all the students have been successfully assigned to dormitories. (12) [15] You have learned that an n-input bitonic sorting network has depth (log^2 n) and thus can sort a sequence of n inputs in O(log^2 n) time. (12a) [10] Draw a bitonic sorting network with n=8 inputs. The drawing should show the direction of each comparison, that is, indicate which output line will receive the the maximum of the two inputs. Answer: | | | | | | v | v | | | | v | | | | v | v | | | | | v | ^ | | | | | | | | v | v v | | | | v | | | | | v | ^ ^ | | | | | | v | | v | | | | | v | ^ v | | v | | | | | | v ^ | ^ v | | | | | | | v v v v (Since it is hard to draw with text, this diagram gives an approximation.) (12b) [5] Now suppose you want to sort m sequences of n inputs each. Design an algorithm that does this in time O((log^2 n) + m) using an n-input bitonic sorting network. Answer: Just pipeline these m sequences into the network, with the input lines take in a new sequence every time step. (13) [10] Consider any parallel sorting network that can perform up to n comparisons at each time step, such as the bitonic sorting network. Prove that any such sorting network must take at least Omega(lg n) time steps to sort n inputs. Answer: Since any comparison-based sorting method takes at least Omega(n lg n) comparisons, any such parallel sorting network must use a total of Omega(n lg n) comparators. Since each stage uses up to n comparators, there must be at Omega(lg n) stages. (14) [5] Describe two set operations for which using a balanced binary search tree are more desirable than a heap. Answer: Searching for an element, and deleting an element. (15) [10] Design an algorithm that given a graph of n vertices, identify all the vertices that are on some negative weight cycle in O(V^3) time. (Hint: Consider the use of all-pairs shortest paths.) Answer: Run the Floyd-Warshall algorithm to compute the length d(u,v) of the shortest path between every pair of nodes u and v, including from each node to itself. If node v is on a negative weight cycle, then d(v,v) < 0. Therefore, to identify all the nodes that are on some negative weight cycle, we run the Floyd-Warshall algorithm to compute all-pairs shortest paths, and store their lengths in a VxV matrix. This takes O(V^3) time. Then we check whether d[i,i] < 0, for each i, and this takes O(V) time.