Difficulty: Hard, Asked-in: Google, Microsoft, Amazon
Key takeaway: An excellent problem to learn problem-solving using two pointers idea similar to merging and divide and conquer idea similar to binary search.
There are two sorted arrays A[] and B[] of size n each, write a program to find the median of array obtained after merging both arrays i.e. merged array of size 2n.
Example 1
Input: A[] = [1, 3, 6], B[] = [2, 8, 12]
Output: 4.5
Explanation: After merging sorted arrays, we get the larger sorted array [1, 2, 3, 6, 8, 12]. The total number of elements is 6, so the median would be the average of two middle elements at index 2 and 3 i.e. (3 + 6)/2 = 4.5
Example 2
Input: A[] = [1, 3, 4, 6, 9], B[] = [2, 5, 7, 8, 10]
Output: 5.5
Explanation: After merging sorted arrays, we get the larger sorted array [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. So median would be the average of two middle elements at index 4 and 5 i.e. (5 + 6)/2 = 5.5
So one basic idea is to merge both sorted arrays using extra space and get the median by taking an average of both middle elements in merged array. If we follow 0 based indexing, median will be the average of value at (n-1)th and nth indexes. Note: For merging sorted arrays, we can use an idea similar to the merging procedure of merge sort.
double median(int A[], int B[], int n)
{
int MergedArr[2n]
int i = 0, j = 0, k = 0
while (i < n && j < n)
{
if (A[i] <= B[j])
{
MergedArr[k] = A[i]
i = i + 1
k = k + 1
}
else
{
MergedArr[k] = B[j]
j = j + 1
k = k + 1
}
}
while (i < n)
{
MergedArr[k] = A[i]
i = i + 1
k = k + 1
}
while (j < n)
{
MergedArr[k] = B[j]
j = j + 1
k = k + 1
}
return (double)(MergedArr[n - 1]+ MergedArr[n])/2
}
We need to traverse each element to merge both sorted arrays. In other words, at each iteration of while loops, we place one value to MergedArr[]. So time complexity of merging = O(n)
Overall time complexity = Time complexity of merging + Time complexity of finding median = O(n) + O(1) = O(n). We are using 2n size extra space for merging, so space complexity = O(n)
Now critical questions are: Do we really need to perform complete merging of sorted arrays to find the median? Can we reduce extra space used in the merging process? Let's Think!
Here are few critical observations: We are traversing both arrays from start in incremental order. At each step of iteration, we are increasing either pointer i or pointer j and placing one value from any one of smaller arrays to the larger sorted array. At any general step of iteration, k = i + j - 1.
Here is an optimized solution idea: While comparing elements of two sorted arrays during the merging process, we can track count of elements in merged array using variable k. We stop when count becomes n + 1 and calculate the median by taking an average of nth and (n+1)th element of the merged array. So there is no need to use extra space and perform complete merging.
double median(int A[], int B[], int n)
{
int i = 0, j = 0, count = 0
int middle1 = 0, middle2 = 0
while (count <= n)
{
if (A[i] <= B[j])
{
middle1 = middle2
middle2 = A[i]
i = i + 1
}
else
{
middle1 = middle2
middle2 = B[j]
j = j + 1
}
if (i == n)
{
middle1 = middle2
middle2 = B[0]
break
}
else if(j == n)
{
middle1 = middle2
middle2 = A[0]
break
}
count = count + 1
}
return (double)(middle1 + middle2)/2
}
While loop is running n times and doing O(1) operation at each iteration. So time complexity = n * O(1) = O(n). We are using constant number of extra variables, so space complexity = O(1).
Now critical questions are: How do we improve the time complexity? This is a searching problem. Can we take advantage of sorted array property? Let's think!
Here idea is to compare medians of both sorted arrays and recursively reduce search space by half. Suppose median of the first array is m1, and second array is m2. We can get this value in O(1) using the formula: m1 = A[n/2], m2 = B[n/2] (We have assumed that n is odd)
Case 1 if (m1 == m2): In this case, n - 1 elements are less than m1 and n - 1 elements are greater than m2. In other words, both m1 and m2 would be middle elements in the merged array i.e. element at (n - 1)th and nth index. We take average of both, which is equal to m1 or m2. We return any one of theses values as an output. (Think!)
Case 2 if (m1 < m2): In the merged array, both middle elements of median will lie between the range [m1, m2] i.e. m1 <= (middle1, middle2) <= m2.
Case 3 if (m1 > m2): In the merged array, both middle elements of the median will lie between the range [m2, m1] i.e. m2 <= (middle1, middle2) <= m1.
The above idea is quite similar to binary search: We reject half elements in both sorted arrays after each comparison. Overall, we are reducing search space by half after each step. So, we can think of implementing a solution using idea similar to recursive binary search.
Suppose we use function median(int A[], int B[], int n).
If (m1 > m2): Both middle elements of the median must be present in one of these two sub-arrays: subarray starting from the first element of array A[] and ending at element m1 (including m1) and subarray starting from element m2 (including m2) and ending at the last element of array B[]. In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2 i.e. median(A, B + n/2, n - n/2)
For example, if n = 7, then n/2 = 3, m1 = A[3] and m2 = B[3]. So we reject 3 elements in A[] from index 4 to 6 and 3 elements in B[] from index 0 to 2. Remaining size of array A and B = n - n/2 = 7 - 3 = 4.
If (m2 > m1): Both middle elements of the median must be present in one of these two sub-arrays: subarray starting from element m1 (including m1) and ending at the last element of array A[] and subarray starting from the first element of array B[] and ending at element m2 (including m2). In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2 i.e. median(A + n/2, B, n - n/2)
For example, if n = 9, then n/2 = 4, m1 = A[4] and m2 = B[4]. So we reject: 4 elements in A[] from index 0 to 3, and 4 elements in B[] from index 5 to 8. Remaining size of array A and B = n - n/2 = 9 - 4 = 5.
double median(int A[], int B[], int n)
{
int m1, m2
if (n = 0)
return 0
if (n == 1)
return (double)(A[0] + B[0])/2
m1 = getMedian(A, n)
m2 = getMedian(B, n)
if (m1 == m2)
return m1
if (m1 < m2)
return median(A + n/2, B, n - n/2)
else
return median(A, B + n/2, n - n/2)
}
int getMedian(int C[], int n)
{
if (n % 2 == 0)
return (C[n/2 - 1] + C[n/2])/2
else
return C[n/2]
}
Input size given in the problem = Size of A[] + Size of B[] = 2n. At each step of recursion, we are reducing input size by 1/2 i.e. we are rejecting one-half of both smaller arrays or reducing search space by half.
Recurrence relation, T(2n) = T(n) + O(1), or we can write T(x) = T(x/2) + O(1), where x = 2n. This looks similar to the recurrence relation of binary search. Time complexity = O(log n).
Space complexity = O(log n), which is equal to the size of recursion call stack (Think!)
Enjoy learning, Enjoy algorithms!
Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.