Median of Two Sorted Arrays

Difficulty: Hard, Asked-in: Google, Microsoft, Amazon.

Key takeaway: An excellent problem to learn problem-solving using two pointers approach similar to merging, and a divide-and-conquer approach similar to binary search.

Let's understand the problem

There are two sorted arrays A[] and B[] of size n each. Write a program to find the median of the array obtained after merging both sorted arrays, i.e., the merged array of size 2n.

  • The median of a sorted array of size n is defined as the middle element when n is odd and the average of the two middle elements when n is even.
  • After merging both sorted arrays, the size of the larger sorted array will be 2n, an even value.
  • For the convenience of the solution, let's assume that n is odd.

Median of two sorted arrays of equal size visualization

Example 1

Input: A[] = [1, 3, 6], B[] = [2, 8, 12], Output: 4.5.

Explanation: After merging the sorted arrays, we obtain the larger sorted array [1, 2, 3, 6, 8, 12]. The total number of elements is 6, so the median is the average of the two middle elements, 3 and 6, which equals (3 + 6)/2 = 4.5.

Example 2

Input: A[] = [1, 3, 4, 6, 9], B[] = [2, 5, 7, 8, 10], Output: 5.5.

Explanation: After merging the sorted arrays, we obtain a larger sorted array of size 10: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. The median is the average of the two middle elements, 5 and 6, which equals (5 + 6)/2 = 5.5.

Discussed solution approaches

  • Brute force approach: Merging using extra space.
  • Two-pointer approach: Counting while merging.
  • Efficient approach: Divide and conquer idea similar to binary search.

Brute force approach: Merging using extra space

Solution idea

One basic idea is to merge both sorted arrays using extra space and get the median by taking the average of the two middle elements in the merged array. If we follow 0-based indexing, the median will be the average of the values at the (n-1)th and nth indexes. Note: For merging sorted arrays, we can use an approach similar to the merging procedure of merge sort.

Solution code C++

double medianSortedArrays(int A[], int B[], int n)
{
    int MergedArr[2 * n];
    int i = 0, j = 0, k = 0;
    while (i < n && j < n)
    {
        if (A[i] <= B[j])
        {
            MergedArr[k] = A[i];
            i = i + 1;
            k = k + 1;
        }
        else
        {
            MergedArr[k] = B[j];
            j = j + 1;
            k = k + 1;
        }
    }
    while (i < n)
    {
        MergedArr[k] = A[i];
        i = i + 1;
        k = k + 1;
    }
    while (j < n)
    {
        MergedArr[k] = B[j];
        j = j + 1;
        k = k + 1;
    }
    return (double)(MergedArr[n - 1] + MergedArr[n]) / 2;
}

Solution code Python

def medianSortedArrays(A, B, n):
    MergedArr = [0] * (2 * n)
    i, j, k = 0, 0, 0
    while i < n and j < n:
        if A[i] <= B[j]:
            MergedArr[k] = A[i]
            i = i + 1
            k = k + 1
        else:
            MergedArr[k] = B[j]
            j = j + 1
            k = k + 1
    while i < n:
        MergedArr[k] = A[i]
        i = i + 1
        k = k + 1
    while j < n:
        MergedArr[k] = B[j]
        j = j + 1
        k = k + 1
    return (MergedArr[n - 1] + MergedArr[n]) / 2

Time and space complexity analysis

We need to traverse each element to merge both sorted arrays. In other words, at each iteration of the while loop, we place one value to MergedArr[]. So the time complexity of merging loop = O(n).

The overall time complexity = Time complexity of merging + Time complexity of finding median = O(n) + O(1) = O(n). We are using 2n size extra space for merging, so space complexity = O(n).

Two-pointers approach: Counting while merging

Solution idea

Now, the critical questions are: Do we really need to perform a complete merge of sorted arrays to find the median? Can we reduce the extra space used in the merging process? Let's think!

Here are a few critical observations: We are traversing both arrays from the start in incremental order. At each iteration step, we are either increasing pointer i or pointer j and placing one value from either of the smaller arrays into the larger sorted array. At any general step of iteration, k = i + j - 1.

Here is an optimized solution idea: While comparing elements of the two sorted arrays, we can track the count of elements in the merged array using the variable k. We stop when the count becomes n + 1 and calculate the median by taking the average of the nth and (n + 1)th elements of the merged array. So, there is no need to use extra space and perform a complete merge.

Solution steps

  1. We initialize two pointers i and j to move forward in A[] and B[]. i = 0 and j = 0.
  2. We initialize three variables: middle1 and middle2 to keep track of the middle elements and count to track the count of elements in the merged array. middle1 = 0, middle2 = 0 and count = 0.
  3. Now we start the merging loop and move forward by comparing elements in A[] and B[] until the count becomes equal to the n + 1. We also keep track of middle elements middle1 and middle2.
  4. By the end of the loop, we return the median by taking an average of middle1 and middle2.

Solution code C++

double medianSortedArrays(int A[], int B[], int n)
{
    int i = 0, j = 0, count = 0;
    int middle1 = 0, middle2 = 0;
    while (count <= n)
    {
        if (A[i] <= B[j])
        {
            middle1 = middle2;
            middle2 = A[i];
            i = i + 1;
        }
        else
        {
            middle1 = middle2;
            middle2 = B[j];
            j = j + 1;
        }
        if (i == n)
        {
            middle1 = middle2;
            middle2 = B[0];
            break;
        }
        else if (j == n)
        {
            middle1 = middle2;
            middle2 = A[0];
            break;
        }
        count = count + 1;
    }
    return (double)(middle1 + middle2) / 2;
}

Solution code Python

def medianSortedArrays(A, B, n):
    i, j, count = 0, 0, 0
    middle1, middle2 = 0, 0
    while count <= n:
        if A[i] <= B[j]:
            middle1 = middle2
            middle2 = A[i]
            i = i + 1
        else:
            middle1 = middle2
            middle2 = B[j]
            j = j + 1
        if i == n:
            middle1 = middle2
            middle2 = B[0]
            break
        elif j == n:
            middle1 = middle2
            middle2 = A[0]
            break
        count = count + 1
    return (middle1 + middle2) / 2

Time and Space complexity analysis

We are running while loop n + 1 times and performing O(1) operation at each iteration. So the time complexity = (n + 1)* O(1) = O(n). Since we are using a constant number of extra variables, space complexity = O(1).

Efficient Approach: Divide and Conquer Idea Similar to Binary Search

Now, the critical questions are: How can we improve the time complexity? Since this is a searching problem, can we take advantage of the sorted array property? Let's think!

Solution idea

The idea is to compare the medians of both sorted arrays and recursively reduce the search space by half. Suppose the median of the first array is m1, and the median of the second array is m2. We can get these values in O(1) time using this formula: m1 = A[n/2], m2 = B[n/2] (We assume that n is odd).

Case 1: if (m1 == m2): In this case, there are n - 1 elements less than m1 and n - 1 elements greater than m2. In other words, both m1 and m2 will be the middle elements in the merged array, i.e., the element at the (n - 1)th and nth index. The average of both values will be equal to m1 or m2, so we return either of these values as the output.

Case 2: if (m1 < m2): In the merged array, both middle elements of the median will lie between the range [m1, m2] i.e. m1 <= (middle1, middle2) <= m2.

  • We can ignore the left half of array A[]: All values less than m1 will be the starting elements in the merged array (n/2 values if n is odd) i.e. they will be present in the range of indices from 0 to n - 2 in the merged array. So these values will never be middle values in the merged array and we can ignore the left half of array A[] from index 0 to n/2 - 1.
  • We can ignore the right half of array B[]: All values greater than m2 will be the last elements in the merged array (n/2 values if n is odd) i.e. they will be present in the range of indices from n + 1 to 2n - 1 in the merged array. So these values will never be middle values in the merged array and we can ignore the right half of array B[] from index n/2 + 1 to n - 1.
  • In simple words: Both middle values of the median can be present in one of these two subarrays: the right half of array A[] and the left half of array B[].

Case 3 if (m1 > m2): In the merged array, both middle elements of the median will lie between the range [m2, m1], i.e., m2 <= (middle1, middle2) <= m1.

  • We can ignore the right half of array A[]: All values greater than m1 will be the last elements in the merged array (n/2 values if n is odd), i.e., they will be somewhere present in the range of index from n + 1 to 2n - 1 in the merged array. So these values will never be middle values in the merged array and we ignore the right half of array A[] from index n/2 + 1 to n - 1.
  • We can ignore the left half of array B[]: All values less than m2 will be the starting elements in the merged sorted array (n/2 values if n is odd), i.e., they will be somewhere present in the range of index from 0 to n - 2 in the merged array. So these values will never be middle values in the merged array and we ignore the left half of array B[] from index 0 to n/2 - 1.
  • In simple words: Both middle values of the median will be present in one of these two subarrays: the left half of array A[] and the right half of array B[].

The above idea is quite similar to binary search: We reject half of the elements in both sorted arrays after each comparison. So we can think of implementing a solution using an idea similar to recursive binary search.

Solution steps

Suppose we use the function medianSortedArrays(int A[], int B[], int n):

  1. We first find the medians m1 and m2. 
  2. If (m1 == m2), we return m1 or m2.
  3. If (m1 > m2): Both middle elements of the median must be present in one of these two subarrays: 1) Subarray starting from the first element of A[] and ending at m1 (including m1) 2) Subarray starting from m2 (including m2) and ending at the last element of B[]. In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2, i.e. medianSortedArrays(A, B + n/2, n - n/2).

    For example, if n = 7, then n/2 = 3, m1 = A[3], and m2 = B[3]. So we reject 3 elements in A[] from index 4 to 6 and 3 elements in B[] from index 0 to 2. The remaining size of arrays A and B is n - n/2 = 7 - 3 = 4.

  4. If (m2 > m1): Both middle elements of the median must be present in one of these two subarrays: 1) Subarray starting from m1 (including m1) and ending at the last element of A[]. 2) Subarray starting from the first element of B[] and ending at m2 (including m2). In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2, i.e. medianSortedArrays(A + n/2, B, n - n/2).

    For example, if n = 9, then n/2 = 4, m1 = A[4], and m2 = B[4]. So we reject 4 elements in A[] from index 0 to 3 and 4 elements in B[] from index 5 to 8. The remaining size of arrays A and B is n - n/2 = 9 - 4 = 5.

  5. Base case: n is odd, so the base case will occur when n <= 2.

    if (n == 0)
       return 0
    if (n == 1)
       return (double)(A[0] + B[0]) / 2
    if (n == 2)
       return (max(A[0], B[0]) + min(A[1], B[1])) / 2.0

Solution code C++

int getMedian(int C[], int n)
{
    if (n % 2 == 0)
        return (C[n / 2 - 1] + C[n / 2]) / 2;
    else
        return C[n / 2];
}

double medianSortedArrays(int A[], int B[], int n)
{
    int m1, m2;
    if (n == 0)
        return 0;
    if (n == 1)
        return (double)(A[0] + B[0]) / 2;      
    if (n == 2)
        return (max(A[0], B[0]) + min(A[1], B[1])) / 2.0;
        
    m1 = getMedian(A, n);
    m2 = getMedian(B, n);

    if (m1 == m2)
        return m1;
    if (m1 < m2)
        return medianSortedArrays(A + n / 2, B, n - n / 2);
    else
        return medianSortedArrays(A, B + n / 2, n - n / 2);
}

Solution code Python

def getMedian(C, n):
    if n % 2 == 0:
        return (C[n // 2 - 1] + C[n // 2]) / 2
    else:
        return C[n // 2]

def medianSortedArrays(A, B, n):
    if n == 0:
        return 0
    if n == 1:
        return (A[0] + B[0]) / 2.0
    if n == 2:
        return (max(A[0], B[0]) + min(A[1], B[1])) / 2.0
    
    m1 = getMedian(A, n)
    m2 = getMedian(B, n)

    if m1 == m2:
        return m1
    if m1 < m2:
        return medianSortedArrays(A[n // 2:], B, n - n // 2)
    else:
        return medianSortedArrays(A, B[n // 2:], n - n // 2)

Time and space complexity analysis

Input size given in the problem = Size of A[] + Size of B[] = 2n. At each step of recursion, we are reducing input size by 1/2 i.e. we are rejecting one-half of both smaller arrays.

The recurrence relation of the time complexity: T(2n) = T(n) + O(1), or we can write T(x) = T(x/2) + O(1), where x = 2n. This looks similar to the recurrence relation of binary search. So time complexity = O(log x) = O(log 2n) = O(log n).

Space complexity = O(log n), which is equal to the size of recursion call stack. Think and explore!

Critical ideas to think!

  • Can we solve this problem in O(logn) time complexity using an iterative approach?
  • How do we modify the above algorithms when n is even?
  • How do we modify the above algorithms when n can be both odd or even?
  • How can we solve the above problem if the size of both arrays is different? What would be the time complexity? What are the boundary conditions?
  • In the second approach, why are there two different boundary cases for i = n and j = n?
  • In the third approach, why are we including m1 and m2 during the recursive call?
  • Visualize and perform a dry run of the efficient approach.

Comparison of time and space complexities

  • Merging using extra space: Time = O(n), Space = O(n).
  • Using a two-pointer approach: Time = O(n), Space = O(1).
  • Using divide and conquer approach: Time = O(logn), Space = O(logn).

Suggested coding problems to practice

  • Inversion count in an array
  • Median of two sorted arrays of different sizes
  • K-th element of two sorted arrays
  • Josephus problem
  • Find median in the row-wise sorted matrix
  • Merge two sorted arrays 

If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!

Share Your Insights

☆ 16-week live DSA course
☆ 16-week live ML course
☆ 10-week live DSA course

More from EnjoyAlgorithms

Self-paced Courses and Blogs

Coding Interview

Machine Learning

System Design

Our Newsletter

Subscribe to get well designed content on data structure and algorithms, machine learning, system design, object orientd programming and math.