Find Intersection of Two Arrays

Difficulty: Medium, Asked-In: Google, Facebook

Key takeaway: An excellent problem to learn time complexity optimization using various approaches. The two-pointer and hash table solutions are intuitive and worth exploring.

Let’s understand the problem

Given two unsorted arrays X[] and Y[] of sizes m and n respectively, write a program to find the intersection of these two arrays. Suppose m > n and the elements in both arrays are distinct.

  • The intersection represents a list of common elements present in both arrays.
  • The elements in the output can be in any order.

Example 1

Input: X[] = [3, 4, 62, 8, 5, 9], Y[] = [632, 7, 5, 1]

Output: [3, 6, 2, 5]

Explanation: Common elements are 3, 6, 2, and 5. So the intersection of both arrays is 3, 6, 2, and 5.

Example 2

Input: X[] = [3, 4, 6, 7, 10, 12, 5], Y[] = [7, 11, 18, 15, 12]

Output: [7, 12]

Explanation: Common elements are 7 and 12. So the intersection of both arrays is 7 and 12.

Example 3

Input: X[] = [3, 4, 6, 10, 5], Y[] = [7, 11, 18, 15, 12]

Output: There are no common elements. So the output is an empty array.

Discussed solution approaches

  • Brute force approach using nested loops
  • Using sorting and binary search
  • Using sorting and two pointers
  • Efficient approach using hash table

Brute force approach using nested loops

Solution idea

One basic idea is to traverse the array X[] and linearly search each element X[i] in Y[]. If element X[i] is present in Y[], then X[i] will be a common element in both arrays.

  • We run two nested loops: an outer loop from i = 0 to m - 1 to pick each element X[i], and an inner loop from j = 0 to n - 1 to search X[i] linearly in Y[].
  • Whenever we find an element that is common to both X[] and Y[], i.e., X[i] == Y[j], we add one of them to the output list intersection[].
  • By the end of the nested loops, we return the output list intersection[].

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    for (int i = 0; i < m; i = i + 1)
    {
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == Y[j])
            {
                intersection.push_back(X[i]);
                break;
            }
        }
    }
    return intersection;
}

Solution code Python

def arrayIntersection(X, Y, m, n):
    intersection = []
    for i in range(m):
        for j in range(n):
            if X[i] == Y[j]:
                intersection.append(X[i])
                break
    return intersection

Solution analysis

We are running two nested loops and performing O(1) operations at each iteration. In the worst case, the inner loop will run n times. So time complexity = O(m) * O(n) * O(1) = O(mn).

We are using constant extra space, so space complexity = O(1). Why are we not considering the output list intersection as part of space complexity? Think!

Using sorting and binary search

Solution idea

The critical question is: How can we improve the time complexity of the above approach? Searching is an essential operation in the above code. Can we think to improve the efficiency of searching?

One idea is: If we sort the array Y[], we can search each element X[i] using binary search. This will help us to improve time complexity because binary search will take O(logn) time to search an element in the sorted array.

Solution steps

  • We declare an output list intersection[] to store common elements.
  • Now we sort the array Y[] in increasing order. We can use O(nlogn) sorting algorithms like heap sort, quick sort, or merge sort.
  • We run a loop from i = 0 to m - 1 and search each element X[i] in the sorted array Y[] using binary search. Whenever we find an element common to both X[] and Y[], i.e., X[i] == Y[j], we add it to the output list intersection[].
  • By the end of the loop, we return the output list intersection[].

Solution code C++

int binarySearch(int Y[], int n, int target)
{
    int l = 0, r = n;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        if (Y[mid] == target)
            return mid;
        else if (Y[mid] > target)
            r = mid - 1;
        else
            l = mid + 1;
    }
    return -1;
}

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    sort(Y, Y + n);
    for (int i = 0; i < m; i = i + 1)
    {
        if (binarySearch(Y, n, X[i]) != -1)
            intersection.push_back(X[i]);
    }
    return intersection;
}

Solution code Python

def binarySearch(Y, n, target):
    l, r = 0, n
    while l <= r:
        mid = l + (r - l) // 2
        if Y[mid] = target:
            return mid
        elif Y[mid] > target:
            r = mid - 1
        else:
            l = mid + 1
    return -1

def arrayIntersection(X, Y, m, n):
    intersection = []
    Y = sorted(Y)
    for i in range(m):
        if binarySearch(Y, n, X[i]) != -1:
            intersection.append(X[i])
    return intersection

Solution analysis

Suppose, we are using heap sort and iterative binary search for the implementation. So time complexity = Time complexity of sorting array Y[] using heap sort + Time complexity of searching each element X[i] in Y[] using binary search = O(nlogn) + m * O(logn) = O(nlogn + mlogn). If m > n, then mlogn > nlogn, and the time complexity will be O(mlogn).

Space complexity = Space complexity of heap sort + Space complexity of the iterative binary search = O(1) + O(1) = O(1). What will be the space complexity if we use merge sort and recursive binary search for the implementation? Think!

Using sorting and two pointers

Solution idea

In the above approach, the time complexity is dominated by the searching operation of m elements, which is O(mlogn). The critical question is: Can we improve the time complexity of searching further? Here's an idea: If we sort both arrays, we can find the intersection elements in both arrays by applying the two-pointers approach, similar to the merging algorithm of merge sort. How can we implement this? Let's think!

After sorting both arrays, we start comparing the elements of both arrays from the start using two pointers, i and j. If X[i] == Y[j], then we have found a common element. So we add one of them to the output list and move to the next elements in both arrays. Otherwise, we move to the next element in the array that includes the smaller element. Think!

Solution steps

  1. We define an output list intersection[] to store common elements.
  2. Now we sort both arrays in increasing order. We can use efficient sorting algorithms like heap sort or quick sort to sort the array in O(nlogn) time.
  3. Before starting the two-pointers loop, we initialize the pointers i and j i.e. i = 0 and j = 0.
  4. Now we run a loop while i < m and j < n:

    • If X[i] == Y[j], we add X[i] to the intersection[] and increment pointers i and j by 1.
    • If X[i] < Y[j], we increment pointer i by 1 to find element equal to Y[j] in X[].
    • If X[i] > Y[j], we increment pointer j by 1 to find element equal to X[i] in Y[].
  5. By the end of the loop, we return the output list intersection[].

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    sort(X, X + m);
    sort(Y, Y + n);
    int i = 0, j = 0;
    while (i < m && j < n)
    {
        if (X[i] == Y[j])
        {
            intersection.push_back(X[i]);
            i = i + 1;
            j = j + 1;
        }
        else if (X[i] < Y[j])
            i = i + 1;
        else
            j = j + 1;
    }

    return intersection;
}

Solution code Python

def arrayIntersection(X, Y, m, n):
    intersection = []
    X.sort()
    Y.sort()
    i, j = 0, 0
    while i < m and j < n:
        if X[i] == Y[j]:
            intersection.append(X[i])
            i = i + 1
            j = j + 1
        elif X[i] < Y[j]:
            i = i + 1
        else:
            j = j + 1

    return intersection

Solution analysis

Suppose we are using heap sort for the implementation.

  • At each iteration of the loop, we increment pointer i or j or both by 1. So in the worst case, the loop will access each element of X and Y once. Time complexity of the while loop = O(m) + O(n) = O(m + n).
  • Overall time complexity = Time complexity of sorting X[] + Time complexity of sorting Y[] + Time complexity of two pointers while loop = O(mlogm) + O(nlogn) + O(n + m) = O(mlogm + nlogn). If m > n, mlogm > nlogn and time complexity will be O(mlogm).
  • Space complexity = O(1).

For m > n, if we compare the time complexity of this approach with the previous approach, then it is not better! The idea is: We have improved the time complexity of searching, but time complexity is dominated by the sorting algorithm, which is O(mlogm). The time complexity of the previous approach is O(mlogn), which is better than O(mlogm). Think!

An efficient approach using a hash table

Solution idea

We can improve the time complexity of searching using a hash table because a hash table can perform searching in O(1) time average. How can we implement this? Let's think!

The idea is to first insert all elements of Y[] into the hash table and then search each element X[i]. If X[i] is present in the hash table, then it is a common element to both arrays. So we add it to the output list.

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    unordered_set<int> H;

    for (int i = 0; i < n; i = i + 1)
        H.insert(Y[i]);

    for (int i = 0; i < m; i = i + 1)
    {
        if (H.find(X[i]) != H.end())
            intersection.push_back(X[i]);
    }
    
    return intersection;
}

Solution code Java

List<Integer> arrayIntersection(int[] X, int[] Y, int m, int n) 
{
    List<Integer> intersection = new ArrayList<>();
    HashSet<Integer> H = new HashSet<>();

    for (int i = 0; i < n; i = i + 1)
        H.add(Y[i]);

    for (int i = 0; i < m; i = i + 1) 
    {
        if (H.contains(X[i]))
            intersection.add(X[i]);
    }

    return intersection;
}

Solution code Python

def arrayIntersection(X: List[int], Y: List[int], m: int, n: int) -> List[int]:
    intersection = []
    H = set()

    for i in range(n):
        H.add(Y[i])

    for i in range(m):
        if X[i] in H:
            intersection.append(X[i])

    return intersection

Solution analysis

Time complexity = Time complexity of inserting n elements of Y[] in hash table + Time complexity of searching m elements of X[] in hash table = n * O(1) + m * O(1)) = O(m + n). Space complexity = O(n), for storing values in the hash table.

Critical ideas to think!

  • Do we need to modify the above algorithms if elements are repeated?
  • What would be the time and space complexity if we create a hash table for the larger array X[] instead of the smaller array Y[]?
  • Can we think of using some other data structures to solve this problem? Is a self-balancing BST useful?
  • In the second approach, we sorted the smaller array. Can we solve this problem by sorting the larger array? If yes, then what will be the time complexity?
  • In the third approach, why do we increment pointers when X[i] > Y[j] or X[i] < Y[j]?
  • In the third approach, what would be the worst and best-case scenarios?

Suggested coding problems to solve

If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!

Share Your Insights

More from EnjoyAlgorithms

Self-paced Courses and Blogs

Coding Interview

Machine Learning

System Design

Our Newsletter

Subscribe to get well designed content on data structure and algorithms, machine learning, system design, object orientd programming and math.