Find Intersection of Two Arrays

Difficulty: Medium, Asked-In: Google, Facebook

Key takeaway: An excellent problem to learn time complexity optimization using various approaches. The two-pointer and hash table solutions are intuitive and worth exploring.

Let’s understand the problem

Given two unsorted arrays X[] and Y[] of size m and n, write a program to find the intersection of these two arrays.

  • Suppose m > n and elements in both arrays are distinct.
  • The intersection is a list of common elements present in both arrays.
  • The elements in the output can be in any order.

Example 1

Input: X[] = [3, 4, 62, 8, 5, 9], Y[] = [632, 7, 5, 1]

Output: [3, 6, 2, 5]

Explanation: Common elements are 3, 6, 2, and 5. So the intersection of both arrays is 3, 6, 2, and 5.

Example 2

Input: X[] = [3, 4, 6, 7, 10, 12, 5], Y[] = [7, 11, 18, 15, 12]

Output: [7, 12]

Explanation: Common elements are 7 and 12. So the intersection of both arrays is 7 and 12.

Example 3

Input: X[] = [3, 4, 6, 10, 5], Y[] = [7, 11, 18, 15, 12]

Output: There are no common elements. So the output is an empty array.

Discussed solution approaches

  • Brute force approach  using nested loops
  • Using sorting and binary search
  • Using sorting and two pointers
  • Efficient approach  using hash table

Brute force approach  using nested loops

Solution idea

A basic idea would be to pick each element from X[] and search linearly in Y[]. We add an element to the output list if it is present in both X[] and Y[].

  • We run two nested loops: An outer loop from i = 0 to m - 1 to pick each element X[i] and an inner loop from j = 0 to n - 1 to search X[i] linearly in Y[].
  • Whenever we found an element common to both X[] and Y[] i.e. X[i] == Y[j], we add any one of it to the output list intersection[].
  • By the end of nested loops, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    for (int i = 0; i < m; i = i + 1)
    {
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == Y[j])
            {
                intersection.push_back(X[i])
                break
            }    
        }
    }
    return intersection
}

Solution analysis

We are running two nested loops and performing O(1) operations at each iteration. In the worst case, inner loop will run n times. So time complexity = O(m) * O(*n) * O(1) = O(mn). We are using constant extra space, so space complexity = O(1). Why are we not considering output list intersection as a part of space complexity? Think!

Using sorting and binary search

Solution idea

The critical question is: How can we improve the time complexity of above approach? Searching is an essential operation in the above code. Can we think to improve the efficiency of searching element X[i] in Y[]? One idea would be simple: If we sort the array Y[], we can search each element X[i] using the binary search. This could help to improve time complexity because binary search takes O(logn) time.

Solution steps

  • We declare an output list intersection[] to store common elements.
  • Now we sort array Y[] in increasing order. We can use O(nlogn) sorting algorithms like heap sort or quick sort or merge sort.
  • We run a loop from i = 0 to m - 1 and search each element X[i] in the sorted array Y[] using binary search. Whenever we found an element common to both X[] and Y[] i.e. X[i] == Y[j], we add it to the output list intersection[].
  • By the end of loop, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    heapSort(Y, n)
    for (int i = 0; i < m; i = i + 1)
    {
        if (binarySearch(Y, n, X[i]) != -1)
            intersection.push_back(X[i])
    }
    return intersection
}

int binarySearch(int Y[], int n, int target)
{
    int l = 0, r = n
    while (l <= r)
    {
        int mid = l + (r - l)/2
        if (Y[mid] == target)
            return mid
        else if (Y[mid] > target)
            r = mid - 1
        else
            l = mid + 1
   }
   return -1
}

Solution analysis

We are using heap sort and iterative binary search for the implementation. Time complexity = Time complexity of sorting array Y[] using heap sort + Time complexity of searching each elements X[i] in Y[] using binary search = O(nlogn) + m * O(logn) = O(nlogn + mlogn). If m > n, then mlogn > nlogn and time complexity would be O(mlogn).

Space complexity = Space complexity of heap sort + Space complexity of iterative binary search = O(1) + O(1) = O(1). What would be the space complexity if we use merge sort and recursive binary search for the implementation? Think!

Using sorting and two pointers

Solution idea

In the above approach, overall time complexity of searching is O(mlogn). The critical question is: Can we improve the time complexity of searching further? Here is an idea: If we sort both arrays, we can apply two-pointers approach similar to the merging procedure of merge sort to find common elements. Think!

After sorting both arrays, the idea would be to start comparing one element of X[] with one element of Y[]. If both are equal, we add it to the output list and move to next elements in both arrays. Otherwise, we move to the next element in the array that includes the smaller element. Think!

Solution steps

  1. We define an output list intersection[] to store common elements.
  2. Now we sort both arrays in increasing order. We can use efficient sorting algorithms like heap sort or quick sort to sort the array in O(nlogn) time.
  3. Before starting two pointers loop, we initialize the pointers i and j i.e. i = 0 and j = 0
  4. Now we run a loop while i < m and j < n:
    • If X[i] == Y[j], we add X[i] to the output list intersection[]and increment pointers i and j by 1.
    • If X[i] < Y[j], we increment i by 1.
    • If X[i] > Y[j], we increment j by 1.
  5. By the end of loop, we return the output list intersection[].

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    heapSort(X, m)
    heapSort(Y, n)
    int i = 0, j = 0
    while (i < m && j < n)
    {
        if (X[i] == Y[j])
        { 
            intersection.push_back(X[i])
            i = i + 1
            j = j + 1
        }    
        else if (X[i] < Y[j])
            i = i + 1
        else
            j = j + 1
    }
    return intersection
}

Solution analysis

  • We are using heap sort for the implementation.
  • Inside the loop, we increment pointer i or j or both by 1, based on the comparison with X[i] and Y[j]. In the worst case, it will access each element of X and Y once. Time complexity of this loop = O(m) + O(n) = O(m + n)
  • Overall time complexity = Time complexity of sorting X[] using heap sort + Time complexity of sorting Y[] using heap sort + Time complexity of two pointers loop = O(mlogm) + O(nlogn) + O(n + m) = O(mlogm + nlogn). If m > n, mlogm > nlogn and time complexity will be O(mlogm).
  • Space complexity = Space complexity of heap sort + Space complexity of two pointers loop = O(1) + O(1) = O(1).

For m > n, if we compare time complexity of this approach with the previous approach, then it is not better! The idea is: We have improved the time complexity of searching after sorting both the arrays, but, here, time complexity is dominated by the sorting algorithm, which is O(mlogm). The time complexity of above approach is O(mlogn) which is better than O(mlogm). Think!

Efficient approach  using a hash table

Solution idea

We can improve the time complexity of searching further using hash table because it can perform searching in O(1) time average. The idea is to insert all elements of Y[] in hash table and search each element X[i]. If element X[i] is present in the hash table, we add it to the output list.

Solution pseudocode

vector<int> findIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection
    HashTable H
    for (int i = 0; i < n; i = i + 1)
        H.insert(Y[i])
    for (int i = 0; i < m; i = i + 1)
    {
        if (H.search(X[i]) == true)
            intersection.push_back(X[i])
    }
    return intersection
}

Solution analysis

Time complexity = Time complexity of inserting n elements of Y[] in hash table + Time complexity of searching m elements of X[] in hash table = n * O(1) + m * O(1)) = O(m + n). Space complexity = O(n), for storing values in hash table. 

Critical ideas to think!

  • Do we need to modify the above algorithms if elements are repeated?
  • What would be the time and space complexity, if the hash table is created for the larger array X[] instead of the smaller array Y[]?
  • Can we think to use some other data structures to solve this problem? Is self-balancing BST useful?
  • In the 2nd approach, we sorted the smaller array. Can we solve this problem by sorting the larger array? How does the time complexity change?
  • In the 3rd approach, why are we incrementing pointers when X[i] > Y[j] or X[i] < Y[j] ?
  • In the 3rd approach, what would be the worst and best-case scenario of finding the common elements using the two-pointers loop?

Suggested coding problems to solve

Enjoy learning, Enjoy algorithms!

Share feedback with us

More blogs to explore

Our weekly newsletter

Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.