Find Intersection of Two Arrays

Difficulty: Medium, Asked-In: Google, Facebook

Key takeaway: An excellent problem to learn time complexity optimization using various approaches. The two-pointer and hash table solutions are intuitive and worth exploring.

Let’s understand the problem

Given two unsorted arrays X[] and Y[] of size m and n, write a program to find the intersection of these two arrays.

  • Suppose m > n and elements in both arrays are distinct.
  • The intersection is a list of common elements present in both arrays.
  • The elements in the output can be in any order.

Example 1

Input: X[] = [3, 4, 62, 8, 5, 9], Y[] = [632, 7, 5, 1]

Output: [3, 6, 2, 5]

Explanation: Common elements are 3, 6, 2, and 5. So the intersection of both arrays is 3, 6, 2, and 5.

Example 2

Input: X[] = [3, 4, 6, 7, 10, 12, 5], Y[] = [7, 11, 18, 15, 12]

Output: [7, 12]

Explanation: Common elements are 7 and 12. So the intersection of both arrays is 7 and 12.

Example 3

Input: X[] = [3, 4, 6, 10, 5], Y[] = [7, 11, 18, 15, 12]

Output: There are no common elements. So the output is an empty array.

Discussed solution approaches

  • Brute force approach using nested loops
  • Using sorting and binary search
  • Using sorting and two pointers
  • Efficient approach using hash table

Brute force approach using nested loops

Solution idea

A basic idea would be to pick each element from X[] and search linearly in Y[]. We add an element to the output list if it is present in both X[] and Y[].

  • We run two nested loops: an outer loop from i = 0 to m - 1 to pick each element X[i], and an inner loop from j = 0 to n - 1 to search for X[i] linearly in Y[].
  • Whenever we find an element that is common to both X[] and Y[], i.e., X[i] == Y[j], we add either one of them to the output list, intersection.
  • By the end of the nested loops, we return the output list intersection[].

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    for (int i = 0; i < m; i = i + 1)
    {
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == Y[j])
            {
                intersection.push_back(X[i]);
                break;
            }
        }
    }
    return intersection;
}

Solution code Java

List<Integer> arrayIntersection(int[] X, int[] Y, int m, int n) 
{
    List<Integer> intersection = new ArrayList<>();
    for (int i = 0; i < m; i = i + 1) 
    {
        for (int j = 0; j < n; j = j + 1) 
        {
            if (X[i] == Y[j]) 
            {
                intersection.add(X[i]);
                break;
            }
        }
    }
    return intersection;
}

Solution code Python

def arrayIntersection(X, Y, m, n):
    intersection = []
    for i in range(m):
        for j in range(n):
            if X[i] == Y[j]:
                intersection.append(X[i])
                break
    return intersection

Solution analysis

We are running two nested loops and performing O(1) operations at each iteration. In the worst case, the inner loop will run n times. So the time complexity is O(m) * O(n) * O(1) = O(mn). We are using constant extra space, so the space complexity is O(1). But why are we not considering the output list intersection as part of space complexity? Think!

Using sorting and binary search

Solution idea

The critical question is: How can we improve the time complexity of the above approach? Searching is an essential operation in the above code. Can we think of ways to improve the efficiency of searching for element X[i] in Y[]? One idea would be simple: If we sort the array Y[], we can search each element X[i] using binary search. This could help to improve time complexity because binary search takes O(logn) time.

Solution steps

  • We declare an output list, intersection, to store common elements.
  • Now we sort the array Y[] in increasing order. We can use O(nlogn) sorting algorithms like heap sort, quick sort, or merge sort.
  • We run a loop from i = 0 to m - 1 and search for each element X[i] in the sorted array Y[] using binary search. Whenever we find an element common to both X[] and Y[], i.e., X[i] == Y[j], we add it to the output list, intersection[].
  • By the end of the loop, we return the output list, intersection[].

Solution code C++

int binarySearch(int Y[], int n, int target)
{
    int l = 0, r = n;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        if (Y[mid] == target)
            return mid;
        else if (Y[mid] > target)
            r = mid - 1;
        else
            l = mid + 1;
    }
    return -1;
}

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    sort(Y, Y + n); // sort the array Y
    for (int i = 0; i < m; i = i + 1)
    {
        if (binarySearch(Y, n, X[i]) != -1)
            intersection.push_back(X[i]);
    }
    return intersection;
}

Solution code Java

int binarySearch(int[] Y, int n, int target)
{
    int l = 0, r = n;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        if (Y[mid] == target)
            return mid;
        else if (Y[mid] > target)
            r = mid - 1;
        else
            l = mid + 1;
    }
    return -1;
}

Vector<Integer> arrayIntersection(int[] X, int[] Y, int m, int n)
{
    Vector<Integer> intersection = new Vector<Integer>();
    Arrays.sort(Y); // sort the array Y
    for (int i = 0; i < m; i = i + 1)
    {
        if (binarySearch(Y, n, X[i]) != -1)
            intersection.add(X[i]);
    }
    return intersection;
}

Solution code Python

def binarySearch(Y: List[int], n: int, target: int) -> int:
    l, r = 0, n
    while l <= r:
        mid = l + (r - l) // 2
        if Y[mid] = target:
            return mid
        elif Y[mid] > target:
            r = mid - 1
        else:
            l = mid + 1
    return -1

def arrayIntersection(X: List[int], Y: List[int], m: int, n: int) -> List[int]:
    intersection = []
    Y = sorted(Y) # sort the array Y
    for i in range(m):
        if binarySearch(Y, n, X[i]) != -1:
            intersection.append(X[i])
    return intersection

Solution analysis

We are using heap sort and iterative binary search for the implementation. The time complexity is the time complexity of sorting array Y[] using heap sort plus the time complexity of searching for each element X[i] in Y[] using binary search. This gives us O(nlogn) + m * O(logn) = O(nlogn + mlogn). If m > n, then mlogn > nlogn, and the time complexity would be O(mlogn).

The space complexity is the space complexity of heap sort plus the space complexity of the iterative binary search, which is O(1) + O(1) = O(1). What would be the space complexity if we use merge sort and recursive binary search for the implementation? Think!

Using sorting and two pointers

Solution idea

In the above approach, the overall time complexity of searching is O(mlogn). The critical question is: Can we improve the time complexity of searching further? Here's an idea: If we sort both arrays, we can apply the two-pointers approach, similar to the merging procedure of merge sort, to find common elements. Think!

After sorting both arrays, the idea would be to start comparing one element of X[] with one element of Y[]. If both are equal, we add them to the output list and move to the next elements in both arrays. Otherwise, we move to the next element in the array that includes the smaller element. Think!

Solution steps

  1. We define an output list intersection to store common elements.
  2. Now we sort both arrays in increasing order. We can use efficient sorting algorithms like heap sort or quick sort to sort the array in O(nlogn) time.
  3. Before starting two pointers loop, we initialize the pointers i and j i.e. i = 0 and j = 0
  4. Now we run a loop while i < m and j < n:

    • If X[i] == Y[j], we add X[i] to the output list intersection[]and increment pointers i and j by 1.
    • If X[i] < Y[j], we increment i by 1.
    • If X[i] > Y[j], we increment j by 1.
  5. By the end of loop, we return the output list intersection[].

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    sort(X, X + m);
    sort(Y, Y + n);
    int i = 0, j = 0;
    while (i < m && j < n)
    {
        if (X[i] == Y[j])
        {
            intersection.push_back(X[i]);
            i = i + 1;
            j = j + 1;
        }
        else if (X[i] < Y[j])
            i = i + 1;
        else
            j = j + 1;
    }

    return intersection;
}

Solution code Java

List<Integer> arrayIntersection(int[] X, int[] Y, int m, int n) 
{
    List<Integer> intersection = new ArrayList<>();
    Arrays.sort(X);
    Arrays.sort(Y);
    int i = 0, j = 0;
    while (i < m && j < n) 
    {
        if (X[i] == Y[j]) 
        {
            intersection.add(X[i]);
            i = i + 1;
            j = j + 1;
        } 
        else if (X[i] < Y[j])
            i = i + 1;
        else
            j = j + 1;
    }
    return intersection;
}

Solution code Python

def arrayIntersection(X: List[int], Y: List[int], m: int, n: int) -> List[int]:
    intersection = []
    X.sort()
    Y.sort()
    i, j = 0, 0
    while i < m and j < n:
        if X[i] == Y[j]:
            intersection.append(X[i])
            i = i + 1
            j = j + 1
        elif X[i] < Y[j]:
            i = i + 1
        else:
            j = j + 1

    return intersection

Solution analysis

Suppose we are using heap sort for the implementation.

  • Inside the loop, we increment pointer i or j or both by 1, based on the comparison with X[i] and Y[j]. In the worst case, it will access each element of X and Y once. The time complexity of this loop = O(m) + O(n) = O(m + n).
  • The overall time complexity = Time complexity of sorting X[] using heap sort + Time complexity of sorting Y[] using heap sort + Time complexity of two pointers loop = O(mlogm) + O(nlogn) + O(n + m) = O(mlogm + nlogn). If m > n, mlogm > nlogn and time complexity will be O(mlogm).
  • Space complexity = Space complexity of heap sort + Space complexity of two pointers loop = O(1) + O(1) = O(1).

For m > n, if we compare the time complexity of this approach with the previous approach, then it is not better! The idea is: We have improved the time complexity of searching after sorting both arrays, but here, the time complexity is dominated by the sorting algorithm, which is O(mlogm). The time complexity of the above approach is O(mlogn), which is better than O(mlogm). Think!

An efficient approach using a hash table

Solution idea

We can improve the time complexity of searching further using a hash table because it can perform searching in O(1) time on average. The idea is to insert all elements of Y[] into the hash table and search for each element X[i]. If element X[i] is present in the hash table, we add it to the output list.

Solution code C++

vector<int> arrayIntersection(int X[], int Y[], int m, int n)
{
    vector<int> intersection;
    unordered_set<int> H;

    for (int i = 0; i < n; i = i + 1)
        H.insert(Y[i]);

    for (int i = 0; i < m; i = i + 1)
    {
        if (H.count(X[i]))
            intersection.push_back(X[i]);
    }
    
    return intersection;
}

Solution code Java

List<Integer> arrayIntersection(int[] X, int[] Y, int m, int n) 
{
    List<Integer> intersection = new ArrayList<>();
    HashSet<Integer> H = new HashSet<>();

    for (int i = 0; i < n; i = i + 1)
        H.add(Y[i]);

    for (int i = 0; i < m; i = i + 1) 
    {
        if (H.contains(X[i]))
            intersection.add(X[i]);
    }

    return intersection;
}

Solution code Python

def arrayIntersection(X: List[int], Y: List[int], m: int, n: int) -> List[int]:
    intersection = []
    H = set()

    for i in range(n):
        H.add(Y[i])

    for i in range(m):
        if X[i] in H:
            intersection.append(X[i])

    return intersection

Solution analysis

Time complexity = Time complexity of inserting n elements of Y[] in hash table + Time complexity of searching m elements of X[] in hash table = n * O(1) + m * O(1)) = O(m + n). Space complexity = O(n), for storing values in hash table. 

Critical ideas to think!

  • Do we need to modify the above algorithms if elements are repeated?
  • What would be the time and space complexity, if the hash table is created for the larger array X[] instead of the smaller array Y[]?
  • Can we think to use some other data structures to solve this problem? Is self-balancing BST useful?
  • In the 2nd approach, we sorted the smaller array. Can we solve this problem by sorting the larger array? How does the time complexity change?
  • In the 3rd approach, why are we incrementing pointers when X[i] > Y[j] or X[i] < Y[j] ?
  • In the 3rd approach, what would be the worst and best-case scenario of finding the common elements using the two-pointers loop?

Suggested coding problems to solve

If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!

More From EnjoyAlgorithms

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.